mysociety / pombola

[KE] Extract committee hansard entries for Jessica #2211

Closed: JenMysoc closed this issue 7 years ago

JenMysoc commented 7 years ago

I'm not sure if this is possible, but a Kenyan researcher has asked Mzalendo to help him compile Hansard extracts of committee presentations by the following committees:

- Justice and legal affairs
- Budget and Appropriations Committee
- Public Accounts Committee
- Committee on education, research and technology
- Constitution Implementation Oversight Committee

For all entries from April 2013 to October 2016.

Is this something we would be able to do simply and quickly?

mhl commented 7 years ago

Unfortunately, this is too unclear for us to work on at the moment. I think it is saying that these committees sometimes make presentations to parliament, and that the content of those presentations appears as speeches in Hansard, i.e. can be found somewhere under http://info.mzalendo.com/hansard/. If that's right, I'm assuming the request is whether we can detect when those presentations occur and extract those speeches, along with the committee name, date and house. If so, it's hard for us even to assess how difficult this would be, since we don't have any examples of where such a presentation appears in the transcripts. As a starting point, we'd need about 5 links to Hansard showing examples of these presentations; then we could consider whether there are distinctive and consistent features that would enable us to extract the data they want.
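To make the question concrete, here is a rough sketch of the most naive approach: keep speeches in the requested date range whose text mentions one of the listed committee names. Everything about the data shape is an assumption for illustration: the `speeches` records with `date`, `house` and `text` keys are hypothetical and are not Pombola's actual Hansard models, and `find_committee_mentions` is a made-up helper. A name mention is also not known to be a reliable marker of a committee presentation, which is exactly why example links would be needed first.

```python
from datetime import date

# Committee names taken from the request; matching on these strings is an
# assumption, since we don't yet know how presentations appear in Hansard.
COMMITTEES = [
    "Justice and Legal Affairs",
    "Budget and Appropriations Committee",
    "Public Accounts Committee",
    "Education, Research and Technology",
    "Constitution Implementation Oversight Committee",
]

# Requested period: April 2013 to October 2016 (inclusive).
START, END = date(2013, 4, 1), date(2016, 10, 31)


def find_committee_mentions(speeches):
    """Return (committee, date, house, text) tuples for in-range speeches
    whose text mentions one of the listed committees (case-insensitive).

    `speeches` is a hypothetical iterable of dicts with 'date' (datetime.date),
    'house' (str) and 'text' (str) keys.
    """
    matches = []
    for speech in speeches:
        if not (START <= speech["date"] <= END):
            continue
        text_lower = speech["text"].lower()
        for committee in COMMITTEES:
            if committee.lower() in text_lower:
                matches.append(
                    (committee, speech["date"], speech["house"], speech["text"])
                )
    return matches
```

A filter like this would over-match (any passing reference to a committee counts) and under-match (variant spellings of committee names), so it could only serve as a first pass for finding candidate examples to review by hand.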