Open baberabb opened 9 months ago
Looks like the Canadian Hansard is about 5 GB in .csv format and wouldn't require too much cleanup. If the UK Hansard is of a similar size and would require tooling to scrape, it might not be worth it. Can we confirm that this content is indeed public domain?
The UK Hansard is under the Open Parliament Licence, which tracks pretty closely with public domain except that it has exemptions for personal information and national security.
For the Canadian one, there doesn't seem to be any authoritative licensing information on the dataset site or the government's. According to the website, the dataset is mostly sourced from government publications, so it should probably come under this. I sent them an email.
The UK Hansard is also available as a consolidated dataset and requires minimal formatting. Just trying to choose between:
<member for parliament>
<what they say>
<next speaker>
or
<member for parliament>: <what they say>
<next speaker>
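For illustration, here is a minimal sketch of how the two layouts could be produced. It assumes each debate has already been parsed into (speaker, speech) pairs; that record shape, the function names, and the sample speakers are all made up for the example and don't come from either dataset.

```python
from typing import Iterable, Tuple

# Hypothetical record shape: (speaker label, what they said).
Turn = Tuple[str, str]


def format_multiline(turns: Iterable[Turn]) -> str:
    """Option 1: speaker on one line, speech on the next."""
    return "\n".join(f"{speaker}\n{speech}" for speaker, speech in turns)


def format_inline(turns: Iterable[Turn]) -> str:
    """Option 2: 'speaker: speech' on a single line."""
    return "\n".join(f"{speaker}: {speech}" for speaker, speech in turns)


# Made-up sample data, not taken from any Hansard.
sample_turns = [
    ("Member for Exampleton", "I rise to ask about the budget."),
    ("The Speaker", "Order."),
]

multiline_text = format_multiline(sample_turns)
inline_text = format_inline(sample_turns)
```

The inline form keeps each turn on one line, which makes splitting by line easy as long as speeches contain no newlines. The multiline form avoids ambiguity when a speaker label itself contains a colon.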
I would love to help with this.
(Looks like several countries do not clearly specify the licensing information of the Hansards)
Hey! I'm mostly done with the Canadian and UK ones, and yeah, I haven't been able to get much license information for the others. The Australian one is CC-BY-ND-NC, which is out of scope. The Singapore one is also under a limited license, iirc.
Yes, I noticed both Australia and Singapore were out of scope. Is there anything else you think I can help you with? (even if it is not this specific Hansard task)
A lot of Commonwealth countries provide official transcripts of parliamentary debates going back many years. The work on the Canadian one already seems to be done (couldn't find a license, can ask them), and the UK Hansard can be easily scraped.