welfare-state-analytics / riksdagen-corpus

Swedish parliamentary proceedings - Riksdagens protokoll 1867-today

Setup ALTO and PDF files repos for the records #469

Closed MansMeg closed 6 months ago

MansMeg commented 9 months ago

Hi!

We have now gotten an OK from Chris at KB lab to put the ALTO files from betalab in a separate repo. It's no rush, but it is probably good to fix in the long term so that we can map back to the source in an open fashion.

BobBorges commented 9 months ago

Excellent! I'm now curating those formerly missing protocols, and by default their pb element links will 404, so it will be better to change all the links at once to point to a page on GitHub.

MansMeg commented 9 months ago

Yes. Maybe make this part of 1.0? Would be nice at least.

MansMeg commented 9 months ago

We probably want to use Git LFS for the PDFs. So the question is how to point to specific pages of a PDF stored with Git LFS on GitHub. I think @ljo had some ideas.

BobBorges commented 9 months ago

We could just chunk up the PDFs by page, unless GitHub supports some GET-like parameter to link to a specific page.

MansMeg commented 9 months ago

Yes. I can't find a simple way to point to a specific page on GitHub with Git LFS.

BobBorges commented 9 months ago

It's easy enough to split PDFs with pdfseparate, so the one-file-per-page option seems like a reasonable one.
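
A minimal sketch of the per-page split, assuming `pdfseparate` from poppler-utils is installed and on the PATH. The function names, the file name `prot-1900.pdf`, and the output naming scheme are hypothetical and only for illustration; nothing here reflects how the records repo is actually laid out.

```python
import subprocess
from pathlib import Path


def build_pdfseparate_cmd(pdf_path, out_dir):
    """Return the pdfseparate argument list for splitting one PDF into pages.

    The output naming scheme (<stem>-<page>.pdf) is an assumption for this sketch.
    """
    pdf_path = Path(pdf_path)
    # pdfseparate replaces %d in the pattern with the page number.
    pattern = Path(out_dir) / f"{pdf_path.stem}-%d.pdf"
    return ["pdfseparate", str(pdf_path), str(pattern)]


def split_pdf(pdf_path, out_dir):
    """Write one single-page PDF per page of pdf_path into out_dir."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if pdfseparate exits with an error.
    subprocess.run(build_pdfseparate_cmd(pdf_path, out_dir), check=True)
```

Calling `split_pdf("prot-1900.pdf", "pages")` would then be expected to produce `pages/prot-1900-1.pdf`, `pages/prot-1900-2.pdf`, and so on, so that each page can be linked to as its own file.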

ninpnin commented 6 months ago

The repo is here: https://github.com/swerik-project/riksdagen-records-pdf. Let's follow the issues there.