swerik-project / the-swedish-parliament-corpus

A repository for managing public, versioned releases of the Swedish Parliament Corpus.

Storing references to data points beyond the Biography Books #13

Closed by BobBorges 1 month ago

BobBorges commented 2 months ago

proposal

MansMeg commented 2 months ago

I think this looks good, but I would love to get opinions from @ljo and @ninpnin.

BobBorges commented 2 months ago

I also wanted to comment on Joakim's suggestion that the metadata could better be stored in a markup format. In general, he's right, and maybe it's something to think about for v2. It would definitely be a breaking change though, both for the format of the metadata and for a lot of the code we use.

MansMeg commented 1 month ago

Ping @ljo and @ninpnin.

ninpnin commented 1 month ago

Would these be edge cases, or would we store data in these files en masse?

If we have basically all data of some sort here, it would increase redundancy.

BobBorges commented 1 month ago

The idea is more about fringe cases. So Bio-books are the standard source, but if, e.g., we know some piece of info is more accurate from another source, we store it in this way. But I'm not proposing that we add a bio-book reference to every datum in the corpus.

ninpnin commented 1 month ago

Apart from the folder structure – handled in the other decision – this seems fine to me.

MansMeg commented 1 month ago

Great!

@fredrik1984 maybe you should formally merge this as the PI if you agree with it?

fredrik1984 commented 1 month ago

done!