Closed MansMeg closed 3 years ago
Directly changing the XML files would inevitably make the curations sequential / build on top of previous versions, wouldn't it?
Yes. But how do we solve this in the best way? Say that I want to fix some errors. Is the easiest way for me to make the correction and then open a PR? I ask because I also want to use the same corrected XML file.
Can we extract the corrections from a diff?
Yes, but there would need to be some restrictions to that, for example "no edits can add or remove paragraphs". That might be trivial or difficult, I'm not 100% sure.
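To make the restriction concrete, here is a minimal sketch of how corrections might be pulled out of a diff between two versions of a protocol. It assumes each version has already been reduced to a list of paragraph strings. The function name `extract_corrections` and the `(index, old, new)` tuple format are hypothetical, not part of the project. It uses `difflib.SequenceMatcher` from the Python standard library and rejects any edit that adds or removes a paragraph.

```python
from difflib import SequenceMatcher


def extract_corrections(old_paragraphs, new_paragraphs):
    """Return a list of (index, old_text, new_text) for changed paragraphs.

    Hypothetical helper: raises ValueError if the edit adds or removes
    paragraphs, since such an edit cannot be mapped one-to-one.
    """
    matcher = SequenceMatcher(a=old_paragraphs, b=new_paragraphs, autojunk=False)
    corrections = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Only same-sized replacements keep the paragraph structure intact.
        if tag != "replace" or (i2 - i1) != (j2 - j1):
            raise ValueError("edit adds or removes paragraphs")
        for offset in range(i2 - i1):
            corrections.append(
                (i1 + offset, old_paragraphs[i1 + offset], new_paragraphs[j1 + offset])
            )
    return corrections


if __name__ == "__main__":
    old = ["Herr talman!", "Jag yrkar bifal.", "Tack."]
    new = ["Herr talman!", "Jag yrkar bifall.", "Tack."]
    for index, before, after in extract_corrections(old, new):
        print(index, repr(before), "->", repr(after))
```

Whether this is trivial or difficult in practice probably depends on how reliably paragraphs can be identified in the XML, which the sketch does not address.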
Sure. I think we should focus on making it easy for people to curate and add corrections.
Another way to allow orthogonality would be splitting the protocols into multiple files, and then building solely on top of those files.
Yes. But can that be done in a reasonable way? I guess a protocol is the smallest unit?
The division into text areas is very consistent, so that could be used. But then it starts to be a lot of files, ~3-4x the total page count.
I created a test repository for this approach. 1.5M files. Git status, add and commit took maybe 3-5 seconds each.
Hmmm. Yes, that seems to be too much. I guess a protocol is the smallest unit. Any other ideas for how to get a good pipeline? The goal is twofold:
Maybe the simplest is just to go even further. Just have code to generate the final files and the parlaclarin files. Then we can trace back using just git? No additional files at all?
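As a rough illustration of the "generate the final files from code" idea, the sketch below applies a list of corrections to the paragraphs of a base protocol and returns the corrected version. Everything here is an assumption made for illustration: the name `apply_corrections`, the `(index, expected_old_text, new_text)` format, and the idea that a protocol is handled as a list of paragraph strings. Checking the expected old text before replacing means a correction fails loudly if the base file has changed underneath it.

```python
def apply_corrections(paragraphs, corrections):
    """Return a corrected copy of `paragraphs` (hypothetical helper).

    `corrections` is an iterable of (index, expected_old_text, new_text).
    The input list is left unmodified.
    """
    result = list(paragraphs)
    for index, expected, replacement in corrections:
        # Refuse to apply a correction against text it was not written for.
        if result[index] != expected:
            raise ValueError(f"paragraph {index} does not match the expected text")
        result[index] = replacement
    return result


if __name__ == "__main__":
    base = ["Herr talman!", "Jag yrkar bifal."]
    fixes = [(1, "Jag yrkar bifal.", "Jag yrkar bifall.")]
    print(apply_corrections(base, fixes))
```

With something like this, the corrections and the generating code would be the only things under version control beyond the base files, and git history would show who added which correction.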
Implemented protocol-by-protocol.
So if I understand this correctly, we just provide information on what we want to curate. But I guess some people might want to just change the XML files? Is that a correct way to do it?