natsukagami / kjudge

A simple system for hosting competitive programming contests.
GNU Affero General Public License v3.0
23 stars 11 forks

Normalize file endings #102

Open minhnhatnoe opened 1 year ago

minhnhatnoe commented 1 year ago

knock knock

minhnhatnoe commented 1 year ago

We should write a SQL migration file that does this normalization on all existing files/submissions. On that note, can we do this within SQLite? I suppose we'd not want native line-endings, but to just normalize everything into \n, to make the database portable...

What do you mean by "within SQLite"? It looks like SQLite has no option for line-ending normalization.

I'm currently normalizing BEFORE inserting them into the database. Maybe we can write a conversion script for the entire db and run it at startup? That could make startup slower than it should be, so a better solution might be to save the current OS in the db; if it doesn't match at startup, we issue some kind of warning or maybe panic.
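For reference, normalizing before insertion can be sketched in a few lines of Go. This is only an illustration, not the code from the PR; the function name `normalizeNewlines` is hypothetical, and it assumes every CRLF and lone CR should become a single LF.

```go
package main

import (
	"bytes"
	"fmt"
)

// normalizeNewlines (hypothetical name) converts CRLF and lone CR
// line endings to LF. CRLF is replaced first so that it does not
// turn into two LFs.
func normalizeNewlines(b []byte) []byte {
	b = bytes.ReplaceAll(b, []byte("\r\n"), []byte("\n"))
	return bytes.ReplaceAll(b, []byte("\r"), []byte("\n"))
}

func main() {
	in := []byte("a\r\nb\rc\n")
	fmt.Printf("%q\n", normalizeNewlines(in)) // prints "a\nb\nc\n"
}
```

The function is idempotent, so running it on text that is already normalized leaves it unchanged.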

natsukagami commented 1 year ago

I'm currently normalizing BEFORE inserting them into the database.

Once this PR is merged, it creates a problem: we have an invariant assuming that all text in the database has already been normalized, which does not hold for text that was already in the database before the update. One way to handle this is to write an SQL migration file (something like the one in embed/assets/sql) that does this conversion for everything in the database. We can also special-case this migration to run additional code on the Go side (probably altering some code in the db package).
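A Go-side migration along these lines might look like the sketch below. Everything here is an assumption for illustration: the `row` type, the `store` interface, and the in-memory `memStore` stand in for whatever the db package actually exposes, and no real table or column names are implied. (A pure-SQL migration could probably use SQLite's built-in `replace()` and `char()` functions instead, but that is not shown here.)

```go
package main

import (
	"bytes"
	"fmt"
)

// row is a hypothetical (id, content) pair for any table holding text.
type row struct {
	ID      int
	Content []byte
}

// store is a hypothetical abstraction over the real database access code.
type store interface {
	All() ([]row, error)
	Update(id int, content []byte) error
}

func normalizeNewlines(b []byte) []byte {
	b = bytes.ReplaceAll(b, []byte("\r\n"), []byte("\n"))
	return bytes.ReplaceAll(b, []byte("\r"), []byte("\n"))
}

// migrateNewlines rewrites every row whose content is not yet LF-only
// and reports how many rows were changed. It is safe to run repeatedly.
func migrateNewlines(s store) (int, error) {
	rows, err := s.All()
	if err != nil {
		return 0, fmt.Errorf("listing rows: %w", err)
	}
	changed := 0
	for _, r := range rows {
		n := normalizeNewlines(r.Content)
		if bytes.Equal(n, r.Content) {
			continue // already normalized, nothing to write
		}
		if err := s.Update(r.ID, n); err != nil {
			return changed, fmt.Errorf("updating row %d: %w", r.ID, err)
		}
		changed++
	}
	return changed, nil
}

// memStore is an in-memory stand-in used only for this demonstration.
type memStore map[int][]byte

func (m memStore) All() ([]row, error) {
	out := make([]row, 0, len(m))
	for id, c := range m {
		out = append(out, row{ID: id, Content: c})
	}
	return out, nil
}

func (m memStore) Update(id int, content []byte) error {
	m[id] = content
	return nil
}

func main() {
	s := memStore{1: []byte("a\r\nb"), 2: []byte("ok\n")}
	first, _ := migrateNewlines(s)
	second, _ := migrateNewlines(s)
	fmt.Println(first, second)
}
```

Because rows that are already normalized are skipped, a second run touches nothing, which keeps the cost of running the migration at startup low after the first time.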

That aside, my original vision for the kjudge.db file was that it should be a portable copy of the whole contest data, something akin to Themis' .contest file. SQLite already provides us with a cross-platform file format; we should strive to keep it that way. That might mean we need to normalize to a single ending (e.g. everything to \n in the DB), and add the native endings back when we serve the files. This way we maintain the invariant that the DB is fully consistent across OSes.
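The "store \n, add them back when serving" idea could be sketched as follows. The helper name `withLineEnding` is hypothetical, and the sketch assumes the stored bytes are already LF-only, as the invariant above requires.

```go
package main

import (
	"bytes"
	"fmt"
)

// withLineEnding (hypothetical name) takes LF-only content as stored in
// the database and returns it with the requested line ending.
// It assumes stored contains no CR characters.
func withLineEnding(stored []byte, eol string) []byte {
	if eol == "\n" {
		return stored // stored form is already LF-only
	}
	return bytes.ReplaceAll(stored, []byte("\n"), []byte(eol))
}

func main() {
	stored := []byte("int main() {\n}\n")
	fmt.Printf("%q\n", withLineEnding(stored, "\r\n"))
}
```

Converting only at the serving boundary keeps the database contents identical regardless of which OS the server runs on.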

natsukagami commented 1 year ago

There's also the problem of a mismatch between the end user's OS (the contestants') and the server's. We cannot normalize a contestant's \r\n code into the Linux server's format and then serve the \n version back to the contestant; it breaks certain editors (looking at you, Notepad). We might want something more user-oriented here too.