gustavkrist closed this 1 month ago
Looks good to me. @tfnribeiro - can you also have a quick look?
I also think it's good!
I would just add a checkpoint that commits every 1000 articles or so, just so we don't have a really large transaction. One more thing: rather than printing every article, the output could be behind a flag, as it will be a lot of prints. With a flag, if we were only updating a few articles, we could still turn it on to see how they are changing.
In my DB with the latest dump it takes about 20 minutes to run.
I was attempting to run it in my environment and the process seems to get killed about a third of the way through.
Checkpointing in-between seems to allow the process to continue past that point:
It ran very fast on the dump I got, so I did not consider the scale of the full production database. I'll add checkpoints if needed, but it sounds like you've already done so.
Alright, I will make a commit to add the checkpoint commits. I just checked that it completed while checkpointing commits, taking a total of 45 minutes.
I have pushed the changes. Essentially, I removed the `with` block (as it seems like it automatically closes once you commit) and added two constants, so the prints and the checkpoint step can be edited if we want to change them.
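For anyone reading along, the checkpointing idea discussed above can be sketched roughly as follows. This is not the code from the PR: the constant names `CHECKPOINT_STEP` and `PRINT_CHANGES`, the function `recompute_scores`, and the `fk_difficulty` attribute are assumptions made for illustration, and `session` is assumed to be any object with SQLAlchemy-style `add()` and `commit()` methods.

```python
# Hypothetical constants, standing in for the two mentioned in the thread.
CHECKPOINT_STEP = 1000  # commit after this many updated articles
PRINT_CHANGES = False   # set to True to print each article's old and new score


def recompute_scores(articles, session, compute_score,
                     checkpoint_step=CHECKPOINT_STEP, verbose=PRINT_CHANGES):
    """Recompute a score for each article, committing in batches.

    `session` only needs `add(obj)` and `commit()` (assumed interface).
    Returns the number of commits issued.
    """
    pending = 0  # articles updated since the last commit
    commits = 0
    for article in articles:
        old_score = article.fk_difficulty  # assumed attribute name
        article.fk_difficulty = compute_score(article)
        session.add(article)
        pending += 1
        if verbose:
            print(f"{article.id}: {old_score} -> {article.fk_difficulty}")
        if pending >= checkpoint_step:
            # Checkpoint: keep each transaction small.
            session.commit()
            commits += 1
            pending = 0
    if pending:
        # Commit whatever is left after the last full batch.
        session.commit()
        commits += 1
    return commits
```

Committing per batch keeps each transaction bounded, which is the point of the checkpoint suggestion. The trade-off is that an interrupted run leaves some articles updated and others not, so the script should be safe to re-run.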
I've added a script to recompute FK scores for all the languages that just had new constants added, similar to https://github.com/zeeguu/api/blob/master/tools/old/recompute_fk_difficulties_for_polish.py.