stanfordnmbl / opencap-api

Apache License 2.0

Periodic database cleaning #144

Closed: olehkorkh-planeks closed this pull request 9 months ago

olehkorkh-planeks commented 9 months ago

Fix for https://github.com/stanfordnmbl/opencap-api/issues/82

antoinefalisse commented 9 months ago

@suhlrich and @AlbertoCasasOrtiz, tagging you both on this one (I don't want us to start deleting data). Can you both review? We can then do some testing on the dev side.

antoinefalisse commented 8 months ago

@olehkorkh-planeks two remarks regarding this PR:

  1. This line selects 500 sessions. The problem is that if the first 500 selected sessions are all good (for example, after some cleaning has already happened), no further cleaning will occur, even if bad sessions remain elsewhere in the db. That's why I had to bump it to 500 to clean the entire dev db. Is there a way to adjust the task so that it goes through the entire db? The prod db has over 40k sessions. I can see that filtering the entire db on every run is not practical; maybe we can find a way to filter it once and then select only part of it on each run (as the task does now). Let me know your thoughts.

  2. We need to use a more conservative criterion here. We noticed that we have sessions with data we want to keep but with no neutral trial. FYI, it is possible to navigate directly to step5 and record data even if there is no neutral trial. Let's use the following:

    • delete session if empty (no trials) AND older than 7 days
    • delete session if all the existing trials are named "calibration" AND all the trials have status "error" AND older than 7 days
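Here is a minimal sketch of how the two remarks could fit together in a periodic task. It uses plain Python stand-ins for the real models. The `Session`/`Trial` classes, their `id`, `created_at`, `name`, and `status` fields, the `"error"` status string, and the helper names `should_delete`, `next_batch`, and `clean_once` are all hypothetical and are not the opencap-api schema. For remark 1, the sketch persists a cursor (the last processed key) between runs, so each run resumes where the previous one stopped instead of re-reading the same first 500 rows. For remark 2, it applies only the two conservative rules above. Where the cursor would be stored between runs is left open.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Both rules require the session to be "older than 7 days".
MIN_AGE = timedelta(days=7)


@dataclass
class Trial:  # hypothetical stand-in for the real trial model
    name: str
    status: str


@dataclass
class Session:  # hypothetical stand-in for the real session model
    id: int
    created_at: datetime
    trials: list = field(default_factory=list)


def should_delete(session, now):
    """Conservative criteria from remark 2."""
    if now - session.created_at <= MIN_AGE:
        return False  # too recent: never delete
    if not session.trials:
        return True  # rule 1: empty session (no trials)
    # Rule 2: every trial is a calibration trial that ended in error.
    return all(t.name == "calibration" and t.status == "error"
               for t in session.trials)


def next_batch(sessions, cursor, batch_size=500):
    """Remark 1: walk the table in key order, one batch per run.

    `sessions` must be sorted by id. Returns (batch, new_cursor).
    The cursor resets to 0 once the end of the table is reached,
    so successive runs eventually cover every session.
    """
    remaining = [s for s in sessions if s.id > cursor]
    batch = remaining[:batch_size]
    new_cursor = 0 if len(remaining) <= batch_size else batch[-1].id
    return batch, new_cursor


def clean_once(sessions, cursor, now, batch_size=500):
    """One periodic run: ids to delete, plus the cursor for the next run."""
    batch, new_cursor = next_batch(sessions, cursor, batch_size)
    to_delete = [s.id for s in batch if should_delete(s, now)]
    return to_delete, new_cursor
```

With 1,200 sessions and a batch size of 500, three runs visit ids 1 to 500, 501 to 1000, and 1001 to 1200, and then the cursor wraps back to 0. In Django, the in-memory filter would roughly correspond to a queryset filtered on the key being greater than the cursor, ordered by that key, and sliced to the batch size. This assumes a sortable, indexed key, which may need checking against the actual model.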

FYI, we also noticed that we might not have a proper backup of the DB and S3 in place, so we might want to implement that before merging this into main. Regardless, it would be good to address the two points above (and then perhaps keep the task commented out for now). Thanks.

FYI @suhlrich and @AlbertoCasasOrtiz

olehkorkh-planeks commented 8 months ago

@antoinefalisse I prepared a fixed implementation. Let me know if it works for you - https://github.com/stanfordnmbl/opencap-api/pull/144