Closed koonchaya closed 3 hours ago
I dug through the logs and found that the error is at the SQL level. Most statements failed to run, likely because the server was at its maximum load. See error images below...
The error comes from the legacy side. Because our backup system is designed to be fault-tolerant, it can still produce an export even if every query fails. But once an error occurs, the next batch is not queried, so if the first batch fails you get nothing.
The problem is that the query took too long to run. I think there are a couple of places where we can improve it.
@grindarius ok how long do you anticipate it will take to implement these fixes so that we can check if they work in fixing the issue? We need to be able to have large projects work with the backup. And if there is an error where not all rows are going to show up, there needs to be an error message to the user demonstrating that. The user only realized this was a problem when they double-checked the CSVs against the project data.
Reduce database chunk query to something like 50k
That sounds reasonable.
This query is very light because the only ordering is by PK so it shouldn't need to do much to read this data. It more likely failed because there was a lot of db activity at the same time. We could try some exponential backoff: if a query fails then retry in 10 sec, then 20 sec, then 40 sec then 80 sec else fail completely.
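A minimal sketch of the backoff schedule suggested above (10 s, 20 s, 40 s, 80 s, then fail). The names `run_with_backoff` and `run_query` are hypothetical and not taken from the Arbimon codebase; the `sleep` parameter exists only so the waits can be stubbed out when testing.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def run_with_backoff(
    run_query: Callable[[], T],
    delays: Sequence[float] = (10, 20, 40, 80),  # seconds, as proposed in the comment
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run a query, retrying after each delay; re-raise if the last retry fails."""
    for delay in delays:
        try:
            return run_query()
        except Exception:
            # Assumption: any exception is treated as retryable. A real
            # implementation would probably retry only on timeouts/overload.
            sleep(delay)
    # Final attempt after the 80 s wait: an exception here propagates,
    # which is the "else fail completely" case.
    return run_query()
```

With these defaults a query is attempted at most five times (one initial attempt plus four retries), and a persistent failure surfaces after roughly 150 seconds of waiting.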
If one of the queries fails then I think the whole job should fail -- we don't want to continue and send the user incomplete data.
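To illustrate the "whole job fails" behaviour together with the smaller chunk size, here is a rough sketch using an in-memory style SQLite connection. The table `recordings`, its columns, the `ExportFailed` exception, and the use of keyset paging on the primary key are all assumptions for illustration; the real export may page differently.

```python
import sqlite3
from typing import Iterator, List, Tuple


class ExportFailed(Exception):
    """Raised when any chunk fails, so the job never ships partial data."""


def fetch_chunks(
    conn: sqlite3.Connection, chunk_size: int = 50_000
) -> Iterator[List[Tuple]]:
    """Yield rows in primary-key order, at most chunk_size rows per query."""
    last_id = 0  # assumption: primary keys are positive integers
    while True:
        try:
            rows = conn.execute(
                "SELECT id, name FROM recordings WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, chunk_size),
            ).fetchall()
        except sqlite3.Error as err:
            # Stop the whole job instead of skipping the remaining chunks.
            raise ExportFailed(f"chunk after id {last_id} failed") from err
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]
```

The caller would catch `ExportFailed`, discard any partially written files, and send the failure notification rather than delivering an incomplete CSV.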
@carlybatist We are going to need this week to work on some improvements.
@antonyharfield @grindarius We need to find a solution for the job failure case.
Email the failure status to the user and to support@rfcx.org/Slack
Draft email to notify the user of a failed export:
Subject: Arbimon project export failed
Hello,
Thanks so much for using Arbimon! We encountered an issue while backing up your project '...'. Our apologies for the inconvenience. Please contact our support team at [contact@arbimon.org] for assistance.
@antonyharfield @carlybatist Can you check if this message needs any changes?
@koonchaya Tech team would be getting this error notification too right? They should then immediately start looking into it as a support ticket. So I would think the email to the user should be informing them that there was an error and that our team is looking into it and will update them. Noon and I should be auto-cc'd on these emails to users too. So it would be --
Hello,
There was an issue with your project backup of '...'. Our engineering team is looking into this and will update you when we have resolved it. We apologize for the inconvenience and thank you for your patience!
All the best, Arbimon team
Ideally, @carlybatist and I would get the email forwarded from support@rfcx.org. I am not sure whether the eng team will get an alert elsewhere.
@koonchaya @grindarius what do you expect the timeline for fixing the underlying issue will be?
I tested export backup from project https://staging.arbimon.org/p/bci-panama-2018/overview @grindarius here is some feedback
@grindarius
rfm_classifications_001.csv rfm_classifications_002.csv rfm_classifications_003.csv rfm_classifications_004.csv rfm_classifications_005.csv rfm_classifications_006.csv rfm_classifications_007.csv rfm_classifications_008.csv
For the pattern matchings export, we grab data directly from the pattern_matchings table. What you see in the UI are jobs that are joined with the jobs table. The pattern_matchings table itself holds every pattern matching job, including some that have no related row in the jobs table, so the export includes those as well. The same goes for pattern_matching_rois.
For playlists export I did find both playlists inside the export file so it's all good.
For rfm models there are deleted models being exported into the file.
The same goes for rfm classifications: we did not have a condition to filter out deleted classifications.
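The mismatch between UI and export can be reproduced with a toy example. The schema below is hypothetical and heavily simplified: the real tables have many more columns, and the soft-delete column name `deleted` is an assumption, as is applying it to pattern matchings rather than only to rfm models and classifications.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE jobs (job_id INTEGER PRIMARY KEY);
    CREATE TABLE pattern_matchings (
        pattern_matching_id INTEGER PRIMARY KEY,
        job_id INTEGER,
        deleted INTEGER NOT NULL DEFAULT 0  -- hypothetical soft-delete flag
    );
    INSERT INTO jobs (job_id) VALUES (1), (2);
    -- 10: normal row, 11: soft-deleted row, 12: row with no matching job
    INSERT INTO pattern_matchings VALUES (10, 1, 0), (11, 2, 1), (12, 99, 0);
    """
)

# Reading the table directly returns everything, including the deleted
# row and the row that has no related entry in jobs.
exported_all = [
    row[0]
    for row in conn.execute(
        "SELECT pattern_matching_id FROM pattern_matchings "
        "ORDER BY pattern_matching_id"
    )
]

# Joining with jobs and filtering on the flag keeps only the rows a
# UI built on that join would list.
exported_visible = [
    row[0]
    for row in conn.execute(
        "SELECT pm.pattern_matching_id FROM pattern_matchings pm "
        "JOIN jobs j ON j.job_id = pm.job_id "
        "WHERE pm.deleted = 0 ORDER BY pm.pattern_matching_id"
    )
]
```

Here `exported_all` contains three ids while `exported_visible` contains one, which is the kind of count difference a user would notice when comparing the CSVs against the project page.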
@grindarius
Released on v1.4.2
Original report: https://rfcx.slack.com/archives/C03FD1WD02J/p1718028385095719 @carlybatist
User reported that the number of recordings and templates exported from the project didn't match the data in the project. Project https://arbimon.org/p/biosoundscape/overview
Additional information: I exported the files from the project and found that the number of recordings and templates didn't match. Export file https://drive.google.com/file/d/1_l2LmgDrn_CIMDx23Bq5JWIlVhmF15A2/view?usp=sharing