Closed by Daniel-Mietchen 10 years ago
Reading http://www.sqlite.org/howtocorrupt.html. Daniel, can you guarantee that no more than one job needing the database ever ran in parallel?
Well, guarantee … do you suspect that multiple invocations of the Open Access Media Importer ran in parallel?
This and the still unexplained video bug make me suspect server memory may be faulty.
I may not have permission to check the S.M.A.R.T. status of the server's hard disk. http://en.wikipedia.org/wiki/S.M.A.R.T.
erlehmann@files:~$ /usr/sbin/smartctl --all /dev/disk/by-uuid/45eaa83c-6c05-4631-ba35-88159dd7f8b9
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-4-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
Smartctl open device: /dev/disk/by-uuid/45eaa83c-6c05-4631-ba35-88159dd7f8b9 failed: Permission denied
Raphael should either give me permission to check S.M.A.R.T. status or – probably better, since he probably knows the hardware and I do not – check if there is any filesystem or memory corruption on the server.
No indications of filesystem or memory corruption. I guess that parallel file access is to blame.
RaphaelWimmer, what tests did you run to check for silent data corruption? I should probably introduce safeguards against parallel file access, thx.
It is possible that multiple instances of the program ran in parallel. The cron job normally takes about 5-8h, but every time a conversion stalls, the loop continues only after 6h. So several conversion errors in a row could have left one instance of the program still active when cron started the next one the following morning.
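One possible safeguard against overlapping cron runs is an advisory lock that each instance tries to take at startup. The following is only a sketch, assuming a Linux server and Python; the function name `acquire_single_instance_lock` and the lock path are hypothetical and not part of OAMI.

```python
import fcntl
import sys


def acquire_single_instance_lock(lock_path):
    """Try to take an exclusive, non-blocking advisory lock on lock_path.

    Returns the open file object on success (keep it open for the whole
    run), or None if another holder already has the lock.
    Hypothetical helper, not an existing OAMI function.
    """
    lock_file = open(lock_path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        lock_file.close()
        return None
    return lock_file


if __name__ == "__main__":
    # The lock path is an assumption; any location writable by the cron user works.
    lock = acquire_single_instance_lock("/tmp/oami.lock")
    if lock is None:
        sys.exit("Another instance appears to be running, exiting.")
    # ... the import run would go here; the lock is released when the process exits.
```

With this approach a late-starting instance exits instead of touching the database, rather than killing the instance that is already running. The kernel drops the lock when the holding process ends, so a crashed run should not leave a stale lock behind.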
In fact, when I reported I had stopped the bot on Nov 15, I had only stopped future cron jobs, but uploads continued (sometimes OK, sometimes not) until Nov 17.
To me, this is another indicator that we should split the jobs of crawling, converting and uploading at least optionally, as per https://github.com/erlehmann/open-access-media-importer/issues/95 .
How can we move forward on this one?
Ad hoc: Delete malformed database, ensure cron job kills other instances of OAMI. Long-term: Fix the conversion stalls.
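Before deleting the database, it may be worth confirming that it really is malformed. A minimal sketch, assuming the database is an SQLite file reachable from Python's standard `sqlite3` module; the function name `database_is_intact` is hypothetical.

```python
import sqlite3


def database_is_intact(db_path):
    """Return True if SQLite's own integrity check reports no problems.

    Hypothetical helper, not an existing OAMI function.
    """
    connection = sqlite3.connect(db_path)
    try:
        # PRAGMA integrity_check returns a single row ("ok",) for a healthy file.
        rows = connection.execute("PRAGMA integrity_check").fetchall()
    except sqlite3.DatabaseError:
        # Raised, for example, when the file is not a readable SQLite database.
        return False
    finally:
        connection.close()
    return rows == [("ok",)]
```

A cron wrapper could call this first and skip the run, or move the file aside, when it returns False.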
Fixed with ad hoc approach above.
I am using oami_pmc_pmcid_import in the cron job, and here's what I got when I just tried to run it manually. Perhaps that is at the origin of https://github.com/erlehmann/open-access-media-importer/issues/112 ?
I just uploaded about 50 videos via oami_pmc_doi_import, and all of them were fine.