wpoa / open-access-media-importer

A tool for harvesting media files from Open Access articles for upload into Wikimedia Commons
http://commons.wikimedia.org/wiki/User:Open_Access_Media_Importer_Bot

database disk image is malformed #113

Closed Daniel-Mietchen closed 10 years ago

Daniel-Mietchen commented 10 years ago

I am using oami_pmc_pmcid_import in the cron job, and here is what I got when I just tried to run it manually. Perhaps this is the cause of https://github.com/erlehmann/open-access-media-importer/issues/112 ?

I just uploaded about 50 videos via oami_pmc_doi_import , and all of them were fine.

danielmietchen@files:~/open-access-media-importer$ echo 3816174 | ./oami_pmc_pmcid_import
Traceback (most recent call last):
  File "./oa-get", line 43, in <module>
    setup_all(True)
  File "/usr/lib/pymodules/python2.7/elixir/__init__.py", line 98, in setup_all
    create_all(*args, **kwargs)
  File "/usr/lib/pymodules/python2.7/elixir/__init__.py", line 76, in create_all
    md.create_all(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/sqlalchemy/schema.py", line 2564, in create_all
    tables=tables)
  File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 2303, in _run_visitor
    conn._run_visitor(visitorcallable, element, **kwargs)
  File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1973, in _run_visitor
    **kwargs).traverse_single(element)
  File "/usr/lib/python2.7/dist-packages/sqlalchemy/sql/visitors.py", line 106, in traverse_single
    return meth(obj, **kw)
  File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/ddl.py", line 54, in visit_metadata
    if self._can_create_table(t)]
  File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/ddl.py", line 32, in _can_create_table
    table.name, schema=table.schema)
  File "/usr/lib/python2.7/dist-packages/sqlalchemy/dialects/sqlite/base.py", line 606, in has_table
    cursor = _pragma_cursor(connection.execute("%stable_info(%s)" % (pragma, qtable)))
  File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1449, in execute
    params)
  File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1628, in _execute_text
    statement, parameters
  File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1698, in _execute_context
    context)
  File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1691, in _execute_context
    context)
  File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 331, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.DatabaseError: (DatabaseError) database disk image is malformed 'PRAGMA table_info("model_category")' ()

erlehmann commented 10 years ago

Reading http://www.sqlite.org/howtocorrupt.html. Daniel, can you guarantee that no more than one job needing the database ever ran in parallel?

erlehmann commented 10 years ago

Well, guarantee … do you suspect that multiple invocations of the Open Access Media Importer ran in parallel?

erlehmann commented 10 years ago

This and the still unexplained video bug make me suspect server memory may be faulty.

erlehmann commented 10 years ago

I may not have permission to check the S.M.A.R.T. status of the server's hard disk. http://en.wikipedia.org/wiki/S.M.A.R.T.

erlehmann@files:~$ /usr/sbin/smartctl --all /dev/disk/by-uuid/45eaa83c-6c05-4631-ba35-88159dd7f8b9
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-4-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
Smartctl open device: /dev/disk/by-uuid/45eaa83c-6c05-4631-ba35-88159dd7f8b9 failed: Permission denied

Raphael should either give me permission to check S.M.A.R.T. status or – probably better, since he probably knows the hardware and I do not – check if there is any filesystem or memory corruption on the server.

RaphaelWimmer commented 10 years ago

No indications of filesystem or memory corruption. I guess that parallel file access is to blame.

erlehmann commented 10 years ago

RaphaelWimmer, what tests did you run to check for silent data corruption? I should probably introduce safeguards against parallel file access, thx.
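
One possible shape for such a safeguard is an advisory lock file that each invocation must hold before touching the database. The following is only a sketch in Python 3 for a POSIX system, not the project's actual code: the lock path and the function name `acquire_single_instance_lock` are assumptions made for illustration.

```python
import fcntl

# Hypothetical lock file location; not taken from the project.
LOCK_PATH = "/tmp/oami.lock"


def acquire_single_instance_lock(path=LOCK_PATH):
    """Return an open file holding an exclusive lock, or None if it is already held."""
    handle = open(path, "a")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting for the holder.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    # The lock is released when this file is closed or the process exits.
    return handle
```

A script would call this once at startup and exit early when it returns None, keeping the returned file open for as long as it uses the database.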

Daniel-Mietchen commented 10 years ago

It is possible that multiple instances of the program ran in parallel. The cron job normally takes about 5-8h, but every time a conversion stalls, the loop continues only after 6h. So multiple conversion errors in a row could have resulted in one instance of the program still being active when cron started the next one the following morning.

In fact, when I reported I had stopped the bot on Nov 15, I had only stopped future cron jobs, but uploads continued (sometimes OK, sometimes not) until Nov 17.

To me, this is another indicator that we should split the jobs of crawling, converting and uploading at least optionally, as per https://github.com/erlehmann/open-access-media-importer/issues/95 .
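
The stalls described above could, in principle, be bounded by running each conversion as a child process with a hard timeout, so that a single stuck file cannot push one cron run into the next. This is a minimal Python 3 sketch under that assumption; `run_with_timeout` is a hypothetical helper, and the command and limit passed to it would have to come from the real conversion step.

```python
import subprocess


def run_with_timeout(command, timeout_seconds):
    """Run command; return its exit code, or None if it was killed on timeout."""
    try:
        completed = subprocess.run(command, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising.
        return None
    return completed.returncode
```

A caller could treat None as "skip this file and continue", which keeps the total run time predictable.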

Daniel-Mietchen commented 10 years ago

How can we move forward on this one?

erlehmann commented 10 years ago

Ad hoc: Delete malformed database, ensure cron job kills other instances of OAMI. Long-term: Fix the conversion stalls.
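
Before deleting a database, it may be worth confirming that it is actually damaged. SQLite provides `PRAGMA integrity_check` for this; the sketch below wraps it with Python's standard `sqlite3` module. The function name `database_is_intact` is an assumption for illustration, and the path to pass in is whatever file the importer is configured to use.

```python
import sqlite3


def database_is_intact(path):
    """Return True if SQLite's integrity check reports 'ok' for the file at path."""
    try:
        connection = sqlite3.connect(path)
        try:
            rows = connection.execute("PRAGMA integrity_check").fetchall()
        finally:
            connection.close()
    except sqlite3.DatabaseError:
        # Raised, for example, with "database disk image is malformed".
        return False
    # A healthy database yields exactly one row containing the string "ok".
    return rows == [("ok",)]
```

Note that `sqlite3.connect` creates an empty database when the file does not exist, so a missing file would also pass this check.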

Daniel-Mietchen commented 10 years ago

Fixed with ad hoc approach above.