gpaumier / MrMetadata

A script and dashboard to identify and fix images on Wikimedia sites without machine-readable metadata
MIT License

Script crashes for some wikis while printing first page: 'tallies' is undefined #4

Closed gpaumier closed 9 years ago

gpaumier commented 9 years ago

Example stacktrace:

Traceback (most recent call last):
  File "../pywikibot/pwb.py", line 178, in <module>
    run_python_file(fn, argv, argvu)
  File "../pywikibot/pwb.py", line 75, in run_python_file
    exec(compile(source, filename, "exec"), main_mod.__dict__)
  File "mrmetadata.py", line 827, in <module>
    main(arguments)
  File "mrmetadata.py", line 37, in main
    check_local_uploads(args.family, args.prefix)
  File "mrmetadata.py", line 185, in check_local_uploads
    output_site_page(output_directory, page_number, current_site, files_with_missing_mrd[:NUMBER_OF_FILES_PER_PAGE], NUMBER_OF_FILES_PER_PAGE)
  File "mrmetadata.py", line 764, in output_site_page
    html_output = template.render(template_params)
  File "/home/gpaumier/tools/MrMetatata/lib/python2.7/site-packages/jinja2/environment.py", line 969, in render
    return self.environment.handle_exception(exc_info, True)
  File "/home/gpaumier/tools/MrMetatata/lib/python2.7/site-packages/jinja2/environment.py", line 742, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "templates/site_page.html", line 1, in top-level template code
    {% extends "base.html" %}
  File "templates/base.html", line 59, in top-level template code
    {% block content %}{% endblock %}
  File "templates/site_page.html", line 29, in block "content"
    style="width: {{ tallies['percentage_ok'] }}%">
  File "/home/gpaumier/tools/MrMetatata/lib/python2.7/site-packages/jinja2/environment.py", line 378, in getitem
    return obj[argument]
jinja2.exceptions.UndefinedError: 'tallies' is undefined

This notably happens (at the moment) when running the script on fa.wikipedia.

gpaumier commented 9 years ago

The "'tallies' is undefined" error happens because (for some reason) we're trying to print a non-first page using the first-page template, which expects a tallies variable that is only passed for the first page.

The problem seems to originate around line 165.

Debugging output from around those lines while running the script on fa.wikipedia:

…Checking metadata for 50 pages
…Checking metadata for 19 pages
files_to_print_on_the_first_page (line 166): 251
line 168: skipped
files_to_print_on_the_first_page (line 173): 251
files_with_missing_mrd (line 173): 612
Outputting page: 1

gpaumier commented 9 years ago

After a batch is done, we seem to save files_with_missing_mrd for the first page even if it's not a full page (line 183), which doesn't make sense. We should do this only when not another_batch_is_coming, which we already do at line 200. The earlier occurrence at line 183 may be the cause of the issue. Currently testing that hypothesis.
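
To make the intended rule concrete, here is a minimal sketch of it, not the actual code in mrmetadata.py. The function name paginate, its arguments, and the tiny page size are all hypothetical and only for illustration: full pages are written as soon as they fill up, and a partial page is written only once no other batch is coming.

```python
def paginate(batches, per_page):
    """Return a list of (page_number, files) pairs.

    Hypothetical helper: a page is emitted as soon as it is full;
    a partial page is emitted only after the last batch.
    """
    pages = []
    pending = []       # files found so far that are not on a page yet
    page_number = 1
    for index, batch in enumerate(batches):
        pending.extend(batch)
        another_batch_is_coming = index < len(batches) - 1
        # Write out every full page we can build right now.
        while len(pending) >= per_page:
            pages.append((page_number, pending[:per_page]))
            pending = pending[per_page:]
            page_number += 1
        # A short page is only acceptable when nothing can top it up later.
        if pending and not another_batch_is_coming:
            pages.append((page_number, pending))
            pending = []
    return pages


if __name__ == "__main__":
    # Three batches of file names, three files per page (made-up values).
    for number, files in paginate([["a", "b", "c"], ["d", "e", "f", "g"], ["h"]], 3):
        print(number, files)
```

With this shape, a leftover partial page after an intermediate batch is kept in pending instead of being written out early, which is the behaviour line 183 seems to break.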

gpaumier commented 9 years ago

So, the good news is that removing line 183 fixes the crash, and the first and second pages are now output correctly.

The bad news is that there's a pretty big discrepancy between the number of files listed on the pages and the number of files missing MRD according to the tallies.

gpaumier commented 9 years ago

Although, now that I check a few other examples, the discrepancy seems not to be limited to this instance, so I'm going to open a separate issue for this.