wpoa / open-access-media-importer

A tool for harvesting media files from Open Access articles for upload into Wikimedia Commons
http://commons.wikimedia.org/wiki/User:Open_Access_Media_Importer_Bot

Description should not read "Click here for additional data file." #90

Open Daniel-Mietchen opened 11 years ago

Daniel-Mietchen commented 11 years ago

For about a week, the bot has been getting file descriptions wrong, apparently by mixing up the caption of the article element (Movie S1 in the example below) with the caption of the media file itself (pone.0022479.s004.wmv), which yields "Click here for additional data file."

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pmc&id=3150374
http://commons.wikimedia.org/w/index.php?title=File:Some-Causes-of-the-Variable-Shape-of-Flocks-of-Birds-pone.0022479.s004.ogv&oldid=100751536

<supplementary-material content-type="local-data" id="pone.0022479.s004">
<label>Movie S1</label>
<caption>
<p>
<bold>Measurement of school shape.</bold>
This movie shows a bounding box around the flock in black. Its dimensions are calculated with the PCA. The the shortest dimension is the height. The flock is clearly asymmetrical or oblong. Simultaneously the movie shows the bounding box for measuring the degree to which the flock is elongated in the movement direction (white).
</p>
<p>(WMV)</p>
</caption>
<media xlink:href="pone.0022479.s004.wmv" mimetype="video" mime-subtype="x-ms-wmv">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>

erlehmann commented 11 years ago

There are two caption elements. What is the difference between them? I'll add an assertion that „Click here“ should not be part of a caption at all.
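To illustrate the difference between the two caption elements, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It is not the importer's actual code: the helper names `text_of` and `captions` are hypothetical, and the snippet is a shortened copy of the XML quoted above with the `xlink` namespace declared so that it parses on its own. The caption that is a direct child of `supplementary-material` describes the article element, while the one nested in `media` is only the publisher's download placeholder.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the snippet from this issue; the xmlns:xlink declaration
# is added here so the fragment is well-formed when parsed in isolation.
SNIPPET = """
<supplementary-material xmlns:xlink="http://www.w3.org/1999/xlink"
                        content-type="local-data" id="pone.0022479.s004">
  <label>Movie S1</label>
  <caption>
    <p><bold>Measurement of school shape.</bold> This movie shows a bounding box around the flock in black.</p>
    <p>(WMV)</p>
  </caption>
  <media xlink:href="pone.0022479.s004.wmv" mimetype="video" mime-subtype="x-ms-wmv">
    <caption>
      <p>Click here for additional data file.</p>
    </caption>
  </media>
</supplementary-material>
"""


def text_of(element):
    """Return the element's text content with whitespace collapsed."""
    if element is None:
        return ""
    return " ".join("".join(element.itertext()).split())


def captions(supplementary_material):
    """Return (outer, inner) caption text (hypothetical helper)."""
    # find("caption") matches direct children only, so it does not pick up
    # the caption nested inside <media>.
    outer = supplementary_material.find("caption")
    inner = supplementary_material.find("media/caption")
    return text_of(outer), text_of(inner)


root = ET.fromstring(SNIPPET)
outer, inner = captions(root)
print("outer:", outer)
print("inner:", inner)
```

In this sketch the outer caption carries the real description, so a file description built from it would not contain the placeholder text.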

erlehmann commented 11 years ago

The responsible commit is most probably 904a84f3b2f9fb45b877d150d26323a057673e0d.

erlehmann commented 11 years ago

I thought that was a legitimate caption; see https://github.com/erlehmann/open-access-media-importer/issues/84#issuecomment-21086653. Should I revert 904a84f3b2f9fb45b877d150d26323a057673e0d?

erlehmann commented 11 years ago

As of e5e8479e976216ddb8dd0853779c89da21a58b42, an AssertionError is raised when this happens. Leaving this bug open until it is clear whether 904a84f3b2f9fb45b877d150d26323a057673e0d should be reverted.
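A guard of this kind could look roughly like the following sketch. This is an assumption about the general shape of such a check, not the contents of the commit; the function name `checked_description` is hypothetical.

```python
def checked_description(caption_text):
    """Return caption_text unchanged, refusing publisher placeholder text.

    Hypothetical helper: raises AssertionError if the caption contains
    "Click here", so a bad description is never uploaded silently.
    """
    assert "Click here" not in caption_text, (
        "placeholder caption used as description: %r" % caption_text
    )
    return caption_text


# A real caption passes through unchanged.
print(checked_description("Measurement of school shape."))

# The placeholder caption is rejected.
try:
    checked_description("Click here for additional data file.")
except AssertionError as error:
    print("rejected:", error)
```

Note that Python skips `assert` statements when run with `-O`, so a check that must always run would need an explicit `if` with `raise` instead.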

erlehmann commented 11 years ago

Reverted 904a84f3b2f9fb45b877d150d26323a057673e0d as of 7c66e370ea192a4c643adeeb287e2badc8987190.