trappedinspacetime / wikiteam

Automatically exported from code.google.com/p/wikiteam

http://es.wikineos.com fails downloading all images #19

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
It only downloads the first 50 images from http://es.wikineos.com

Original issue reported on code.google.com by emi...@gmail.com on 9 Jul 2011 at 6:21

GoogleCodeExporter commented 8 years ago
I found some code to correct in the script, but this is a MediaWiki/server 
issue. I reproduced the error with r199 (latest available) of dumpgenerator.py:

 dumpgenerator.py --api=http://es.wikineos.com/w/api.php --images

getImageFilenamesURL exclusively uses index.php, when api.php would be a better 
way to get the file list and completely avoid this issue (of course, we still 
want it to work with index.php, too, so I will continue...)
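As a rough illustration of the api.php route mentioned above, here is a minimal sketch that builds a `list=allimages` query and reads file names out of a response. The helper names (`allimages_url`, `parse_allimages`) and the sample response body are made up for this example; they are not part of dumpgenerator.py, and the real script may do this differently.

```python
import json
from urllib.parse import urlencode

API_URL = "http://es.wikineos.com/w/api.php"  # taken from the command above


def allimages_url(api_url, ai_from=None, limit=500):
    """Build a list=allimages query URL (hypothetical helper)."""
    params = {
        "action": "query",
        "list": "allimages",
        "aiprop": "url",
        "ailimit": str(limit),
        "format": "json",
    }
    if ai_from is not None:
        params["aifrom"] = ai_from  # resume the listing from this file name
    return api_url + "?" + urlencode(params)


def parse_allimages(body):
    """Return (name, url) pairs from a JSON response body."""
    data = json.loads(body)
    return [(img["name"], img["url"]) for img in data["query"]["allimages"]]


# Made-up response body in the usual allimages shape, for illustration only.
sample_body = json.dumps({"query": {"allimages": [
    {"name": "VistaCerredo-2.jpg",
     "url": "http://es.wikineos.com/images/2/2e/VistaCerredo-2.jpg"},
]}})

print(allimages_url(API_URL))
print(parse_allimages(sample_body))
```

The point of this route is that the file list comes back as structured data, so no pager HTML has to be scraped.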

The regex "r_next" targets the "prev" button, when it should target the "next" 
button (moving from newest to oldest). I changed r_next:
 from: r'(?<!&dir=prev)&offset=(?P<offset>\d+)&'
 to: r'offset=(?P<offset>\d+).*rel="next"'
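To show what the new pattern captures, here is a small sketch that runs it against a pager link. The HTML snippet and the `next_offset` helper are assumptions made for this example; the markup the wiki actually served may differ.

```python
import re

# New pattern from the comment above.
r_next = r'offset=(?P<offset>\d+).*rel="next"'

# Hypothetical pager link, written by hand for this example.
sample_html = (
    '<a href="/w/index.php?title=Special:ListFiles'
    '&amp;offset=20100830172528&amp;limit=50" rel="next">next 50</a>'
)


def next_offset(html, pattern=r_next):
    """Return the offset captured by the pattern, or None if there is no match."""
    m = re.search(pattern, html)
    return m.group("offset") if m else None


print(next_offset(sample_html))
```

Note that `.` does not match a newline by default, so the `offset=` value and `rel="next"` have to sit on the same line of HTML for this pattern to match.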

This helps it get past the first page of files. It continues page-by-page until 
it hits a large image (13802x2880 pixels) that freezes it up:

* around offset=20100803222229
* http://es.wikineos.com/wiki/Archivo:VistaCerredo-2.jpg (doesn't load)
* http://es.wikineos.com/images/2/2e/VistaCerredo-2.jpg
* this file was uploaded at 2010-08-03T16:32:38Z (20100803163238)
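The offset values are the upload timestamps with the punctuation stripped. A minimal sketch of that conversion, with `iso_to_offset` being a name invented here:

```python
from datetime import datetime


def iso_to_offset(ts):
    """Convert an ISO 8601 UTC timestamp to the 14-digit offset form."""
    parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.strftime("%Y%m%d%H%M%S")


print(iso_to_offset("2010-08-03T16:32:38Z"))  # 20100803163238
```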

Even with limit=1 the error occurs. Here's a sequence of the files surrounding 
the failed image:
* http://es.wikineos.com/w/index.php?title=Special:ListFiles&offset=20100830172528&limit=1 (OK)
* http://es.wikineos.com/w/index.php?title=Special:ListFiles&offset=20100803222229&limit=1 (fail)
* http://es.wikineos.com/w/index.php?title=Special:ListFiles&offset=20100803163238&limit=1 (OK)
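The check above can be sketched as a small loop that requests each offset with limit=1 and records whether the page loads. Everything here is illustrative: `listfiles_url`, `probe` and `fake_fetch` are names invented for this example, and `fake_fetch` only simulates the failure reported above instead of contacting the wiki.

```python
BASE = "http://es.wikineos.com/w/index.php"


def listfiles_url(offset, limit=1):
    """Build a Special:ListFiles URL for one offset."""
    return f"{BASE}?title=Special:ListFiles&offset={offset}&limit={limit}"


def probe(offsets, fetch):
    """Map each offset to "OK" or "fail"; fetch is any callable that raises on failure."""
    results = {}
    for off in offsets:
        try:
            fetch(listfiles_url(off))
            results[off] = "OK"
        except Exception:
            results[off] = "fail"
    return results


def fake_fetch(url):
    """Stand-in for a real HTTP request; fails only for the reported offset."""
    if "offset=20100803222229" in url:
        raise IOError("simulated failure")
    return "<html></html>"


offsets = ["20100830172528", "20100803222229", "20100803163238"]
for off, status in probe(offsets, fake_fetch).items():
    print(off, status)
```

For a live check, `fetch` could be something like `lambda url: urllib.request.urlopen(url, timeout=30).read()`, with the timeout chosen so a page that hangs is reported as a failure.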

I have no idea how to use SVN, so I've attached a modified version of 
dumpgenerator.py for you to review, emijrp. Changes:
* new regex "r_next" in "getImageFilenamesURL"
* added a troubleshooting "print" in "getImageFilenamesURL" (should be disabled 
in commit)

Original comment by griffin....@gmail.com on 9 Jul 2011 at 10:43

Attachments:

GoogleCodeExporter commented 8 years ago
ah, figured out how to create an SVN .patch file

Original comment by griffin....@gmail.com on 10 Jul 2011 at 12:49

Attachments:

GoogleCodeExporter commented 8 years ago
Wouldn't this be fixed by #22 as well?

Original comment by nemow...@gmail.com on 29 Feb 2012 at 11:26

GoogleCodeExporter commented 8 years ago

Original comment by nemow...@gmail.com on 29 Feb 2012 at 11:41

GoogleCodeExporter commented 8 years ago
I can't reproduce this issue (using r709). Full process of the dump: 
http://pastebin.com/Z9x7wGMJ

Original comment by ad...@alphacorp.tk on 22 Jun 2012 at 8:05

GoogleCodeExporter commented 8 years ago

Original comment by ad...@alphacorp.tk on 22 Jun 2012 at 10:02

GoogleCodeExporter commented 8 years ago
And now their API is inaccessible. This is a weird way to disable this API module:

Fatal error: require() [function.require]: Failed opening required 
'/home/wikineos/public_html/es/w/includes/api/ApiQueryAllImages.php' 
(include_path='.:/usr/lib/php:/usr/local/lib/php') in 
/home/wikineos/public_html/es/w/includes/AutoLoader.php on line 1145

Original comment by nemow...@gmail.com on 8 Nov 2013 at 10:12

GoogleCodeExporter commented 8 years ago
The offending file has been deleted, so I expect it will now work for that wiki. 
There's little we can do for the general case. Reopen if you feel something 
else can/should be done.

Original comment by nemow...@gmail.com on 31 Jan 2014 at 3:10