mediawiki-client-tools / mediawiki-dump-generator

Python 3 tools for downloading and preserving wikis
https://github.com/mediawiki-client-tools/mediawiki-scraper
GNU General Public License v3.0

Uploader: `--append_date` causing uploads to multiple IA items #139

Closed yzqzss closed 1 year ago

yzqzss commented 1 year ago

Every time a file starts uploading (`for dump in dumps`), a new item instance is created.

https://github.com/mediawiki-client-tools/mediawiki-scraper/blob/934161b6f9f987250ef656f66abeb35c97c203e2/wikiteam3/uploader.py#L110-L113

When the first instance is created, the item with the bare identifier (no date suffix) doesn't exist yet, so `--append_date` has no effect. Once the first file is uploaded and the item exists, `--append_date` causes the remaining files to be added to the identifier with the date suffix.

Therefore, if a wiki dump is being uploaded to IA for the first time and `--append_date` is used, its files are accidentally split across two different items.
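To make the failure mode concrete, here is a minimal, self-contained simulation of the behavior described above. It does not use the real uploader code or the `internetarchive` library: the dict `archive` stands in for IA, and `resolve_identifier`, `upload_per_file`, and `upload_resolved_once` are hypothetical names. It assumes the date suffix is only applied when the bare identifier already exists, as this report describes. Resolving the identifier once before the loop is just one possible remedy, not necessarily the one the project adopted.

```python
def resolve_identifier(base, date, append_date, archive):
    """Pick the target identifier (hypothetical helper).

    Assumption from the report: the date suffix is only applied
    when an item with the bare identifier already exists.
    """
    if append_date and base in archive:
        return f"{base}-{date}"
    return base


def upload_per_file(files, base, date, append_date, archive):
    # Problematic pattern: the identifier is re-resolved for every file,
    # so the answer changes once the first upload has created the item.
    for name in files:
        identifier = resolve_identifier(base, date, append_date, archive)
        archive.setdefault(identifier, []).append(name)
    return archive


def upload_resolved_once(files, base, date, append_date, archive):
    # Possible remedy: resolve the identifier once, before any upload.
    identifier = resolve_identifier(base, date, append_date, archive)
    for name in files:
        archive.setdefault(identifier, []).append(name)
    return archive


# Illustrative file names and identifier, not taken from a real dump.
files = ["wiki-history.xml.7z", "wiki-wikidump.7z"]
buggy = upload_per_file(files, "wiki-example", "20230101", True, {})
fixed = upload_resolved_once(files, "wiki-example", "20230101", True, {})

print(sorted(buggy))  # two identifiers: files split across items
print(sorted(fixed))  # one identifier: all files in the same item
```

With an empty archive (a first-time upload), the per-file variant puts the first file in `wiki-example` and the second in `wiki-example-20230101`, while the resolve-once variant keeps both files in a single item.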


Example:


134

https://github.com/WikiTeam/wikiteam/pull/424

@simonliu99