Closed yzqzss closed 1 year ago
It would probably be better to split this into smaller PRs?
Also, the change to the on-disk format to make it inconsistent with the original wikiteam tools seems perhaps not ideal (though I don't know for sure whether that's already been done separately in this fork).
> It would probably be better to split this into smaller PRs?
OK, I'll do that later.
> Also, the change to the on-disk format to make it inconsistent with the original wikiteam tools seems perhaps not ideal (though I don't know for sure whether that's already been done separately in this fork).
Writing size and sha1 (they are from API/index.php) to images.txt is necessary to make image/file downloads more stable, to support checking file integrity and to support incremental downloads (incremental downloads are a later work in progress).
images.txt:
before: FileName\tFileURL\tUploader
after: FileName\tFileURL\tUploader\tSize\tSha1
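For illustration only, here is a minimal sketch of how a reader could tell the two images.txt layouts apart by counting tab-separated fields. The names `ImageEntry` and `parse_images_line` are hypothetical and are not taken from wikiteam3, and the assumption that Size is always a plain integer may not hold for every line the real tool writes.

```python
from typing import NamedTuple, Optional


class ImageEntry(NamedTuple):
    """One images.txt line (hypothetical container, not wikiteam3's own type)."""
    filename: str
    url: str
    uploader: str
    size: Optional[int]  # None for old-format (3-field) lines
    sha1: Optional[str]  # None for old-format (3-field) lines


def parse_images_line(line: str) -> ImageEntry:
    """Parse either the 3-field (before) or 5-field (after) layout."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 3:
        filename, url, uploader = fields
        return ImageEntry(filename, url, uploader, None, None)
    if len(fields) == 5:
        filename, url, uploader, size, sha1 = fields
        # Assumption: Size is a decimal byte count.
        return ImageEntry(filename, url, uploader, int(size), sha1)
    raise ValueError(f"unexpected field count {len(fields)}: {line!r}")
```

Accepting both layouts here is a choice made for the example; as described in this thread, the PR itself does not keep --resume --images working on older dumps.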
wikiteam3 seems to have made some formatting changes, such as changing config.txt to config.json. This PR's change prevents wikiteam3 from running --resume --images on dumps made with versions <0.5.0-alpha, but I think it is worth it; after all, the previous version even saved 404/502/429 etc. error responses as normal files...
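As a rough sketch of the idea (not the code in this PR), a download could be rejected unless the HTTP status is 200 and the body length matches the size recorded in images.txt. The function name `is_plausible_download` and its parameters are hypothetical.

```python
def is_plausible_download(status_code: int, content: bytes, expected_size: int) -> bool:
    """Return True only if the response looks like the real file.

    Hypothetical helper: rejects error responses (404/502/429, ...) and
    bodies whose length differs from the size listed in images.txt.
    """
    if status_code != 200:
        return False
    return len(content) == expected_size
```

With a check like this, an error page would be retried or skipped instead of being written to disk under the file's name.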
> wikiteam3 seems to have made some formatting changes, such as changing config.txt to config.json. This PR change prevented wikiteam3 from running --resume --images on <0.5.0-alpha Dump, but I think it was worth it, after all the previous version even saved 404/502/429 etc. error response as normal files...
IIRC the reason I changed the config file to JSON is that using a standardized, well-supported configuration format like JSON simplifies the client code and makes it easier to debug. I don't think I changed much other than having it use JSON, and Python JSON just serializes and de-serializes the config dictionary as-is.
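To make the point concrete, a minimal sketch of that kind of round trip with the standard json module follows. The function names and the config keys used in the usage note are made up for the example and are not wikiteam3's actual ones.

```python
import json


def save_config(config: dict, path: str) -> None:
    """Serialize the config dictionary to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)


def load_config(path: str) -> dict:
    """Read the config dictionary back from a JSON file."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

For example, `save_config({"api": "https://example.org/w/api.php", "images": True}, "config.json")` followed by `load_config("config.json")` returns an equal dictionary. One caveat: the round trip is only "as-is" for JSON-native types; tuples come back as lists and non-string keys come back as strings.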
Regarding versioning: version numbers only really make a difference with dependency management, once we publish this on PyPI. (The README currently instructs users to force-overwrite the install, making version increments irrelevant.)
I originally intended Issue 7 and the prepare-for-publication branch to serve the purpose of getting things ready for PyPI, though I got a bit sidetracked along the way, and the branch is now wildly out of date.
Since PyPI distribution would make Wikiteam3 useful to a lot more people, I do think it's a good medium-term goal, if you'd like to discuss it on Issue 7.
> It would probably be better to split this into smaller PRs?
I've split this PR: #66 #67 #68 #69 #70
30176d5 and 5edd3ac depend on them, so they need to wait for these PRs to be merged before I make new PRs for both of them.
to_stdout parameter to logerror() for printing errors to stdout (0ce1d8820e6013807d0e187941047e55c4ea730f)
requests.session to use --retries value (0e317e213f6ca55964a255f9a9437086ebffa6af)
sha1file() (e8adcbef151693dc6a2d2e1c3a1c10d74517f933)
if len(r.content) == size
start param from generatorImageDump()
start for resuming anymore.
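The integrity check discussed in this thread (size plus sha1 from images.txt) could look roughly like the sketch below. This is not the PR's sha1file() implementation, whose signature is not shown here; `sha1_of_file` and `file_matches` are hypothetical names, and the assumption is that the recorded sha1 is a hex digest.

```python
import hashlib
import os


def sha1_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-1 of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_matches(path: str, expected_size: int, expected_sha1: str) -> bool:
    """Check an on-disk file against the size and sha1 recorded for it."""
    # Cheap check first: a size mismatch makes hashing unnecessary.
    if os.path.getsize(path) != expected_size:
        return False
    # Assumption: expected_sha1 is a hex string; compare case-insensitively.
    return sha1_of_file(path) == expected_sha1.lower()
```

A resume step could call `file_matches()` for each already-downloaded file and re-download only those that fail the check.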