mediawiki-client-tools / mediawiki-dump-generator

Python 3 tools for downloading and preserving wikis
https://github.com/mediawiki-client-tools/mediawiki-scraper
GNU General Public License v3.0

wikiteam3 v4 release #176

yzqzss closed this 11 months ago

yzqzss commented 11 months ago

Fixes

feats

refactor

countless ...

drop legacy code

Special:Export

Breaking changes

Drop launcher

Shifts compression responsibilities from the launcher to the uploader.

dependencies

refactor uploader


https://pypi.org/project/wikiteam3/

yzqzss commented 11 months ago

Point of Conflict.

https://github.com/mediawiki-client-tools/mediawiki-scraper/commit/eeab93a4888491dcf6d818b93e13968b4a27d152

randomnetcat commented 11 months ago

Several things:

elsiehupp commented 11 months ago

This is the sort of thing where I will defer to everyone else.

In general I would say that anything breaking backwards compatibility should be dependent on implementation of build-versioning that would allow users to reliably target older versions.

Right now this repository does not even have version-tagged GitHub builds, let alone versioned PyPI builds, so at this point breaking backwards compatibility is a no-go.

As for drastically refactoring the code… that's fine, and probably for the better, as long as there is build-versioning in place to protect existing users.
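
To make "reliably target older versions" concrete, here is a minimal sketch of a version check a downstream script could run once versioned builds exist. The helper names (`parse_version`, `same_major`, `installed_version`) are hypothetical and not part of this project; only `importlib.metadata` is a real standard-library API. It also assumes a plain dotted numbering scheme such as 3.x versus 4.x, which the repository has not committed to.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '4.0.1' into (4, 0, 1); stop at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def same_major(installed: str, wanted_major: int) -> bool:
    """True if the installed version's major component equals wanted_major."""
    version = parse_version(installed)
    return bool(version) and version[0] == wanted_major


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

A script written against the pre-rewrite behaviour could call `same_major(installed_version("wikiteam3") or "", 3)` and refuse to run otherwise. With versioned PyPI builds, the same intent could be expressed at install time with a requirement specifier such as `wikiteam3<4`, assuming such releases are actually published.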

elsiehupp commented 11 months ago

Regarding format compatibility: I think introducing a new default format is fine as long as the existing upstream format continues to be supported alongside it for a substantial "bridge" period, with (a) an ability to convert existing dumps to the new format and (b) strong "deprecation" nudges encouraging users to migrate.

Refactoring (and abstracting much of the backend) could of course facilitate this, which is why I'm supportive of refactoring more generally.

Basically what I'm saying is that introducing a new data format should be dependent on first establishing a stable public API for the backend, which currently does not exist.
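
As a rough illustration of the "bridge" idea, the sketch below keeps reading an old format, converts it, and emits a deprecation nudge. Everything in it is an assumption made for illustration: dumps are modelled as plain dictionaries with a `format` key, and the names `LEGACY_FORMAT`, `NEW_FORMAT`, `convert_legacy`, and `load_dump` are hypothetical, not the project's actual API or on-disk layout.

```python
import warnings

# Hypothetical format labels; the real dump layout is not described here.
LEGACY_FORMAT = "legacy"
NEW_FORMAT = "v4"


def convert_legacy(dump: dict) -> dict:
    """Return a copy of a legacy dump relabelled as the new format."""
    converted = dict(dump)
    converted["format"] = NEW_FORMAT
    return converted


def load_dump(dump: dict) -> dict:
    """Accept both formats during the bridge period, nudging legacy users."""
    fmt = dump.get("format", LEGACY_FORMAT)  # untagged dumps count as legacy
    if fmt == NEW_FORMAT:
        return dump
    if fmt == LEGACY_FORMAT:
        warnings.warn(
            "legacy dump format is deprecated; please convert to the new format",
            DeprecationWarning,
            stacklevel=2,
        )
        return convert_legacy(dump)
    raise ValueError(f"unknown dump format: {fmt!r}")
```

The point of routing both formats through one loader is that the rest of the backend only ever sees the new format, which is the kind of stable internal boundary a public API would need.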

robkam commented 11 months ago

This is too complex for my level of comprehension - except that this PR for wikiteam3 is not in that repository.

elsiehupp commented 11 months ago

Here's my thought…

@yzqzss why don't you open a new Pull Request from an earlier commit on this branch? (I think you have to create a branch from that commit in order to do so.)

This would be much easier to approach if it weren't a gigantic total rewrite all at once, and breaking it into chunks this way would help.

If you're not interested in doing this, to be fair, we could try to do so ourselves, but you're more familiar with your own code than we are.