mwclient / mwclient

Python client library to interface with the MediaWiki API
https://pypi.org/project/mwclient/
MIT License
314 stars 90 forks source link

mwclient 1.0 release #284

Open RheingoldRiver opened 1 year ago

RheingoldRiver commented 1 year ago

Following up from this comment, what should a 1.0 release of mwclient look like?

Things to consider:

One other thing I'd like to consider is that Discord bots are becoming more and more popular, with discord.py and discord.js being the two main libraries. mwclient should be the go-to library for dpy, however, we don't have any async capability making it ill-suited for discord. Maybe we can add async functions for at least login, read, edit? This client does exist and I think it would be great for it to be merged into mwclient proper. I just spoke to @Wadu436 and he's interested in at least discussing the issue, even if not necessarily contributing code. It's possible that it doesn't make sense to combine these into one library; however, if they remain separate projects I think one is more likely to go unmaintained, which I think is unfortunate.

AdamWill commented 1 year ago

I'd say we de facto have "long-term stability of usage" - anyone using mwclient now has to be pretty used to how it works at present =) I guess I wouldn't want to make any overly radical changes. Improving documentation would definitely be a good idea.

I have never written much async stuff and don't have any interest in discord bots myself, so I'm not sure I have a lot of opinions about that side. It seems like aiomwclient is basically....mwclient, only with async def everything? As I said, I'm not that familiar with async, so...how disruptive would it be to merge that kind of thing into mwclient proper, but preserving the synchronous interface for users who have no interest in async usage?

RheingoldRiver commented 1 year ago

One thing to consider regarding stability is that in a 1.0 release, if we retain page.save, we will certainly need to keep it forever - currently both page.save and page.edit do the same thing. IMO there is no question that we should keep both (I've personally always used page.save) but if there's some reason to use edit only, this change should be made in 1.0 I think.

I'm not sure if there's any other similar aliases that exist.

See: https://github.com/mwclient/mwclient/blob/b1eb9772f6c7e3a981b789168e2596ca7820495a/mwclient/page.py#L177

Wadu436 commented 1 year ago

It seems like aiomwclient is basically....mwclient, only with async def everything?

It's been a while since I worked on it, but most of the logic is (or should be) the same. The main difference is that all the HTTP requests are performed using an async library (aiohttp) instead of a sync one (requests).

To have both in one package, there's a few options, the most prominent in my opinion are

  1. Move common logic off into their own synchronous functions, and maintain two versions of every function that performs HTTP requests. They would both call common logic, but perform their requests with either requests or aiohttp (or some other sync/async network libraries)
  2. Move everything to async, and provide a sync interface by running the async functions in an executor for the user (asyncio.run())

Option 1 would likely perform the best as there doesn't have to be a new event loop for every synchronous request. Both would probably perform the same for async calls.

Downside is that you'll need to have async libraries as dependencies (asyncio, aiohttp) even when a user only wants synchronous code.

I'm not super familiar with what Python packages can do, but if they have something like Rust's crate features, you might be able to avoid needing the async dependencies for the sync API.

AdamWill commented 7 months ago

I'm gonna suggest we go ahead and do a 1.0 release with more or less what we have now (we will need to come up with a new way of doing releases, though, as I think the old way was tied to travis; we can use a script like I do for my personal projects, or a GHA workflow maybe). I don't think it makes sense to delay a new release to meet all the 'gold standard' requirements - though we should still work towards that! - and I don't think it makes any sense for the project to be versioned 0.x, really. Practically speaking this is a stable library that downstreams have been relying on for years (I'm one of them). 1.x versioning would be appropriate.

I agree with @RheingoldRiver that we should keep page.save and page.edit and just accept that both will exist forever. It's fine, there's no shortage of verbs. :)

AdamWill commented 7 months ago

Similarly on the async stuff, I don't think it should hold up doing a 1.x release now. I think we can definitely look at it, though. I'll file a separate ticket for that.

RheingoldRiver commented 7 months ago

I'm out of town at the moment, so my involvement will be quite low for the next few weeks, but agreed with all the above!

AdamWill commented 3 months ago

Heads up on this for other maintainers @waldyrious @btongminh @marcfrederick - I've finally got a bit of spare time and am trying to set up a release workflow now following the guide at https://packaging.python.org/en/latest/guides/publishing-package-distribution-releases-using-github-actions-ci-cd-workflows/ . You'll probably see some notifications from pypi and github related to this.

AdamWill commented 3 months ago

https://github.com/mwclient/mwclient/pull/331 adds a release workflow.

AdamWill commented 2 months ago

OK, I just tested the release workflow using testpypi and a test repo - https://github.com/AdamWill/mwclient-test , https://test.pypi.org/manage/project/mwclient/releases/ . as you can see, it seems to work fine.

So, I propose to do a v1.0.0 release later today by just doing bump-my-version bump major and pushing the resulting tag. That should automatically produce a pypi and github release.

Do any other maintainers think we're not ready yet? @mwclient/maintainers

RheingoldRiver commented 2 months ago

Sorry I've been a bit busy, but is #259 addressed? I would propose to introduce chunk_size as an alias for limit, expose max_items, and deprecate limit not necessarily with the goal of removing it, but so that we have a clear warning that limit isn't what it seems to be

AdamWill commented 2 months ago

oh, no, I don't believe we did anything about that. I haven't looked at it in detail but your proposal sounds sensible, do you want to send a PR?

RheingoldRiver commented 2 months ago

yep, I should be able to within the next couple days!

AdamWill commented 2 months ago

Looking at the milestone, there's also https://github.com/mwclient/mwclient/issues/197 .

RheingoldRiver commented 2 months ago

Sounds good to change it but not sure what I think we should change it to. I think it might be the least confusing to accept **kwargs and send that on to the request, but if we wanted to keep it a named param what about extra ? What do you think?

AdamWill commented 2 months ago

let's discuss it over in that issue.

RheingoldRiver commented 2 months ago

Okay I've started this and it's quite a bit more complex a refactor than I'd anticipated. I'm not sure I feel comfortable doing this right before a major release, what do you think?

AdamWill commented 2 months ago

@RheingoldRiver you mean https://github.com/mwclient/mwclient/issues/259 ? Hmm...maybe we can just add a deprecation warning to limit saying its replacement will be determined in future? Or just the "rename it to chunk_size and deprecate it" bit? What exactly is the complex bit?

RheingoldRiver commented 2 months ago

259 yeah - the complex bit is that limit is referenced in dozens of different functions and is probably passed as a kwarg on at least one occasion. Once I started making the change I realized I'd be touching a lot of code to cover everywhere that it's used.

AdamWill commented 2 months ago

oh, yeah, hmm, I see...there are even a couple of places (random and exturlusage at least) where we pass a specific value which looks like we're expecting it to work the same way the OP thought it did. random() definitely does not work as documented:

>>> for pg in wk.random("QA", limit=2):
...     print(pg)
... 
OrderedDict({'id': 21941, 'ns': 0, 'title': 'SystemConfig/bind/review'})
OrderedDict({'id': 36951, 'ns': 114, 'title': 'Test Day:2010-10-14 OpenLDAP/NSS'})
OrderedDict({'id': 13590, 'ns': 6, 'title': 'File:Artwork FC7Themes Fc7ThemeProposalFlyingHighRound2 kdeScreenShot.png'})
OrderedDict({'id': 51713, 'ns': 6, 'title': 'File:F20-7.jpg'})
OrderedDict({'id': 31068, 'ns': 1, 'title': 'Talk:Features/Sugar 0.88'})
OrderedDict({'id': 46177, 'ns': 2, 'title': 'User:Rajan2251'})
OrderedDict({'id': 85143, 'ns': 116, 'title': 'Test Results:Fedora 37 RC 1.4 Installation'})
OrderedDict({'id': 9941, 'ns': 0, 'title': 'Infrastructure/RFR/FedoraMagazine'})
OrderedDict({'id': 12478, 'ns': 6, 'title': 'File:Artwork MarketingCollateral poweredby sticker mono.png'})
OrderedDict({'id': 51878, 'ns': 116, 'title': 'Test Results:Fedora 20 Beta TC4 Desktop'})
OrderedDict({'id': 3480, 'ns': 0, 'title': 'QA/TestCases/BootMethodsInstallFromBootIsoIfs'})
OrderedDict({'id': 15489, 'ns': 2, 'title': 'User:Tw2113'})
OrderedDict({'id': 27371, 'ns': 2, 'title': 'User:Melan'})
OrderedDict({'id': 84867, 'ns': 6, 'title': 'File:Mairacanal.jpg'})
OrderedDict({'id': 7013, 'ns': 0, 'title': 'L10N Kannada Team'})
OrderedDict({'id': 29037, 'ns': 6, 'title': 'File:Renderchecklog-plouj-radeon.gz'})
OrderedDict({'id': 24458, 'ns': 14, 'title': 'Category:Ambassadors from Kyrgyzstan'})
OrderedDict({'id': 79496, 'ns': 2, 'title': 'User:Lruzicka/Testcase switch stream'})
OrderedDict({'id': 57468, 'ns': 0, 'title': 'Fedora 23 talking points'})
OrderedDict({'id': 8169, 'ns': 0, 'title': 'FirstBoot'})
...

that's a lot more than 2! Same applies to page.revisions(), and several other places, I think.

So on the one hand this definitely looks like a big change, OTOH it does kinda seem like something it would be best to fix before doing a 1.0 release...

It does kinda look like almost all the usages follow a pattern of taking a limit arg and passing it through to something from listing. So we could probably write a little helper to handle the deprecation for all of them. On the whole...it would definitely be a big commit but I think I'd lean towards getting it done before 1.0. Any thoughts from the rest of @mwclient/maintainers ?

RheingoldRiver commented 2 months ago

How about making this change to limit, and then releasing mwclient 0.11.0? Then we could aim for a 1.0 release around the time that mediawiki 1.43 happens, with no changes planned for that milestone unless we need to fix something from 0.11.0. That should give enough time to test the changes widely before we announce 1.0

AdamWill commented 2 months ago

Sure. We could also call it something like 0.90.0 to make it clear it's an RC, but I'm bikeshedding :) Would also shake out any non-obvious issues in the new release pipeline too, I guess.

marcfrederick commented 2 months ago

Yeah, I'd also be in favor of releasing v0.11.0 before v1.0.0. It's been more than three years and almost 50 merges (1,088 additions and 270 deletions across 32 files) since the release of v0.10.1, so there's a good chance a bug or two will surface once all those changes are released. If all goes well we can then release that version more or less unchanged as v1.0.0.

AdamWill commented 1 month ago

So, there are more complications! We have two large PRs that sort of conflict: https://github.com/mwclient/mwclient/pull/336 adds type annotations to the entire codebase, https://github.com/mwclient/mwclient/pull/299 adds user management (but doesn't have type annotations). I do not have a ton of time to work on these, as you can tell from how they're lying around.

We still don't have the limit change finished - @RheingoldRiver , are you still working on that? If not I'm happy to go ahead and send a PR with one of my suggestions (probably the first one, as it's easier).

I think I'd be in favor of doing a 0.11.0 with that change if possible but not with either of the others, then trying to land those two (and anything else) before a 0.90.0 / 1.0.0 release. Thoughts? I've set up a 0.11.0 milestone and split the harder issues into a 1.0.0 milestone (including a couple old ones Dan tagged for 0.9.1 but never fixed, AFAICS).

marcfrederick commented 1 month ago

Sure, sounds like a good plan. Let's do the v0.11.0 release and then merge #299 (user management) before #336 (type annotations). After #299 lands, I will rebase and update the PR with the limit and user management-related changes.

We could even delay the type annotations until after the release of v1.0.0, getting that release out first and then releasing the annotations as v1.1.0. Adding the annotations is arguably not a breaking change, as it won't break any existing code. Worst case, some editors might show errors/warnings, but there will be no runtime changes.

AdamWill commented 1 month ago

sure, strictly speaking that makes sense, I guess it's just that anything that touches every file in the project (more or less) "feels" like a 1.0 change to me! But logically speaking that might be the way to go indeed.

RheingoldRiver commented 1 month ago

@AdamWill I haven't been working on the limit change, and I'd rather not tbh - that one change feels like it touches every single part of the codebase, and I don't have a good enough mental map of how the Listing classes work to be comfortable doing that change.

AdamWill commented 1 month ago

OK, I'll send a PR for my first idea then and we can see what folks think of that. I think it's kinda the 'safest' approach.

RheingoldRiver commented 1 month ago

Speaking of PRs, I think #327 and #338 both seem good to me and we can merge

AdamWill commented 2 weeks ago

OK, so we merged the limit thing, and it would be good to get the writeapi fix out. @mwclient/maintainers , does anyone have an objection to doing a 0.11.0 release now? I will let this sit for a day and try to do a release tomorrow if nobody can think of a reason why not. Thanks!

RheingoldRiver commented 2 weeks ago

I'm good with a release tomorrow, I'll start using it immediately and report if I find any errors 👍