nclack opened 1 year ago
I did some more evaluation here and implemented a method that fetches via XML-RPC and falls back to scraping, but in testing I found that the data returned by the XML-RPC API are not quite the same as what the current implementation returns.
The main difference is that the XML-RPC API returns more plugins. The scraping method only finds packages whose latest version includes the `Framework :: napari` classifier. For example, napari-webcam removed the classifier in v0.1.19, so it is not found via scraping, while the XML-RPC API still returns v0.1.18, which had the classifier.
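The difference can be illustrated with a short sketch. `browse` was part of PyPI's (since-deprecated) XML-RPC API and returns a `[name, version]` pair for every release that ever carried a classifier, which is why withdrawn packages keep showing up; the `fetch_napari_plugins` helper below is a hypothetical illustration, not npe2's actual implementation:

```python
import xmlrpc.client


def fetch_napari_plugins(client) -> "dict[str, str]":
    """Return {package: newest seen version} for every release ever
    published with the 'Framework :: napari' classifier.

    ``client.browse`` yields [name, version] pairs for *all* matching
    releases, so a package keeps appearing even after its latest release
    drops the classifier -- unlike the classifier page the scraper reads.
    """
    plugins: "dict[str, str]" = {}
    for name, version in client.browse(["Framework :: napari"]):
        # keep the highest-seen version per package (naive string compare;
        # a real implementation would use packaging.version for ordering)
        if name not in plugins or version > plugins[name]:
            plugins[name] = version
    return plugins


# against the live index (network access required, API now deprecated):
# client = xmlrpc.client.ServerProxy("https://pypi.org/pypi")
# fetch_napari_plugins(client)
```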
Right now there appear to be 16 such packages:
{'flood-napari': '0.0.1',
 'napari-czifile': '0.0.2',
 'napari-imagecodecs': '0.0.2',
 'napari-lfdfiles': '0.0.2',
 'napari-mahotas-image-processing': '0.1.2',
 'napari-manual-split-and-merge-labels': '0.1.3',
 'napari-netpbmfile': '0.0.2',
 'napari-oclrfc': '0.4.5',
 'napari-oiffile': '0.0.2',
 'napari-sdtfile': '0.0.2',
 'napari-sim-SIMulator': '0.0.1',
 'napari-tifffile': '0.0.2',
 'napari-webcam': '0.1.18',
 'nucleaizer-backend': '0.2.0',
 'ome-zarr': '0.0.19',
 'pyclesperanto-prototype': '0.10.9'}
It seems people are using this as a way to de-list their plugins, so if that's the behavior we want to keep, I'd recommend keeping the scraping method but adding the minimal change outlined in https://github.com/napari/npe2/pull/255#issuecomment-1332683255 (ordering by date when scraping).
I will comment more later (busy atm)... but I just wanted to quickly say that there's some prior reasoning behind all these decisions.
> It seems people are using this as a way to de-list their plugins
This is why the npe2api uses the BigQuery fetch; that is the real source of truth, and the "withdrawn" list in the BigQuery results should show those that used to have the classifier but no longer do.
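As a sketch of what that "withdrawn" computation looks like (a hypothetical helper, not npe2api's actual code): in practice the rows would come from the public `bigquery-public-data.pypi.distribution_metadata` table, one row per upload.

```python
NAPARI_CLASSIFIER = "Framework :: napari"


def withdrawn(rows):
    """Packages that carried the napari classifier in some upload but
    dropped it from their most recent one.

    ``rows`` is an iterable of (name, upload_time, classifiers) tuples,
    with upload_time as a sortable ISO-8601 string.
    """
    latest = {}   # name -> (upload_time, classifiers) of the newest upload
    ever = set()  # names that carried the classifier at least once
    for name, ts, classifiers in rows:
        if NAPARI_CLASSIFIER in classifiers:
            ever.add(name)
        if name not in latest or ts > latest[name][0]:
            latest[name] = (ts, classifiers)
    return {name for name in ever
            if NAPARI_CLASSIFIER not in latest[name][1]}
```

With this framing, napari-webcam would land in the withdrawn set: it carried the classifier through v0.1.18 but not in v0.1.19.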
Thanks - no rush. I appreciate your time and know I'm not necessarily breaking new ground here (I found some prior discussions between you and the PyPI maintainers, for example), but in any case I learned some things.
Sorry for the delay here. I see now that this is mostly about testing stability, which is less "consequential" so to speak. I was going to discourage the XML-RPC API for any runtime dependency, for all of the reasons mentioned in that thread on the PyPI repo that you apparently found (basically, they confirmed that the BigQuery data should be seen as the "ground truth"... but it has the obvious downsides of query time and permissions, so they recommended web scraping as a fallback).
If it's just flaky for tests, I think you should mock the response altogether. And perhaps also check that the parsing works when the HTTP response is valid: in the test itself, check whether you get a 200 response from pypi.org, and if not, xfail the test (so CI will still pass); if you do get a valid response, let the test proceed to exercise the parsing logic (preferably alongside another test that just mocks the response).
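That pattern can be sketched like this (the `pypi_is_up` probe and the test names are hypothetical; calling `pytest.xfail` inside a test imperatively marks it as an expected failure):

```python
import urllib.request

import pytest


def pypi_is_up(url: str = "https://pypi.org", timeout: float = 5.0) -> bool:
    """Best-effort liveness probe; any network error counts as 'down'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False


def test_parse_live_response():
    # xfail (not fail) when pypi.org is unreachable, so CI stays green
    if not pypi_is_up():
        pytest.xfail("pypi.org not reachable")
    # ... fetch the real page here and assert on the parsed plugin list


def test_parse_mocked_response():
    # companion test that always runs: feed a canned response to the parser
    # (e.g. monkeypatch the fetch function) and assert on its output
    pass
```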
Anyway, if it's all for the sake of testing, it doesn't really matter as much. I'd just discourage you from using XML-RPC for runtime plugin discovery.
In #255, @aganders3 explored alternative PyPI APIs. The XML-RPC API looked a bit faster and possibly more reliable, but may be deprecated soon.
Originally posted by @aganders3 in https://github.com/napari/npe2/issues/255#issuecomment-1349561325