openzim / mwoffliner

Mediawiki scraper: all your wiki articles in one highly compressed ZIM file
https://www.npmjs.com/package/mwoffliner
GNU General Public License v3.0

[REGRESSION] Renderer tests should have a delay between each invocation. #1937

Closed: VadimKovalenkoSNF closed this issue 1 year ago

VadimKovalenkoSNF commented 1 year ago

After https://github.com/openzim/mwoffliner/pull/1933 (Apply test coverage for all endpoints) was merged, a regression appeared: the rate limit is exceeded while checking the VisualEditor capability. At the time of writing, this endpoint (https://en.wikipedia.org/w/api.php?action=visualeditor&mobileformat=html&format=json&paction=parse&formatversion=2&page=MediaWiki%3ASidebar) returns the following response to all clients:

{
  "error": {
    "code": "parsoid-stash-rate-limit-error",
    "info": "Stashing failed because rate limit was exceeded. Please try again later.",
    "docref": "See https://en.wikipedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/> for notice of API deprecations and breaking changes."
  },
  "servedby": "mw1491"
}
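The failing request can be reproduced with a sketch like the one below. It rebuilds the query string of the URL above and checks the JSON body for the error code, because the HTTP status alone does not reveal the failure. The names `buildVisualEditorUrl`, `ActionApiResponse` and `isStashRateLimited` are hypothetical and are not taken from the mwoffliner code base.

```typescript
// Hypothetical helpers (not mwoffliner code) for the VisualEditor action API request above.

// Shape of an action API error body, as in the response shown above.
interface ActionApiError {
  code: string;
  info: string;
  docref?: string;
}

interface ActionApiResponse {
  error?: ActionApiError;
  servedby?: string;
  [key: string]: unknown;
}

const STASH_RATE_LIMIT_CODE = "parsoid-stash-rate-limit-error";

// Rebuilds the query string of the failing request. URLSearchParams keeps the
// insertion order and percent-encodes ":" in the page title as %3A.
function buildVisualEditorUrl(apiEndpoint: string, page: string): string {
  const params = new URLSearchParams({
    action: "visualeditor",
    mobileformat: "html",
    format: "json",
    paction: "parse",
    formatversion: "2",
    page,
  });
  return `${apiEndpoint}?${params.toString()}`;
}

// The rate-limit failure is reported inside the JSON body, so inspect
// error.code instead of relying on the HTTP status.
function isStashRateLimited(body: ActionApiResponse): boolean {
  return body.error?.code === STASH_RATE_LIMIT_CODE;
}
```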

There is no Retry-After header in the response, and the HTTP status is 200 even though the request failed. A possible solution is to add a delay between tests.
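One way to add that delay is sketched below, under stated assumptions. The renderer checks run one after another with a fixed pause. When the body carries the rate-limit error, the request is retried with a growing wait, since there is no Retry-After header to honour. `runSequentially`, `withRateLimitRetry` and the default timings are illustrative choices. They are not values recommended by Wikimedia or used by mwoffliner.

```typescript
const RATE_LIMIT_CODE = "parsoid-stash-rate-limit-error";

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Runs async tasks strictly one after another, pausing between invocations
// instead of firing all requests in parallel.
async function runSequentially(
  tasks: Array<() => Promise<void>>,
  pauseMs: number,
): Promise<void> {
  for (let i = 0; i < tasks.length; i++) {
    if (i > 0) await sleep(pauseMs);
    await tasks[i]();
  }
}

// Calls `request` up to `maxAttempts` times. The API answers with HTTP 200 and
// no Retry-After header, so the retry decision is based on the JSON error code.
// The wait doubles after each rate-limited attempt (simple exponential backoff).
async function withRateLimitRetry<T extends { error?: { code: string } }>(
  request: () => Promise<T>,
  maxAttempts = 3,
  initialDelayMs = 1000,
): Promise<T> {
  let delay = initialDelayMs;
  let body = await request();
  for (let attempt = 1; attempt < maxAttempts && body.error?.code === RATE_LIMIT_CODE; attempt++) {
    await sleep(delay);
    delay *= 2;
    body = await request();
  }
  // Either a non-rate-limited body or the last rate-limited one after giving up.
  return body;
}
```

Keeping the checks sequential follows the API:Etiquette advice linked below. The retry only softens occasional limits and does not replace spacing out the tests.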

More info about MW Action API rate limits can be found in https://phabricator.wikimedia.org/T172293#6344271 and https://www.mediawiki.org/wiki/API:Etiquette#Request_limit

VadimKovalenkoSNF commented 1 year ago

The Phabricator ticket is here: https://phabricator.wikimedia.org/T350117

MatmaRex commented 1 year ago

(I'm missing the context, I'm just here because of https://phabricator.wikimedia.org/T350117)

I would suggest migrating your use of the action=visualeditor action API to the new MediaWiki REST API.

The visualeditor action API is considered internal, and while we don't make breaking changes for no reason (since others, just like you, have started using it), we do not test how changes affect third-parties. It is slower because it produces metadata needed by the visual editor, but probably not needed by you.

The new MediaWiki REST API can be used like this: https://en.wikipedia.org/w/rest.php/v1/page/MediaWiki%3ASidebar/html. It's stable, intended for external use, has simpler output, and is faster. The output is identical to the .content property in the visualeditor action API. It is also available on all third-party wikis running MediaWiki 1.35 or newer.
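The REST endpoint above can be built and fetched with the minimal sketch below. It assumes Node 18+ for the global `fetch` and the Wikimedia script path `/w`. Other wikis may serve `rest.php` from a different path. `restPageHtmlUrl` and `fetchPageHtml` are hypothetical names.

```typescript
// Builds the MediaWiki REST API (MediaWiki 1.35+) URL for a page's HTML, e.g.
// https://en.wikipedia.org/w/rest.php/v1/page/MediaWiki%3ASidebar/html
// Assumption: `scriptBase` is the wiki's script path ("https://en.wikipedia.org/w" on Wikimedia wikis).
function restPageHtmlUrl(scriptBase: string, title: string): string {
  // Spaces become underscores, as in canonical MediaWiki URLs; everything else
  // is percent-encoded (":" becomes %3A).
  const encoded = encodeURIComponent(title.replace(/ /g, "_"));
  return `${scriptBase}/rest.php/v1/page/${encoded}/html`;
}

// Fetches the page HTML and treats any non-2xx status as an error.
async function fetchPageHtml(scriptBase: string, title: string): Promise<string> {
  const response = await fetch(restPageHtmlUrl(scriptBase, title));
  if (!response.ok) {
    throw new Error(`HTTP ${response.status} while fetching ${title}`);
  }
  return response.text();
}
```

For example, `await fetchPageHtml("https://en.wikipedia.org/w", "MediaWiki:Sidebar")` requests the same page as the failing VisualEditor call.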

(It is distinct from the older Wikimedia REST API, which was not available outside of Wikimedia wikis, unless they went to great pains to install RESTBase. I see you used that API too in your code in wikimedia-desktop.renderer.ts, and you can probably easily adapt this code to use the newer API. Their outputs are almost identical.)
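Adapting code written for the older Wikimedia REST API could start with a URL mapping like the sketch below. It assumes the RESTBase form `/api/rest_v1/page/html/{title}` without a revision segment, and the `/w` script path used on Wikimedia wikis. It does not reproduce what `wikimedia-desktop.renderer.ts` actually requests. `toMediaWikiRestUrl` is a hypothetical name.

```typescript
// Maps a Wikimedia REST API (RESTBase) page HTML URL to the equivalent
// MediaWiki REST API URL:
//   https://en.wikipedia.org/api/rest_v1/page/html/{title}
//   -> https://en.wikipedia.org/w/rest.php/v1/page/{title}/html
// Assumption: the wiki's script path is "/w", as on Wikimedia wikis.
function toMediaWikiRestUrl(restbaseUrl: string, scriptPath = "/w"): string {
  const match = restbaseUrl.match(/^(https?:\/\/[^/]+)\/api\/rest_v1\/page\/html\/([^/?#]+)$/);
  if (!match) {
    // Revision-specific URLs (.../page/html/{title}/{revision}) are deliberately not handled.
    throw new Error(`Unsupported RESTBase URL: ${restbaseUrl}`);
  }
  const [, origin, encodedTitle] = match;
  // The title is already percent-encoded in the RESTBase URL, so it is reused as-is.
  return `${origin}${scriptPath}/rest.php/v1/page/${encodedTitle}/html`;
}
```

The two outputs are described as almost identical rather than identical, so renderer results would still need comparing after such a switch.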

More about the various APIs: https://www.mediawiki.org/wiki/API:REST_API#API_comparison

kelson42 commented 1 year ago

> (I'm missing the context, I'm just here because of https://phabricator.wikimedia.org/T350117)

> I would suggest migrating your use of the action=visualeditor action API to the new MediaWiki REST API.

The VisualEditor API and the new MediaWiki REST API are different APIs. We are currently working on supporting the new MediaWiki REST API (see #1601). That said, this does not mean we should drop the old VisualEditor API now.

> The visualeditor action API is considered internal,

Yes, unfortunately. But the MediaWiki REST API is still too new to be available on all MediaWiki instances.
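The REST API only exists on MediaWiki 1.35 or newer. A scraper that must keep supporting older wikis could therefore choose the API per wiki. The sketch below makes that choice from the `generator` string returned by `action=query&meta=siteinfo`, for example "MediaWiki 1.35.0". The function names and the fallback policy are hypothetical and are not the approach taken in #1601. A version check alone does not prove that `rest.php` is reachable, so a real implementation might also probe the endpoint.

```typescript
type ParsoidApi = "rest" | "visualeditor";

// Extracts [major, minor] from a siteinfo generator string such as "MediaWiki 1.39.5".
function parseMediaWikiVersion(generator: string): [number, number] | null {
  const match = generator.match(/^MediaWiki (\d+)\.(\d+)/);
  return match ? [Number(match[1]), Number(match[2])] : null;
}

// Hypothetical policy: prefer the REST API on MediaWiki >= 1.35, otherwise fall
// back to the VisualEditor action API (assuming the VisualEditor extension is
// installed on that wiki). Unparseable versions also fall back.
function chooseParsoidApi(generator: string): ParsoidApi {
  const version = parseMediaWikiVersion(generator);
  if (version === null) return "visualeditor";
  const [major, minor] = version;
  return major > 1 || (major === 1 && minor >= 35) ? "rest" : "visualeditor";
}
```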

> and while we don't make breaking changes for no reason (since others, just like you, have started using it), we do not test how changes affect third-parties. It is slower because it produces metadata needed by the visual editor, but probably not needed by you.

I cannot say whether we need all the metadata, but we definitely need the enriched/semantic HTML. In fact, it is only because VisualEditor needed that API that we were able to launch MWoffliner.

> The new MediaWiki REST API can be used like this: https://en.wikipedia.org/w/rest.php/v1/page/MediaWiki%3ASidebar/html. It's stable, intended for external use, has simpler output, and is faster. The output is identical to the .content property in the visualeditor action API. It is also available on all third-party wikis running MediaWiki 1.35 or newer.

Yes, but this is not the ideal solution either. The main problem is that its output is not mobile HTML. And I would be surprised if we did not discover other problems once it is implemented.

> (It is distinct from the older Wikimedia REST API, which was not available outside of Wikimedia wikis, unless they went to great pains to install RESTBase. I see you used that API too in your code in wikimedia-desktop.renderer.ts, and you can probably easily adapt this code to use the newer API. Their outputs are almost identical.)

The best API was mobile-section... unfortunately this is going to be deprecated :(

> More about the various APIs: https://www.mediawiki.org/wiki/API:REST_API#API_comparison

kelson42 commented 1 year ago

@VadimKovalenkoSNF I guess this bug is not valid anymore as the bug has been fixed upstream at Wikimedia (see #1945)

VadimKovalenkoSNF commented 1 year ago

> I guess this bug is not valid anymore as the bug has been fixed

Yes.