truncated API response for "allrevisions" causes infinite loop

Description of the issue

If an API response is over some size threshold and you are using the API:Allrevisions API, MediaWiki truncates the data and does not include any revision data/metadata (i.e., it simply includes an empty list of revisions). This seems to be a special case of [this MediaWiki bug](https://phabricator.wikimedia.org/T86611), although I've not seen any reference to this particular case online.

The page content seems to be about 8.5MB, and the API limit (8,388,608 bytes, according to the warning in the response below) is a bit less than that. The revision looks like a spam edit.

This is the JSON version of the API response:

{
  "batchcomplete": "",
  "continue": {
    "arvcontinue": "20210219051441|2674861",
    "continue": "-||userinfo"
  },
  "warnings": {
    "result": {
      "*": "This result was truncated because it would otherwise be larger than the limit of 8,388,608 bytes."
    },
    "main": {
      "*": "Subscribe to the mediawiki-api-announce mailing list at  for notice of API deprecations and breaking changes."
    },
    "allrevisions": {
      "*": "Because \"arvslots\" was not specified, a legacy format has been used for the output. This format is deprecated, and in the future the new format will always be used."
    }
  },
  "query": {
    "allrevisions": [],
    "userinfo": {
      "id": 45359,
      "name": "Benjamin Mako Hill"
    }
  }
}

Because the API has not returned any revisions, the value of the arvcontinue parameter included in the response does not change. As a result, dumpgenerator.py assumes that all is well, stores all the data returned (i.e., nothing), and makes the same request again. This repeats over and over until a user intervenes.

How to fix this

At a minimum, I think the tool should notice that it is seeing the same continuation value across consecutive "successful" (200) requests and then error out. Maybe we want to add the stipulation that the returned data is empty?
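
To make the idea concrete, here is a minimal sketch of that check. It is not the actual dumpgenerator.py code: `fetch`, `iter_allrevisions`, and `StalledContinuationError` are hypothetical names, and `fetch` is assumed to take a dict of request parameters and return the decoded JSON response as a dict.

```python
class StalledContinuationError(RuntimeError):
    """Raised when the API repeats a continuation without returning data."""


def iter_allrevisions(fetch, params):
    """Yield "allrevisions" entries, following API continuation.

    `fetch` is a hypothetical callable: it takes a dict of request
    parameters and returns the decoded JSON response as a dict.
    """
    params = dict(params)  # do not mutate the caller's dict
    previous = None        # "continue" block of the previous response
    while True:
        data = fetch(params)
        entries = data.get("query", {}).get("allrevisions", [])
        yield from entries

        cont = data.get("continue")
        if not cont:
            return  # no continuation: the listing is complete

        # Same continuation as last time and nothing returned: the next
        # request would be identical, so stop instead of looping forever.
        if not entries and cont == previous:
            raise StalledContinuationError(
                "empty result with unchanged continuation: %r" % (cont,)
            )
        previous = cont
        params.update(cont)
```

With this version the tool would make at most one redundant request before erroring out. Requiring the result to be empty keeps the check from firing on responses that still make progress.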

A bolder approach would involve munging the continuation, by adding one or something similar. I can imagine why we might not want to support this in the tool, though. It appears to work in the specific case above but might not work in general.

I worked around it manually by removing "content" from the arvprop parameter (for just this single request), handcrafting XML for that single <page>, concatenating it to the results, and then restarting. Doing something like this automatically is definitely possible, but I'm not sure it's either worth it or a good idea.
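
For reference, the first step of that workaround (dropping "content" from arvprop for a single request) could be sketched as follows. `without_content` is a hypothetical helper, not something dumpgenerator.py provides, and it assumes the request parameters are held in a plain dict with arvprop as a "|"-separated string.

```python
def without_content(params):
    """Return a copy of `params` whose "arvprop" no longer requests "content".

    Hypothetical helper: assumes `params` is a plain dict and that
    "arvprop" is a "|"-separated string of property names.
    """
    reduced = dict(params)
    if "arvprop" not in reduced:
        return reduced  # nothing to strip; leave the API defaults alone
    props = [p for p in reduced["arvprop"].split("|") if p and p != "content"]
    reduced["arvprop"] = "|".join(props)
    return reduced
```

This only covers the request side. The resulting <page> would carry metadata without revision text, so the tool would still have to decide how to record the missing content.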

I'm happy to help code up a fix, but I'm honestly not sure what the best approach would be.

mediawiki-client-tools / mediawiki-dump-generator

truncated API response for "allrevisions" causes infinite loop #166
