wikimedia-gadgets / twinkle

The English Wikipedia twinkle javascript helper
http://en.wikipedia.org/wiki/Wikipedia:Twinkle
Other
135 stars 151 forks source link

speedy: sometimes Twinkle erroneously deletes most of the CSD log #1858

Open NovemLinguae opened 1 year ago

NovemLinguae commented 1 year ago

Reported by Schminnte in Discord

https://en.wikipedia.org/w/index.php?title=User:Schminnte/CSD_log&diff=prev&oldid=1172877712

This CSD log is very large. Perhaps that is a factor.

image
TollensWP commented 2 months ago

If "most of the CSD log" is everything in the log except the newest log entry and the intro text at the top of the page, I think this might be the same issue as #1534 – I would expect this if getPageText() returned undefined because of this line: https://github.com/wikimedia-gadgets/twinkle/blob/d841a36c773a93e6c7ebec0c497963d6e1b736d3/morebits.js#L5195 It also seems to make sense that this would happen in two pages in separate tabs in very short succession if this was the same issue: in the other issue above, the diff (https://en.wikipedia.org/w/index.php?title=Wikipedia:Usernames_for_administrator_attention&diff=prev&oldid=1075817813&diffmode=source) has the blanking edit occur in the next second to the previous edit.

I can't seem to manage to have getPageText() return undefined, though. It almost seems like somehow the response from the API must be out of whack, since the only relevant place where the value returned from getPageText() (ctx.pageText) is set away from its original value of null is this block: https://github.com/wikimedia-gadgets/twinkle/blob/d841a36c773a93e6c7ebec0c497963d6e1b736d3/morebits.js#L3844-L3862 I have no idea why the API might leave .content undefined, though, and I'm not able to reproduce that by automatically making an edit and then the request that Twinkle makes for the content a short while (~500ms seems to be roughly the correct timing on my machine) later, it just gets either the old or new content of the page perfectly fine. The strangest part is that the API clearly did give a response that at least looked mostly normal, or else accessing those properties would have thrown an error and the callback making the edit wouldn't have been run. Perhaps I'm missing something and somehow ctx.pageText is getting overwritten with undefined somewhere between the two, but I can't see anywhere that could be possible.

If "most of the CSD log" is actually just something bizarre like, say, half the entries, these might not be related at all. I can't view that diff anymore since the page has been deleted, and I can't seem to find any relevant edits in the metadata using https://en.wikipedia.org/wiki/User:SD0001/deleted-metadata-link. Would you be able to confirm what that edit looked like, @NovemLinguae?

NovemLinguae commented 2 months ago

Confirming the edit is surprisingly hard because Special:Undelete doesn't accept &revid= or &oldid= or anything like that. For some strange reason it uses timestamp. And this particular deletion has 5,000 revisions. Maybe we should just close this one as too much work to investigate, and wait for an easier test diff.

https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&format=json&prop=deletedrevisions&revids=1172877712&formatversion=2

{
    "batchcomplete": true,
    "query": {
        "pages": [
            {
                "ns": 2,
                "title": "User:Schminnte/CSD log",
                "missing": true,
                "deletedrevisions": [
                    {
                        "revid": 1172877712,
                        "parentid": 1172877709,
                        "minor": false,
                        "user": "Schminnte",
                        "timestamp": "2023-08-29T21:57:56Z",
                        "comment": "Logging speedy deletion nomination of [[:Talk:The Daft Punk Series]]."
                    }
                ]
            }
        ]
    }
}

https://en.wikipedia.org/w/index.php?title=Special:Undelete&target=User%3ASchminnte%2FCSD+log&timestamp=20230829215756

This is a log of all [[WP:CSD|speedy deletion]] nominations made by this user using [[WP:TW|Twinkle]]'s CSD module.

If you no longer wish to keep this log, you can turn it off using the [[Wikipedia:Twinkle/Preferences|preferences panel]], and nominate this page for speedy deletion under [[WP:CSD#U1|CSD U1]].

=== August 2023 ===
# [[:Talk:The Daft Punk Series]]: [[WP:CSD#G8|CSD G8]] ({{tl|db-talk}}) 21:57, 29 August 2023 (UTC)

I don't even have an easy way to see if this edit is tagged with Twinkle or not :(

TollensWP commented 2 months ago

I've managed to query the API for tag information also: https://en.wikipedia.org/w/api.php?action=query&prop=deletedrevisions&titles=User%3ASchminnte%2FCSD%20log&drvprop=user|timestamp|tags|size|ids&drvlimit=max&formatversion=2&drvcontinue=2|Schminnte/CSD_log|20230830155437|122248551&drvlimit=6

{
    "continue": {
        "drvcontinue": "2|Schminnte/CSD_log|20230829215715|122248362",
        "continue": "||"
    },
    "query": {
        "pages": [
            {
                "ns": 2,
                "title": "User:Schminnte/CSD log",
                "missing": true,
                "deletedrevisions": [
                    {
                        "revid": 1172989799,
                        "parentid": 1172989415,
                        "user": "Schminnte",
                        "timestamp": "2023-08-30T15:54:37Z",
                        "size": 364852,
                        "tags": [
                            "mw-undo",
                            "twinkle"
                        ]
                    },
                    {
                        "revid": 1172989415,
                        "parentid": 1172877760,
                        "user": "Schminnte",
                        "timestamp": "2023-08-30T15:51:53Z",
                        "size": 640,
                        "tags": [
                            "twinkle",
                            "mw-reverted"
                        ]
                    },
                    {
                        "revid": 1172877760,
                        "parentid": 1172877712,
                        "user": "Schminnte",
                        "timestamp": "2023-08-29T21:58:13Z",
                        "size": 533,
                        "tags": [
                            "twinkle",
                            "mw-reverted"
                        ]
                    },
                    {
                        "revid": 1172877712,
                        "parentid": 1172877709,
                        "user": "Schminnte",
                        "timestamp": "2023-08-29T21:57:56Z",
                        "size": 431,
                        "tags": [
                            "mw-replace",
                            "twinkle",
                            "mw-reverted"
                        ]
                    },
                    {
                        "revid": 1172877709,
                        "parentid": 1172877670,
                        "user": "Schminnte",
                        "timestamp": "2023-08-29T21:57:47Z",
                        "size": 364852,
                        "tags": [
                            "twinkle"
                        ]
                    },
                    {
                        "revid": 1172877670,
                        "parentid": 1172877625,
                        "user": "Schminnte",
                        "timestamp": "2023-08-29T21:57:30Z",
                        "size": 364747,
                        "tags": [
                            "twinkle"
                        ]
                    }
                ]
            }
        ]
    }
}

Along with the deleted version above this collectively seems as useful as a diff would be. What's strange is that the blanking happens a full nine seconds after the previous revision. It certainly now seems like the issue is getPageText() returning undefined based on that deleted revision (or potentially the empty string in this case, but that wouldn't explain the other issue), though the theory that it was due to a simultaneous read and write, at least in this specific case, makes less sense given the reasonably substantial gap.

I guess potentially a check for ctx.pageText being undefined could be put in before running the success callback, since that should never be the case, but that seems very hacky and doesn't answer the question of why it would possibly be undefined in the first place.

I've gone through UAA looking for similar cases (not sure how I'd go about looking for similar cases for CSD logs) and found only three, including the one linked from #1534:

All of them are nearly simultaneous with the previous edit, and it's at least interesting that two of those are the same person (though to be fair they make a lot of requests at UAA). I also found this one: https://en.wikipedia.org/w/index.php?title=Wikipedia:Usernames_for_administrator_attention&diff=prev&oldid=1142378939 which seems to be a roughly similar issue happening to AV though possibly not remotely related from a technical standpoint, especially given it appears this was a test and the timing is different than the other three, but figured I'd mention it anyway.

Going back as far as the beginning of 2019, this has never happened at AIV, except once with Huggle, and with the same tight timing: https://en.wikipedia.org/w/index.php?title=Wikipedia:Administrator_intervention_against_vandalism&diff=prev&oldid=1143657909. It looks like the Twinkle AIV code sends an append request for AIV rather than getting the content, appending the report in JS, and submitting a request to replace the whole page with that content, like UAA and the CSD log do. This makes way more sense in my mind anyway, and given that it seems to work better also, it might be reasonable to just switch to that logic (assuming there's not any particular reason it's written the way it is right now). That would still leave the question of why the page text would ever be undefined open, but it's far less hacky than checking for a state that seems like it should never occur.

What's also somewhat strange is that I would expect that this sort of thing would happen with other Twinkle actions too, though very possibly it actually does, and just ends up throwing an error trying to parse undefined, leading the user of the tool to just see Twinkle freeze and refresh the page (though that's far better than blanking a page unexpectedly).

NovemLinguae commented 2 months ago

This smells like 3 different bugs to me.

This ticket kind of smells like it is querying the existing CSD log page text, that query is failing with a timeout or a 500 or something, so that variable gets set to blank or undefined, and it confuses Twinkle into thinking the page doesn't exist, so it re-writes the page as if it's the first log entry. That's probably part of a promise chain that is something like Mw.api( /* get csdWikitext, which comes back as blank or undefined due to error */ ).then( /* do stuff then overwrite csdWikitext */ );

https://en.wikipedia.org/w/index.php?title=Wikipedia:Usernames_for_administrator_attention&diff=prev&oldid=1214643632 is printing undefined whereas this ticket doesn't print undefined. So that sounds worthy of its own ticket if we think it's still happening, but might be a different bug.

https://en.wikipedia.org/w/index.php?title=Wikipedia:Usernames_for_administrator_attention&diff=prev&oldid=1142378939 is tagged with AntiVandal, not Twinkle, so is definitely unrelated.

TollensWP commented 2 months ago

Sorry, I maybe wasn't very clear – the UAA ticket already exists as #1534, and I assume it's still happening since the most recent diff in the list from my above comment is from March this year.

I'm certain it's at least possible these are the same underlying bug, I wouldn't expect this ticket to print undefined because of this in the CSD logging code: https://github.com/wikimedia-gadgets/twinkle/blob/d841a36c773a93e6c7ebec0c497963d6e1b736d3/morebits.js#L5195 which would reset the page if getPageText returned undefined or anything else falsy – in the UAA case it just concatenates the new entry onto the output of getPageText.