lay295 / TwitchDownloader

Twitch VOD/Clip Downloader - Chat Download/Render/Replay
MIT License
2.66k stars 260 forks

Chat Download broken #704

Open Ocel8 opened 1 year ago

Ocel8 commented 1 year ago

Edition

Windows GUI Application

Describe your issue here

I think the chat download feature is broken: it gets stuck at 4%. I've tried different VODs. Maybe Twitch changed the API?

editheus commented 1 year ago

same here

SlimSchaef commented 1 year ago

I'm having the same issue. Tried on both a VOD and clip. Gets stuck at a certain percentage and doesn't move.

lay295 commented 1 year ago

Oop, the day I thought might come has finally happened. They added the integrity check to the GQL endpoint for fetching comments. There won't be a really straightforward fix for this, so I'll look into it over the next few days.

gadi-playstream-gg commented 1 year ago

Any workarounds or suggestions here?

Ocel8 commented 1 year ago

Not much of a solution, but live chat still works (e.g. in Chatterino), so maybe we can record the chat directly during the livestream.
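
Recording chat live, as suggested above, could look roughly like this minimal sketch. It assumes Twitch's public IRC interface at irc.chat.twitch.tv, port 6667, accepts an anonymous read-only login with a justinfan nickname (neither detail comes from this thread). parse_privmsg and record_chat are hypothetical helpers, IRCv3 tags are ignored, and this only captures chat while the stream is live.

```python
import socket

HOST = "irc.chat.twitch.tv"  # assumption: Twitch's public IRC endpoint
PORT = 6667                  # assumption: plaintext IRC port


def parse_privmsg(line: str):
    """Return (user, channel, text) for a PRIVMSG line, else None.

    Hypothetical helper; assumes the line has no IRCv3 tags prefix.
    """
    if not line.startswith(":"):
        return None
    prefix, _, rest = line[1:].partition(" ")
    command, _, params = rest.partition(" ")
    if command != "PRIVMSG":
        return None
    channel, _, text = params.partition(" :")
    user = prefix.split("!", 1)[0]
    return user, channel, text


def record_chat(channel: str, out_path: str) -> None:
    """Append every chat message in `channel` to out_path until the connection closes."""
    with socket.create_connection((HOST, PORT)) as sock, open(out_path, "a", encoding="utf-8") as out:
        # Anonymous read-only login (assumed to be accepted by Twitch IRC).
        sock.sendall(b"NICK justinfan12345\r\n")
        sock.sendall(f"JOIN #{channel.lower()}\r\n".encode())
        buffer = ""
        while True:
            data = sock.recv(4096)
            if not data:
                break  # server closed the connection
            buffer += data.decode("utf-8", errors="replace")
            # Keep any trailing partial line in the buffer for the next recv.
            *lines, buffer = buffer.split("\r\n")
            for line in lines:
                if line.startswith("PING"):
                    # Keep the connection alive by echoing the PING payload.
                    sock.sendall(("PONG" + line[4:] + "\r\n").encode())
                    continue
                msg = parse_privmsg(line)
                if msg:
                    out.write("\t".join(msg) + "\n")
                    out.flush()
```

For example, parse_privmsg(":alice!alice@alice.tmi.twitch.tv PRIVMSG #somechannel :hello world") yields ("alice", "#somechannel", "hello world").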

ScrubN commented 1 year ago

Looks like they added integrity checks to more than just comments. Here is the response from GetGqlClips:

{
  "errors": [
    {
      "message": "failed integrity check",
      "path": [
        "user",
        "clips"
      ]
    }
  ],
  "data": {
    "user": {}
  },
  "extensions": {
    "challenge": {
      "type": "integrity"
    },
    "durationMilliseconds": 21,
    "requestID": "0JN9WSS6EHZP"
  }
}
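
A client can tell this failure apart from an ordinary empty result by looking for the two signals visible in the response above: the "failed integrity check" error message and the integrity challenge under extensions. A minimal sketch; the function name is hypothetical and it assumes a single (non-batched) response object.

```python
import json


def failed_integrity_check(response_body: str) -> bool:
    """Return True if a GQL response body shows an integrity-check failure."""
    payload = json.loads(response_body)
    # Signal 1: an error entry with the "failed integrity check" message.
    for error in payload.get("errors") or []:
        if error.get("message") == "failed integrity check":
            return True
    # Signal 2: an integrity challenge advertised in the extensions block.
    challenge = (payload.get("extensions") or {}).get("challenge") or {}
    return challenge.get("type") == "integrity"
```

Passing the GetGqlClips response above returns True, while a normal response with populated data returns False.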
lay295 commented 1 year ago

Yeah, it seems to be any GQL query involving a cursor of some kind. So I can still download all the chat messages (for now) if I just don't use the cursor, which kinda sucks.
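
A rough sketch of downloading without the cursor: request comments by time offset, step through the VOD, and merge the overlapping pages. fetch stands in for a hypothetical single GQL request by offset, and the comment fields id and offset are assumptions, not the real response shape.

```python
from typing import Callable, List

Comment = dict
# Hypothetical fetcher: returns one page of comments posted at or after
# `offset` seconds into the VOD (no cursor involved).
FetchByOffset = Callable[[int], List[Comment]]


def download_by_offset(fetch: FetchByOffset, video_length: int, step: int = 1) -> List[Comment]:
    """Poll every `step` seconds of the VOD and merge the returned pages."""
    seen_ids = set()
    comments: List[Comment] = []
    for offset in range(0, video_length + 1, step):
        for comment in fetch(offset):
            # Pages from neighbouring offsets overlap, so drop duplicates by id.
            if comment["id"] not in seen_ids:
                seen_ids.add(comment["id"])
                comments.append(comment)
    comments.sort(key=lambda c: (c["offset"], c["id"]))
    return comments
```

A smaller step costs more requests but leaves fewer gaps between pages.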

Hopefully I can find a better solution on the weekend, but, uh, I imagine it would involve hosting my own server in the middle to proxy the requests, or somehow using Selenium, though I believe Kasada detects that, so I'm not sure it would work.

lay295 commented 1 year ago

You can give 1.52.6 a try for now. It's marked as a pre-release so people will not auto-update to it. If there are no glaring issues I'll mark it as a full release; however, it is only a temporary solution.

lay295 commented 1 year ago

Well, Kasada flags Selenium as a bad bot in the JWT, which isn't surprising, but the GQL request still works, huh... So I guess I could just generate an integrity token even while being identified as a bot and it would work, at least for now.

EDIT: Never mind, I looked at the wrong query...

nurupo commented 1 year ago

1.52.6 chat downloading gets stuck at 99% for me.

Video id: 1811306223, Format: Text, Timestamp: Relative, Connections:1.

SlimSchaef commented 1 year ago

I just downloaded a VOD chat as a JSON file on 1.52.6 and it worked for me.

lay295 commented 1 year ago

Yeah, I guess that's what I get for only testing on popular streamers... Since we're no longer iterating directly on cursors, I was scared of missing comments, so I kinda check every second :)

Maybe it's safe to assume that if no comments are returned, there is no chat left?

ScrubN commented 1 year ago

Comparing the VOD chat to the API response, there are still comments after the second where it gets stuck. The issue is that latestMessage gets reset on every loop iteration. I have found a workaround, though: d69a170985e67231cfd73ea7e3ef1255a4fcdf12
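
ScrubN's findings suggest two rules for an offset-based loop: keep the latest-seen state outside the loop so it survives each iteration, and stop at the end of the VOD rather than on the first empty page. A minimal sketch of that idea, not the actual change in d69a170985e67231cfd73ea7e3ef1255a4fcdf12; fetch and the comment fields id and offset are hypothetical.

```python
def download_until_end(fetch, video_length: int, step: int = 1) -> list:
    """Offset-based download that keeps its state across iterations.

    `fetch(offset)` is a hypothetical function returning comments posted near
    `offset` seconds. An empty page only means a quiet stretch of chat, so the
    loop ends at the VOD length rather than on the first empty response.
    """
    seen_ids = set()
    comments = []
    latest_offset = 0  # defined outside the loop so it is never reset
    offset = 0
    while offset <= video_length:
        for comment in fetch(offset):
            if comment["id"] not in seen_ids:
                seen_ids.add(comment["id"])
                comments.append(comment)
                latest_offset = max(latest_offset, comment["offset"])
        # Skip ahead to the newest comment seen so far, but always move forward.
        offset = max(offset + step, latest_offset)
    return comments
```

With comments at 0, 50, 51 and 300 seconds and a fetch that only looks 60 seconds ahead, the loop crosses the empty stretch between 51 and 300 and still returns all four comments.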

editheus commented 1 year ago

For me, 1.52.6 chat downloading gets stuck at 100%, on the message "backfilling commenter info".

ScrubN commented 1 year ago

Please provide the vod id.

SlimSchaef commented 1 year ago

My VOD ID is 1809529254.

editheus commented 1 year ago

My VOD ID is 1808145987.

ScrubN commented 1 year ago

I was able to successfully download the chat, backfilled commenter info included.

Ashdemai commented 1 year ago

I was able to download the chat, albeit with one weird quirk: from 1% to 89% it was nice and speedy, but from 90% to 100% it was really slow. It still ran to completion, though.

lay295 commented 1 year ago

@ScrubN, can we change the step back to 1 second instead of 5? When I download a high-frequency chat such as xQc's, there are considerable differences in the comment count.

(Screenshot: 5s vs 1s)

Regarding the slowdown from 90% to 100%: because of how we're currently forced to iterate over the comments, I don't think anything can really be done about that. I'm afraid people use this tool to archive streams, would not notice that comments are missing, and would treat it as the source of truth.
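
One plausible explanation for the 5s vs 1s difference, assuming each offset request returns a capped page of comments starting at that offset: in a busy chat the page fills up before it reaches the next polled offset, so everything in between is skipped. A toy simulation under that assumption; comments_captured is hypothetical and the page size and chat rates are made up.

```python
def comments_captured(per_second, page_size, step):
    """Simulate how many comments an offset-stepping downloader captures.

    per_second[t] is the number of comments posted during second t. Each
    request at offset t is assumed to return at most `page_size` comments,
    starting with those posted at second t.
    """
    captured = set()
    for offset in range(0, len(per_second), step):
        remaining = page_size
        t = offset
        while remaining > 0 and t < len(per_second):
            take = min(remaining, per_second[t])
            # Identify each comment by (second, index within that second).
            captured.update((t, i) for i in range(take))
            remaining -= take
            t += 1
    return len(captured)
```

With 20 comments per second over 10 seconds and a 50-comment page, a 1s step captures all 200 comments while a 5s step captures only 100; a quiet chat with 2 comments per second loses nothing with either step.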

ScrubN commented 1 year ago

Ah, I didn't notice that issue. Sure, we can go back to 1 second.

lay295 commented 1 year ago

Apparently there is an old mobile API client ID floating around that doesn't have the same restrictions. It's still a temporary solution, but at least it's not as janky...

lay295 commented 1 year ago

Just as a note for the future when this actually has to be fixed

It seems UndetectedChromeDriver currently passes the Kasada bot check, even in headless mode: https://github.com/fysh711426/UndetectedChromeDriver

So when we do need an integrity token, we could just launch Chrome in headless mode and monitor the network requests for the JWT. It would suck to add Chrome as a dependency and to have to launch a full-blown browser, but I can't really think of a better solution.
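
Once the headless browser has fetched an integrity token, what remains is plain parsing of the captured response from https://gql.twitch.tv/integrity. A sketch that assumes the body is JSON with a token field and an expiration field in Unix milliseconds; both field names are assumptions, the helper is hypothetical, and capturing the network traffic is left to the browser automation.

```python
import json
import time
from typing import Optional


def token_from_integrity_response(body: str, now_ms: Optional[int] = None) -> Optional[str]:
    """Return the token from a captured integrity response, or None if missing or expired.

    Assumed shape: {"token": "...", "expiration": <unix ms>, ...}
    """
    data = json.loads(body)
    token = data.get("token")
    if not token:
        return None
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    expiration = data.get("expiration")
    if expiration is not None and expiration <= now_ms:
        return None  # already expired; the browser would need to fetch a new one
    return token
```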

Ashdemai commented 1 year ago

I hope that we don't have to resort to that. I'm not willing to install a whole browser just to run a script.

lay295 commented 1 year ago

Optionally, I could also just let users pass in their own integrity token, but that would mean they'd have to fetch it themselves. I don't really see a great solution here.
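
If users supply their own token, the downloader only has to attach it to each GQL request. A sketch using Python's urllib.request that only builds the request; the endpoint https://gql.twitch.tv/gql and the Client-Id header are what Twitch GQL clients send, but passing the token in a Client-Integrity header is an assumption, and build_gql_request is a hypothetical helper.

```python
import json
import urllib.request
from typing import Optional

GQL_URL = "https://gql.twitch.tv/gql"


def build_gql_request(payload: dict, client_id: str,
                      integrity_token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a GQL POST, optionally carrying a user-supplied token."""
    headers = {"Client-Id": client_id, "Content-Type": "application/json"}
    if integrity_token:
        # Assumption: the browser sends the token in a "Client-Integrity" header.
        headers["Client-Integrity"] = integrity_token
    return urllib.request.Request(
        GQL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Note that urllib.request normalizes stored header names, so the token is read back with req.get_header("Client-integrity").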

I could also just host my own server that handles the integrity token and acts as a proxy between the user and GQL, though I can see a few drawbacks to that.

Ashdemai commented 1 year ago

What do you need to do to fetch an integrity token?

lay295 commented 1 year ago

You could open the network requests tab in your web browser and look for the request to https://gql.twitch.tv/integrity; the token will be in the response body.

There are 2 headers that Kasada generates to verify that you're not a bot, and Twitch relies on them to decide whether you're a bot or not.
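
For reference, the request in question looks roughly like this. A sketch that only builds the request; it assumes the endpoint is called with POST and that Kasada's two headers are named X-Kpsdk-Ct and X-Kpsdk-Cd (the names did not survive in this thread). build_integrity_request is hypothetical, and the two values can only come from Kasada's JavaScript running in a real browser, which is the whole problem.

```python
import urllib.request

INTEGRITY_URL = "https://gql.twitch.tv/integrity"


def build_integrity_request(client_id: str, kpsdk_ct: str, kpsdk_cd: str) -> urllib.request.Request:
    """Build (but do not send) an integrity-token request.

    kpsdk_ct and kpsdk_cd stand for the two Kasada-generated values; the
    header names used here are assumptions.
    """
    headers = {
        "Client-Id": client_id,
        "X-Kpsdk-Ct": kpsdk_ct,
        "X-Kpsdk-Cd": kpsdk_cd,
    }
    return urllib.request.Request(INTEGRITY_URL, data=b"", headers=headers, method="POST")
```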

I mean, ideally someone would reverse engineer the JavaScript so we could get those JavaScript challenge answers without a real browser, but I don't think that's something I can do. In the past I've done something similar by running a JavaScript challenge through the V8 engine to get the challenge answer, but Kasada seems more advanced than anything I've dealt with before.

Ashdemai commented 1 year ago

I don't code or anything, just throwing it out there in case it works at all: does curl let you fetch those request headers or look for them? Then maybe you wouldn't have to rely on such a heavy dependency. I also know Windows users have curl out of the box.

lay295 commented 1 year ago

What the program does today is essentially just curl requests, i.e. raw HTTP requests. But to get an integrity token, we need to run JavaScript to generate those headers. Twitch is doing this specifically to block bots (for example, ones that only use curl requests) and to make things harder for bot makers.

Again, maybe we can figure out how to run the JavaScript outside a browser and generate the needed headers that way, but who knows.

virodoran commented 1 year ago

If this comes up again, maybe a simple solution would be a separate browser plugin that lets users easily copy the integrity token directly out of their browser? It makes things a bit more annoying than having everything automated and bundled in one package, but it seems cleaner than shipping Chromium or another entire JavaScript engine.