Revert back to http connection

tleydxdy commented 4 years ago

Now that QUIC is going to be banned too, I think we should go back to using HTTP: it means one less dependency and is less fragile (on Alpine, at least). Or we could offer it as a build option?

unixfox commented 4 years ago

How do you know that the QUIC workaround is being banned by Google? I agree that reverting may fix #917, but if the workaround still works, why revert to a state where almost every Invidious instance would stop working because of the Google CAPTCHA?

haizrul commented 4 years ago

Using https://anti-captcha.com will solve the CAPTCHA problem.

tleydxdy commented 4 years ago

@unixfox because people are being banned?

unixfox commented 4 years ago

@tleydxdy I didn't experience any ban at all (my instance is yewtu.be). The only thing I did was block all the API endpoints except the ones the web interface uses. This most likely reduced the number of requests to Google, but I haven't experienced any ban yet. I'm pretty sure that if Invidious weren't using the QUIC workaround, my instance would have been banned a long time ago (based on my experience dealing with reCAPTCHA on Searx).

haizrul commented 4 years ago

Hi sir, may I know how to block all the API endpoints except the ones the web interface uses, like you said? I want to implement it on my instance too. Please help.

unixfox commented 4 years ago

I just used Caddy's status directive, like this:

```
status 403 {
    /api/v1/videos
    /api/v1/channels
    /api/v1/search
    /api/v1/mixes
}
```
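For anyone not using Caddy, the rule above boils down to a path-prefix check. The Java sketch below assumes Caddy v1's base-path (prefix) matching, so /api/v1/videos also covers /api/v1/videos/<id>; ApiBlocklist and isBlocked are hypothetical names, not part of Invidious or Caddy.

```java
import java.util.List;

// Hypothetical sketch of the Caddy "status 403" rule above, assuming prefix matching.
final class ApiBlocklist {
    // Same prefixes as the Caddyfile; every other path stays reachable.
    static final List<String> BLOCKED_PREFIXES = List.of(
            "/api/v1/videos",
            "/api/v1/channels",
            "/api/v1/search",
            "/api/v1/mixes");

    // True when the request path should be answered with 403 instead of being proxied.
    static boolean isBlocked(String path) {
        for (String prefix : BLOCKED_PREFIXES) {
            if (path.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isBlocked("/api/v1/videos/dQw4w9WgXcQ")); // true
        System.out.println(isBlocked("/watch?v=dQw4w9WgXcQ"));       // false
    }
}
```

Plain prefix matching also blocks paths such as /api/v1/searchfoo; requiring the prefix to be followed by "/", "?" or the end of the path would be stricter.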

haizrul commented 4 years ago

I use Debian 10. Can you advise which file I should edit?

unixfox commented 4 years ago

If you installed the Caddy web server with this script: https://github.com/sayem314/Caddy-Web-Server-Installer, then it's in /etc/Caddyfile.

haizrul commented 4 years ago

OK sir, I will try it. Thanks a lot for the help! 👍

omarroth commented 4 years ago

There are two different kinds of CAPTCHAs:

The first is similar to one of the errors reported in TeamNewPipe/NewPipe#2924 and looks like this: [screenshot of the CAPTCHA page]

(For reference, the "submit" button makes a POST request to https://www.youtube.com/das_captcha, with the result of the CAPTCHA as "g-captcha-response" IIRC).

After a successful POST, YouTube returns a new cookie, goojf, that the client can then use for subsequent requests.

The second one is more generic and looks like this:

After a successful POST (to https://www.google.com/sorry/index...), you receive a GOOGLE_ABUSE_EXEMPTION cookie that is valid for around 6 hours (the cookie itself has an expires value, or similar, that you can use).
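To reuse that cookie you first have to pick it out of the Set-Cookie headers of the response to the POST. Below is a minimal Java sketch, assuming the raw header values are already available (for example from java.net.http's HttpHeaders.allValues("Set-Cookie")). ExemptionCookie is a hypothetical name, and the 6-hour fallback for a cookie that carries neither Max-Age nor Expires simply reuses the estimate above.

```java
import java.net.HttpCookie;
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: finds GOOGLE_ABUSE_EXEMPTION among raw Set-Cookie values.
final class ExemptionCookie {
    static final String NAME = "GOOGLE_ABUSE_EXEMPTION";

    static Optional<HttpCookie> find(List<String> setCookieValues) {
        for (String header : setCookieValues) {
            List<HttpCookie> parsed;
            try {
                parsed = HttpCookie.parse(header);
            } catch (IllegalArgumentException e) {
                continue; // skip a malformed header instead of failing the whole response
            }
            for (HttpCookie cookie : parsed) {
                if (NAME.equals(cookie.getName())) {
                    return Optional.of(cookie);
                }
            }
        }
        return Optional.empty();
    }

    // Turns the cookie's Max-Age/Expires into an absolute expiry time.
    static Instant expiresAt(HttpCookie cookie, Instant receivedAt) {
        long maxAge = cookie.getMaxAge(); // -1 when neither Max-Age nor Expires was sent
        Duration lifetime = maxAge >= 0 ? Duration.ofSeconds(maxAge) : Duration.ofHours(6);
        return receivedAt.plus(lifetime);
    }

    public static void main(String[] args) {
        List<String> headers = List.of(
                "OTHER=1; Path=/",
                "GOOGLE_ABUSE_EXEMPTION=ID=abc:TM=1; Max-Age=21600; Path=/");
        find(headers).ifPresent(c ->
                System.out.println(c.getValue() + " expires " + expiresAt(c, Instant.now())));
    }
}
```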

The goojf cookie provided by the first type does not consistently prevent future CAPTCHAs, and that CAPTCHA is not practical to bypass using something like anti-captcha (see #886). This CAPTCHA is completely bypassed when using QUIC. This is also why you will never see this type of CAPTCHA when using Chrome (except on first load): all subsequent requests use QUIC.

The GOOGLE_ABUSE_EXEMPTION cookie will consistently prevent captchas from appearing until it expires. This is the captcha that is actually being bypassed when using anti-captcha.
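Since the cookie keeps CAPTCHAs away until it expires, a client only needs to solve a new one once the stored cookie is gone. A minimal Java sketch of that bookkeeping, assuming the caller supplies the current time and decides which requests get the header; ExemptionStore and its methods are hypothetical names, not Invidious code.

```java
import java.time.Instant;
import java.util.Optional;

// Hypothetical holder for the current GOOGLE_ABUSE_EXEMPTION value.
final class ExemptionStore {
    private String value;      // null until a CAPTCHA has been solved
    private Instant expiresAt; // moment the cookie stops preventing CAPTCHAs

    synchronized void update(String newValue, Instant newExpiresAt) {
        value = newValue;
        expiresAt = newExpiresAt;
    }

    // Cookie header for outgoing requests, or empty if no valid cookie exists at 'now'.
    synchronized Optional<String> cookieHeader(Instant now) {
        if (value == null || !now.isBefore(expiresAt)) {
            return Optional.empty();
        }
        return Optional.of("GOOGLE_ABUSE_EXEMPTION=" + value);
    }

    // True when a new CAPTCHA has to be solved (e.g. via anti-captcha) before continuing.
    synchronized boolean needsRefresh(Instant now) {
        return cookieHeader(now).isEmpty();
    }

    public static void main(String[] args) {
        ExemptionStore store = new ExemptionStore();
        Instant now = Instant.now();
        store.update("ID=abc", now.plusSeconds(6 * 60 * 60));
        System.out.println(store.cookieHeader(now).orElse("(none)"));
    }
}
```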

unixfox commented 4 years ago

@omarroth Do you plan to support the GOOGLE_ABUSE_EXEMPTION cookie for anti-recaptcha? My instance is not blocked for viewing videos, only for channels. When Invidious fetches channel info, it gets the second type of block you described, with "/sorry/index".

Thus, the automatic CAPTCHA solving doesn't work, because Invidious doesn't check whether the instance is partially blocked (for example, only for fetching channels).

omarroth commented 4 years ago

GOOGLE_ABUSE_EXEMPTION is the only cookie that is currently supported.

For clarification, what does the following command return for you? (You may also need to specify curl -4 or curl -6.)

```
$ curl -sD - -o /dev/null 'https://www.youtube.com/browse_ajax?continuation=4qmFsgI8EhhVQ2EzamdoSUxCa3BiTW03bnBoeGlCcUEaIEVnWjJhV1JsYjNNd0FqZ0JZQUZxQUxnQkFDQUFlZ0V4&gl=US&hl=en'
```

unixfox commented 4 years ago

That's strange, because the automatic anti-recaptcha never activates itself. I thought the anti-recaptcha was only designed for watching videos, according to the source code: https://github.com/omarroth/invidious/blob/master/src/invidious/helpers/jobs.cr#L239

I'm on my phone, but the curl command should return the same second-type page with "our systems have detected...".

Every time I fetch a channel, I get a JSON::ParseException as described in #963.

unixfox commented 4 years ago

My bad, you are right @omarroth, it does indeed support the GOOGLE_ABUSE_EXEMPTION cookie. But as you can see, it only checks whether the instance is blocked for loading videos: https://github.com/omarroth/invidious/blob/master/src/invidious/helpers/jobs.cr#L239. I changed the URL to /browse_ajax?continuation=4qmFsgI8EhhVQ2EzamdoSUxCa3BiTW03bnBoeGlCcUEaIEVnWjJhV1JsYjNNd0FqZ0JZQUZxQUxnQkFDQUFlZ0V4&gl=US&hl=en and the anti-captcha worked.

Can you add that new URL to the source code, or come up with a way to detect whether any request Invidious makes is redirected to /sorry/index and then trigger the bypass_captcha function?
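The generic check requested above could look roughly like this. It is a Java illustration of the idea, not Invidious's Crystal code: SorryPageDetector is a hypothetical name, bypass_captcha is only mentioned in prose, and treating every 429 as a block is an assumption based on the 429 reported on the sorry page later in this thread.

```java
import java.net.URI;
import java.util.Optional;

// Hypothetical detector for "this request ended up at Google's /sorry/index page".
final class SorryPageDetector {
    static boolean isBlocked(URI requestUri, int status, Optional<String> location) {
        if (status == 429) {
            return true; // assumption: a 429 from Google means the CAPTCHA page was served
        }
        if (status < 300 || status >= 400 || location.isEmpty()) {
            return false; // not a redirect
        }
        try {
            // Location may be relative, so resolve it against the original request URI.
            return isSorryUri(requestUri.resolve(location.get()));
        } catch (IllegalArgumentException e) {
            return false; // unparsable Location header
        }
    }

    static boolean isSorryUri(URI uri) {
        String host = uri.getHost();
        String path = uri.getPath();
        return host != null
                && (host.equals("google.com") || host.endsWith(".google.com"))
                && path != null
                && path.startsWith("/sorry/");
    }

    public static void main(String[] args) {
        URI request = URI.create("https://www.youtube.com/browse_ajax?gl=US&hl=en");
        System.out.println(isBlocked(request, 302,
                Optional.of("https://www.google.com/sorry/index?continue=https://www.youtube.com/")));
    }
}
```

When isBlocked returns true, the caller would run its CAPTCHA-solving routine (bypass_captcha in Invidious) and retry the original request; checking every response this way covers channels and comments as well as video pages.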

Perflyst commented 4 years ago

I see similar behavior, but with video information such as comments, likes, etc.

artths commented 4 years ago

I'm trying to implement anti-captcha for NewPipe. Currently I receive the second type of CAPTCHA ("https://www.google.com/sorry/index...") and try to make a POST with three parameters, "q", "continue" and "g-recaptcha-response", but I never receive a GOOGLE_ABUSE_EXEMPTION cookie or any redirect URL. What am I doing wrong?

unixfox commented 4 years ago

What's the error message given by Google? Also, what status code do you get for the request? If it's a 400, then there is something wrong in your code.

artths commented 4 years ago

I stay on the same page, with the same URL "https://www.google.com/sorry/index.." and a 429 status code, as if I hadn't posted at all.

unixfox commented 4 years ago

Is your request a POST request? Also, is your request body encoded as a query string, with a Content-Type header of application/x-www-form-urlencoded? It is also preferable to set the Referer header. Here is an example of a body made by a browser:

g-recaptcha-response: 03AERD8Xp5eQ8xX4nwTMr3_8OzfFyoU4IDcMW6ealj6gUNVsCSmB2AlZDuXtKkjIoCICyO5ZBK_mFfGKaXOjGqkHNvVkXhHmAPNCsU2FRip2hweFGYSVrgRzVRyeVKStSFM5WkLfxMXlp_2L-Liu6JCPo_LS_-0yJqA1zyAN6diQRyqEduU7qp6Lo0MhciuTj0SlAxzV2WDaIgubS_pd9x8gqfsCa6rEJ2y8tVyD-m_k1TJmcrUQlpsuRMnRfsM2BFggApYZ8TGTC5y-breO3IlnMsxKMa9-g6jt3IBVHE3BZ8mMcdTdp1A0En7_fkeZvpUM7BKTtwVu9Y4fc-9G5aeDRp6D8RseAN-rEng9S6lA_g91EhGqaaw33vZt4S0HQMbMqVeCoVCrdGtpevIUrEfjSrv7RjSUVC8WQzRmwAc4R4KDIqC_DQ_tGf5dBpY9HMihJvhP-twAdRTPWsDUDlrirpdL19bWimHg
q: EhAgAQZ8JmAEJQABAAAAAAmCGLqqo_MFIhkA8aeDS6ASm_qRFdynMgfJqm_jtxy0t4GDMgFy
continue: https://www.google.com/search?q=test
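Those three fields have to be sent as an application/x-www-form-urlencoded body. A minimal Java sketch using only the standard library (Java 10+ for URLEncoder.encode(String, Charset)); SorryForm is a hypothetical name, and the field values in the example are shortened placeholders, not real tokens.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper that builds the body of the POST to /sorry/index.
final class SorryForm {
    // LinkedHashMap keeps the same field order as the browser example above.
    static Map<String, String> fields(String recaptchaResponse, String q, String continueUrl) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("g-recaptcha-response", recaptchaResponse);
        fields.put("q", q);
        fields.put("continue", continueUrl);
        return fields;
    }

    // name=value pairs joined with '&', each part percent-encoded as UTF-8.
    static String encode(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        System.out.println(encode(fields("03AERD8Xp5", "EhAgAQZ8", "https://www.google.com/search?q=test")));
        // prints g-recaptcha-response=03AERD8Xp5&q=EhAgAQZ8&continue=https%3A%2F%2Fwww.google.com%2Fsearch%3Fq%3Dtest
    }
}
```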

The best way to tell whether your request is correct is to use a proxy like mitmproxy and compare your request with one made in a browser.

EDIT: Here is some example code from one of my projects: https://github.com/unixfox/proxy-sorry-google-recaptcha/blob/master/anticaptcha.js#L53. I hope this helps.

artths commented 4 years ago

Yes, I POST with okhttp3:

```java
// mCaptchaInputs, mCaptchaPostUrl, USER_AGENT and client are defined elsewhere in the class.
FormBody.Builder formBodyBuilder = new FormBody.Builder();

// Copy every input field of the CAPTCHA form into the POST body.
for (Map.Entry<String, String> entry : mCaptchaInputs.entrySet()) {
    formBodyBuilder.add(entry.getKey(), entry.getValue());
}

okhttp3.Request request = new okhttp3.Request.Builder()
        .url(mCaptchaPostUrl)
        .addHeader("User-Agent", USER_AGENT)
        .addHeader("Accept-Language", "en-GB, en;q=0.9")
        // FormBody already sends this Content-Type; the explicit header is redundant.
        .addHeader("Content-Type", "application/x-www-form-urlencoded")
        .addHeader("X-YouTube-Client-Name", "1")
        .addHeader("X-YouTube-Client-Version", "2.20200214.04.00")
        .post(formBodyBuilder.build())
        .build();

okhttp3.Response response = client.newCall(request).execute();
```

I can confirm that the "q" value I post is the same as the one on the page.

I see that omarroth closes the previous connection just before the POST. I parse the page, close the connection, wait for the CAPTCHA task, and then POST. Could that be the reason?

artths commented 4 years ago

LOL. In case anybody needs this: it was okhttp's automatic redirect handling. I was getting a cookie and a redirect to the original URL. A regular browser would set the cookie and then follow the redirect to your page, but okhttp followed the redirect without setting the cookie, so I was redirected again to a new CAPTCHA page.
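One way around this is to stop the client from silently following that redirect, so the Set-Cookie header on the redirect response itself can be read (in OkHttp, OkHttpClient.Builder has a followRedirects(false) option for this). Since OkHttp is not part of the JDK, the sketch below swaps in java.net.http (Java 11+); SorrySubmit is a hypothetical name, and the returned Set-Cookie values still have to be searched for GOOGLE_ABUSE_EXEMPTION.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

// Hypothetical sketch: POST the solved CAPTCHA without following the redirect.
final class SorrySubmit {
    // Redirect.NEVER hands back the redirect response itself, which is where the cookie is set.
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NEVER)
                .build();
    }

    static HttpRequest buildRequest(URI postUrl, String formBody, String userAgent) {
        return HttpRequest.newBuilder(postUrl)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .header("User-Agent", userAgent)
                .POST(HttpRequest.BodyPublishers.ofString(formBody))
                .build();
    }

    // Sends the form and returns the raw Set-Cookie values of the (redirect) response.
    static List<String> submit(HttpClient client, HttpRequest request)
            throws IOException, InterruptedException {
        HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
        return response.headers().allValues("Set-Cookie");
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest(URI.create("https://www.google.com/sorry/index"),
                "q=placeholder", "Mozilla/5.0");
        // A real run would call submit(newClient(), request) with a solved token in the body.
        System.out.println(request.method() + " " + request.uri());
    }
}
```

After storing the cookie, the client can follow the Location header back to the original URL itself, this time sending the cookie along, which is what a browser does.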

unixfox commented 4 years ago

I had the same issue with got (the Node.js HTTP library); that's why I had to set methodRewriting to false here: https://github.com/unixfox/proxy-sorry-google-recaptcha/blob/master/anticaptcha.js#L68.

artths commented 4 years ago

Oh. I wish I understood JS well. In any case, thanks for the help!

iv-org / invidious

Revert back to http connection #957