grrttedwards / wow-addon-updater

Python utility for updating World of Warcraft addons
GNU General Public License v3.0

Sometimes Curse is failing requests with status code 503 #68

Closed grrttedwards closed 5 years ago

grrttedwards commented 5 years ago

Seems like cloudflare is causing issues with the requests being made. The requests eventually work in the browser, but the GET calls which this utility uses (and even something like Postman) are being blocked.

Hopefully it isn't an effort to squash bots like this, otherwise I will have to look into something like https://github.com/Anorov/cloudflare-scrape
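For anyone trying to tell this kind of block apart from a genuine outage, here is a minimal sketch. The status codes 403 and 503 are the ones the updater's Curse handler already checks for; the helper name and the assumption that Cloudflare identifies itself in the `Server` response header are mine, not something taken from the updater.

```python
from typing import Mapping

# Status codes the updater's Curse handler treats as "blocked".
BLOCKED_STATUS_CODES = (403, 503)


def looks_like_cloudflare_block(status_code: int, headers: Mapping[str, str]) -> bool:
    """Return True if a response looks like a Cloudflare block.

    Assumption: Cloudflare names itself in the Server response header.
    This is a heuristic, not a guaranteed signal.
    """
    server = ""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lower case.
        if name.lower() == "server":
            server = value.lower()
    return status_code in BLOCKED_STATUS_CODES and "cloudflare" in server
```

With a `requests` response this could be called as `looks_like_cloudflare_block(r.status_code, r.headers)`.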

grrttedwards commented 5 years ago

It kind of looks like Curse is not following the Cloudflare best-practices in their documentation:

https://support.cloudflare.com/hc/en-us/articles/200504045-Using-Cloudflare-with-your-API

  • Browser Integrity Check: OFF Browser Integrity Check looks for common HTTP headers abused most commonly by spammers and denies access to the visitor. Since API calls made programmatically typically do not specify the same headers a web browser does, we recommend disabling the browser integrity check for the API URL pattern so that calls made outside of a browser are not blocked.

emphasis mine

grrttedwards commented 5 years ago

70 For now

jaredm4 commented 5 years ago

Can we add headers to the API calls to make it appear more like a browser? Or revert the change that allows async addon downloading? I know your forked version is fast, but maybe that also makes it look like a bot. If it updates one by one, it might look more like a browser to Cloudflare?
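A rough sketch of what "adding headers" could look like, using only the standard library so it runs without extra packages. The header values and the function name are hypothetical, copied from a typical desktop browser rather than from the updater, and nothing here is known to get past Cloudflare.

```python
import urllib.request

# Hypothetical header values resembling a desktop browser; not taken from the updater.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:69.0) Gecko/20100101 Firefox/69.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}


def build_browser_like_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying browser-like headers."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)
```

The same dictionary could be passed as `headers=` to `requests.get`, which is what the updater uses for its calls.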

grrttedwards commented 5 years ago

The way they are blocking the requests is that cloudflare is checking if you are capable of executing JavaScript, which only something like cfscrape will be able to handle. Even if you send requests in Postman (probably one of the most ubiquitous tools for playing with HTTP APIs) it rejects the requests.

I’ll experiment with single-threading the requests, but I don’t think that will do the trick :(
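For reference, single-threading could be sketched roughly like this. The function name, the `fetch` callable and the delay are all assumptions for illustration; in the updater `fetch` would wrap whatever performs the GET.

```python
import time
from typing import Callable, Dict, Iterable


def fetch_one_by_one(
    urls: Iterable[str],
    fetch: Callable[[str], object],
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, object]:
    """Fetch each URL in order, pausing between requests instead of running them in parallel."""
    results: Dict[str, object] = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first one.
            sleep(delay_seconds)
        results[url] = fetch(url)
    return results
```

The `sleep` parameter exists only so the pause can be replaced in tests.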

On Tue, Sep 24, 2019 at 3:35 PM Jared Markell notifications@github.com wrote:

Can we add headers to the API calls to make it appear more like a browser? Or revert the change that allows async addon downloading? I know your forked version is fast, but maybe that also makes it look like a bot. If it updates one by one, it might look more like a browser to Cloudflare?


lambroisie commented 5 years ago

Not sure if it will bring about any changes, but I posted a suggestion on Twitch's site: https://twitch.uservoice.com/forums/915910-game-mods-curseforge/suggestions/38671120-503-errors

grrttedwards commented 5 years ago

@lambroisie Hey! Wow that's awesome. Thank you. Let's keep an eye on that.

jaredm4 commented 5 years ago

@lambroisie appreciate the effort but I fear Curse has zero motivation to help users like us. They want users using the Twitch client. If we want them to fix it, we'd need to find a different reason. And WoW itself doesn't support Linux so that won't get us far, sadly.

This is a bummer since using wowinterface or github directly just aren't solutions. Versions on wowinterface may be old or nonexistent, and github often requires additional work to make it a usable addon (like in the case of Raider.io addon).

lambroisie commented 5 years ago

@jaredm4 is there a way someone can use, say, Lutris to install Twitch Client into its own prefix and then copy over its program files to the prefix that WoW is in? Obviously one would have to change the necessary settings in Lutris first - like executable location and prefix location.

I've been able to install Twitch via Lutris before, but it couldn't detect where I had WoW installed - even when I pointed it to the program in the C:\ drive (when in the same prefix) or Z:\ drive (when outside the prefix, in the greater Linux filesystem).

guppy42 commented 5 years ago

The way they are blocking the requests is that cloudflare is checking if you are capable of executing JavaScript, which only something like cfscrape will be able to handle. Even if you send requests in Postman (probably one of the most ubiquitous tools for playing with HTTP APIs) it rejects the requests.

One possible way around this is to use Chrome's headless mode (--headless, available since v59). I'm sure there is a Python lib for interacting with it.

That does require the user to have Chrome installed, of course, but it's not the worst requirement. After all, you don't have to use it for personal browsing.
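A minimal sketch of driving headless Chrome from Python without any extra library, by shelling out to the browser. The binary name `google-chrome` and both function names are assumptions; `--headless`, `--disable-gpu` and `--dump-dom` are Chrome command-line switches. Whether the dumped page is the real content or still the Cloudflare challenge page is untested here.

```python
import subprocess
from typing import List


def headless_chrome_command(url: str, chrome_binary: str = "google-chrome") -> List[str]:
    """Build the argument list that asks headless Chrome to print the page's DOM."""
    return [chrome_binary, "--headless", "--disable-gpu", "--dump-dom", url]


def fetch_rendered_html(url: str, chrome_binary: str = "google-chrome", timeout: int = 60) -> str:
    """Run headless Chrome and return what it writes to stdout.

    Raises CalledProcessError if Chrome exits with a non-zero status.
    """
    completed = subprocess.run(
        headless_chrome_command(url, chrome_binary),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return completed.stdout
```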

jpleau commented 5 years ago

Using this with curse makes it work again for me (I didn't change anything not curse-related, I don't use other sites). Just a quick fix if you want to get up and running quickly. Works with multithread as well.

diff --git a/updater/manager/addon_manager.py b/updater/manager/addon_manager.py
index c4c68b7..6b7ae09 100644
--- a/updater/manager/addon_manager.py
+++ b/updater/manager/addon_manager.py
@@ -13,6 +13,7 @@ from updater.site import site_handler, github
 from updater.site.abstract_site import SiteError, AbstractSite
 from updater.site.enum import GameVersion

+import cfscrape

 def error(message: str):
     print(message)
@@ -23,6 +24,7 @@ class AddonManager:
     _UNAVAILABLE = 'Unavailable'

     def __init__(self, config_file):
+        self.scraper = cfscrape.create_scraper()
         self.manifest = []

         # Read config file
@@ -113,7 +115,7 @@ class AddonManager:
         self.manifest.append(addon_entry)

     def get_addon_zip(self, zip_url):
-        r = requests.get(zip_url, stream=True)
+        r = self.scraper.get(zip_url, stream=True)
         r.raise_for_status()  # Raise an exception for HTTP errors
         return zipfile.ZipFile(BytesIO(r.content))

diff --git a/updater/site/abstract_site.py b/updater/site/abstract_site.py
index 0dd8670..cd2f2c6 100644
--- a/updater/site/abstract_site.py
+++ b/updater/site/abstract_site.py
@@ -2,6 +2,7 @@ from abc import ABC, abstractmethod

 from updater.site.enum import GameVersion

+import cfscrape

 class SiteError(Exception):
     pass
@@ -11,6 +12,7 @@ class AbstractSite(ABC):
     def __init__(self, url: str, game_version: GameVersion):
         self.url = url
         self.game_version = game_version
+        self.scraper = cfscrape.create_scraper()

     @classmethod
     def handles(cls, url: str) -> bool:
diff --git a/updater/site/curse.py b/updater/site/curse.py
index 456775e..bc0bcb2 100644
--- a/updater/site/curse.py
+++ b/updater/site/curse.py
@@ -21,7 +21,7 @@ class Curse(AbstractSite):

     def find_zip_url(self):
         try:
-            page = requests.get(self.url)
+            page = self.scraper.get(self.url)
             page.raise_for_status()  # Raise an exception for HTTP errors
             content_string = str(page.content)
             main_zip_url, *classic_zip_url = re.findall(
@@ -35,7 +35,7 @@ class Curse(AbstractSite):

     def get_latest_version(self):
         try:
-            page = requests.get(self.url)
+            page = self.scraper.get(self.url)
             if page.status_code in [403, 503]:
                 print("Curse is temporarily blocking requests because it thinks you are a bot... please try later. "
                       "Consider finding this addon on WoWInterface or GitHub.")

grrttedwards commented 5 years ago

@jpleau Thanks for posting. I did try this out myself, but the thought of requiring Node is definitely an icky one... For anyone who wants to work around Curse's issues right now, this code should help! You just need to go and install Node as well.

This also introduces a pretty uncomfortable amount of arbitrary code execution, since it runs whatever funky code Cloudflare decides to use to perform the check. It's running javascript that is not sandboxed by your browser, and can potentially touch your system in some way. That's why I say "icky"

Ghosthree3 commented 5 years ago

The above worked for me after running 'pipenv install cfscrape' in the root directory (for anyone patching it in and wondering why the error).
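A small sketch for checking both prerequisites before patching, assuming the Node executable is called `node` on your PATH. The function name is made up for illustration.

```python
import importlib.util
import shutil
from typing import Dict


def cfscrape_prerequisites() -> Dict[str, bool]:
    """Report whether the cfscrape package and a Node executable can be found."""
    return {
        # True if `import cfscrape` would find the package in this environment.
        "cfscrape": importlib.util.find_spec("cfscrape") is not None,
        # True if an executable named "node" is on PATH (the name is an assumption).
        "node": shutil.which("node") is not None,
    }


if __name__ == "__main__":
    for name, found in cfscrape_prerequisites().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

Run it inside the same pipenv environment as the updater, otherwise the `cfscrape` result may not reflect what the updater sees.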

lambroisie commented 5 years ago

@grrttedwards not sure if this runs contrary to what you said about arbitrary code execution, and running the Cloudflare javascript code:

This allows the script to easily impersonate a regular web browser without explicitly deobfuscating and parsing Cloudflare's JavaScript.

https://pypi.org/project/cfscrape/

grrttedwards commented 5 years ago

At least once, the JavaScript challenge needs to be evaluated. I did look a little bit into this, and that statement seems to be contradicted by the sentence before it:

Due to Cloudflare continually changing and hardening their protection page, cloudflare-scrape requires Node.js to solve JavaScript challenges. This allows [...]

I think what they were going for is "the javascript challenge needs to be evaluated once, then the session details can be re-used to easily impersonate [...]".

I looked into the code, and was pleased to see the following comment by the author, but it offers a bit of false reassurance:

# Use vm.runInNewContext to safely evaluate code
# The sandboxed code cannot use the Node.js standard library

from: https://github.com/Anorov/cloudflare-scrape/blob/e4f31ed0fbc8aa5ba9ee8a75cf739d08575f1451/cfscrape/__init__.py#L256

This is not untrue, but the documentation for the vm module in Node says:

The vm module is not a security mechanism. Do not use it to run untrusted code. The term "sandbox" is used throughout these docs simply to refer to a separate context, and does not confer any security guarantees.

All this being said, I don't think that Cloudflare would be sending malicious Javascript to people, but that's still something I'd like to continue to think about.

If people really want it, I can try to officially integrate cfscrape, but like I said it has the big security implications above, and it goes without saying that it requires you to put Node on your system.

lambroisie commented 5 years ago

You know more about the security implications than I do. I have, perhaps, a misplaced confidence both that Cloudflare is unlikely to be compromised themselves and that they would not purposefully serve up malicious JavaScript code. I also have the Linux mitigations applied for the branch-prediction vulnerabilities in Intel chips. Are there other remote vulnerabilities or privilege escalation techniques? Most assuredly. I guess I'm just taking my chances and estimating the probability to be low.

I wouldn't presume to tell you whether or not to implement cfscrape. The possible options, as I see it, are not integrating it, integrating it by default, or allowing a user to optionally enable it, provided they read the documentation and know to do so.
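The opt-in variant could be sketched as a config switch that defaults to off. The section name `updater` and the option name `use_cfscrape` are hypothetical and do not come from the updater's actual config file.

```python
import configparser


def cfscrape_enabled(config_text: str) -> bool:
    """Return True only if the user explicitly opted in to cfscrape.

    The [updater] section and use_cfscrape option are hypothetical names.
    A missing section or option counts as "not enabled".
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.getboolean("updater", "use_cfscrape", fallback=False)
```

Defaulting to `False` means nobody runs Cloudflare's JavaScript through Node without having chosen to.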

lambroisie commented 5 years ago

Oh, a friend asked me if it would be a tad more secure if we could just run the cfscrape inside bubblewrap or nsjail. I have no idea whether that's possible or not, but on face value it seemed like something I'd want to ask about here. https://google.github.io/nsjail/ https://github.com/containers/bubblewrap
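As a starting point for the bubblewrap idea, a wrapper could build a `bwrap` command line around whatever needs confining. This is an untested sketch: the function name and the example inner command are hypothetical, and the flags shown (`--ro-bind`, `--dev`, `--proc`, `--tmpfs`, `--unshare-all`, `--share-net`, `--die-with-parent`) are standard bubblewrap options chosen by me, not a vetted policy.

```python
from typing import List, Sequence


def bubblewrap_command(inner_command: Sequence[str]) -> List[str]:
    """Wrap a command so it runs under bubblewrap with a read-only root filesystem."""
    return [
        "bwrap",
        "--ro-bind", "/", "/",   # whole filesystem visible, but read-only
        "--dev", "/dev",         # minimal /dev
        "--proc", "/proc",       # fresh /proc
        "--tmpfs", "/tmp",       # private, throwaway /tmp
        "--unshare-all",         # new namespaces for everything...
        "--share-net",           # ...except the network, which the downloads need
        "--die-with-parent",     # kill the sandbox if the parent process exits
        *inner_command,
    ]
```

For example, `bubblewrap_command(["python3", "-m", "updater"])` (a hypothetical invocation) returns a list suitable for `subprocess.run`. With a read-only root the updater could not write into the addons folder, so that directory would need its own writable `--bind`.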

grrttedwards commented 5 years ago

If everyone wouldn't mind taking a look at https://github.com/grrttedwards/wow-addon-updater/pull/73

This brings in cfscrape, which busts the Cloudflare protection (nice job, Curse... it was really hard to circumvent your scripts :eyeroll: )

Note that this REQUIRES Node installed. Please try it out and give me some feedback so I can get this in.

grrttedwards commented 5 years ago

Merged #73 and released v1.2.0