ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense
132.41k stars 10.04k forks source link

overwritten class _request is not providing expected attributes (commit: 648dc53) #32573

Closed ReenigneArcher closed 9 months ago

ReenigneArcher commented 1 year ago

Checklist

Question

WRITE QUESTION HERE

Starting with commit https://github.com/ytdl-org/youtube-dl/commit/648dc5304cb2476592ff142988b8c62675011fcc I am unable to use youtube-dl in my Plex plugin.

The traceback I get in the Plex plugin logs is not all that useful.

2023-09-28 02:27:26,494 (7f9dd1ce3808) :  CRITICAL (core:615) - Exception starting plug-in (most recent call last):
  File "/usr/lib/plexmediaserver/Resources/Plug-ins-1cf77d501/Framework.bundle/Contents/Resources/Versions/2/Python/Framework/core.py", line 608, in start
    self.sandbox.execute(self.init_code)
  File "/usr/lib/plexmediaserver/Resources/Plug-ins-1cf77d501/Framework.bundle/Contents/Resources/Versions/2/Python/Framework/code/sandbox.py", line 256, in execute
    exec(code) in self.environment
  File "/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Plug-ins/Themerr-plex.bundle/Contents/Code/__init__.py", line 192, in <module>
    class Themerr(Agent.Movies):
  File "/usr/lib/plexmediaserver/Resources/Plug-ins-1cf77d501/Framework.bundle/Contents/Resources/Versions/2/Python/Framework/api/agentkit.py", line 1133, in __new__
    AgentKit._register_agent_class(cls)
  File "/usr/lib/plexmediaserver/Resources/Plug-ins-1cf77d501/Framework.bundle/Contents/Resources/Versions/2/Python/Framework/api/agentkit.py", line 1116, in _register_agent_class
    cls._shared_instance._push_agent_info()
  File "/usr/lib/plexmediaserver/Resources/Plug-ins-1cf77d501/Framework.bundle/Contents/Resources/Versions/2/Python/Framework/api/agentkit.py", line 944, in _push_agent_info
    agent_info = agents
  File "/usr/lib/plexmediaserver/Resources/Plug-ins-1cf77d501/Framework.bundle/Contents/Resources/Versions/2/Python/Framework/components/messaging.py", line 86, in call_external_function
    packed_result = self._core.networking.http_request(url, cacheTime=0, timeout=None, immediate=True).content
  File "/usr/lib/plexmediaserver/Resources/Plug-ins-1cf77d501/Framework.bundle/Contents/Resources/Versions/2/Python/Framework/components/networking.py", line 352, in http_request
    return HTTPRequest(self._core, url, data, h, url_cache, encoding, errors, timeout, immediate, sleep, opener, follow_redirects, method)
  File "/usr/lib/plexmediaserver/Resources/Plug-ins-1cf77d501/Framework.bundle/Contents/Resources/Versions/2/Python/Framework/components/networking.py", line 119, in __init__
    self.load()
  File "/usr/lib/plexmediaserver/Resources/Plug-ins-1cf77d501/Framework.bundle/Contents/Resources/Versions/2/Python/Framework/components/networking.py", line 159, in load
    f = self._opener.open(req, timeout=self._timeout)
  File "/usr/lib/plexmediaserver/Resources/Python/python27.zip/urllib2.py", line 421, in open
    protocol = req.get_type()
  File "/usr/lib/plexmediaserver/Resources/Python/python27.zip/urllib2.py", line 280, in get_type
    if self.type is None:
  File "/usr/lib/plexmediaserver/Resources/Python/python27.zip/urllib2.py", line 254, in __getattr__
    raise AttributeError, attr
AttributeError: type

Given the code from the commit:

try:
    _req = compat_urllib_request.Request
    _req('http://127.0.0.1', method='GET')
except TypeError:
    class _request(object):
        def __new__(cls, url, *args, **kwargs):
            method = kwargs.pop('method', None)
            r = _req(url, *args, **kwargs)
            if method:
                r.get_method = types.MethodType(lambda _: method, r)
            return r

    compat_urllib_request.Request = _request

I'm assuming that the TypeError path will be taken in Python 2.7? And given that the overwritten _request object has no Attribute type which urllib2 seems to be expecting.

This is the last piece of code from the Plex framework before the builtins are called: https://github.com/squaresmile/Plex-Plug-Ins/blob/fc4ab34d4cb995668abd84b304b57c5bf13cb69d/Framework.bundle/Contents/Resources/Versions/2/Python/Framework/components/networking.py#L159

It's difficult to follow the framework code, but I believe this is function from the Plex framework being called by the line in the traceback. https://github.com/squaresmile/Plex-Plug-Ins/blob/fc4ab34d4cb995668abd84b304b57c5bf13cb69d/Framework.bundle/Contents/Resources/Versions/0/Python/PMS/HTTP.py#L119-L148

I'm not 100% sure, but I'm thinking this could be considered a bug in youtube-dl IF it's overwriting built-in types.

dirkf commented 1 year ago

urllib2.Request doesn't have an interface for setting the request method. For compat purposes it needs one that matches Python 3 urllib.request.Request. But the simple-minded code that does this might be falling foul of this situation as described at https://github.com/python/cpython/blob/2.7/Lib/urllib2.py#L247:

       # XXX this is a fallback mechanism to guard against these
       # methods getting called in a non-standard order.  this may be
       # too complicated and/or unnecessary.

(apparently it is necessary in the case above).

The quoted compat code wraps the Request old-style class with a new-style class whose instances are instances of the replaced class, with a specialised get_method() if the method keyword was set in the constructor call (overriding the one at l.256 of urllib2.py).

This is the context in urllib2.py:

    def open(self, fullurl, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
        # accept a URL or a Request object
        if isinstance(fullurl, basestring):
            req = Request(fullurl, data)
        else:
            req = fullurl
            if data is not None:
                req.add_data(data)

        req.timeout = timeout
        protocol = req.get_type()  # failing line
        ...

As expected, it calls the get_type() of the original class:

    def get_type(self):
        if self.type is None:
            self.type, self.__r_type = splittype(self.__original)
            if self.type is None:
                raise ValueError, "unknown url type: %s" % self.__original
        return self.type

Presumably the first line of the method is being resolved wrongly. The results of the commands p self, type(self) and dir(self) at a debugger break on the marked failing line would be useful. If self was constructed with Request(url, ...) it should have a type of None either from the __init__() of the original Request class being called directly, or from its invocation in the compat line r = _req(url, ...) when constructing an instance of the the wrapper class. Similarly, the __original attribute should have been set, but the line that uses it is never reached.

dirkf commented 1 year ago

A similar hack is already done at https://github.com/squaresmile/Plex-Plug-Ins/blob/fc4ab34d4cb995668abd84b304b57c5bf13cb69d/Framework.bundle/Contents/Resources/Versions/2/Python/Framework/components/networking.py#L157, but this shouldn't be a problem. The Plex code appears to go straight into the original urllib2.Request() code without touching the yt-dl compat.

ReenigneArcher commented 1 year ago

The results of the commands p self, type(self) and dir(self) at a debugger break on the marked failing line would be useful

I'm not sure if this is possible as it's inside the Plex framework. As far as I know it's not really possible to run this on it's own outside of a Plex environment.

The Plex code appears to go straight into the original urllib2.Request() code without touching the yt-dl compat.

I'd guess that changes when I import youtube_dl?

https://github.com/LizardByte/Themerr-plex/blob/772499efdb7fedc14302d64151b35fd29e4c8e32/Contents/Code/youtube_dl_helper.py#L14

from: https://github.com/LizardByte/Themerr-plex/blob/772499efdb7fedc14302d64151b35fd29e4c8e32/Contents/Code/plex_api_helper.py#L34

from: https://github.com/LizardByte/Themerr-plex/blob/772499efdb7fedc14302d64151b35fd29e4c8e32/Contents/Code/__init__.py#L30

I'm not really that familiar with the source of youtube-dl or the compat layer... but I would be happy to test any patch/branch that may address this.

dirkf commented 1 year ago

Neither the code nor the backtrace from Plex shows any sign of calling yt-dl compat. As in another case recently the order of importing may be relevant.

Try adding this missing attribute hook to the _request class in compat.py:

        def __getattr__(self, name):
            if hasattr(_req, name):
                return _req.name
            return super(_request, self).__getattr__(name)

In case the lookup of attributes is not working, this should make the instance value default to the class value in the original urllib2.Request.

ReenigneArcher commented 1 year ago

As in another case recently the order of importing may be relevant.

I did remove the install_aliases() method from my backported library so those conflicts should be resolved from #32544

Adding this __getattr__ method to the class produces the same traceback. It seems like it should work to me... I did confirm that commenting out the try/except block (even on the latest commit) fixes the issue though.

dirkf commented 1 year ago

So there's something weirder going on. It looked familiar, and then I remembered utils.py l.2729

       if sys.version_info < (2, 7):
           # avoid possible race where __r_type may be unset
           req.get_type()

(I never found out why that worked, not having a Py2.6 at hand)

So try this:

         def __new__(cls, url, *args, **kwargs):
             method = kwargs.pop('method', None)
             r = _req(url, *args, **kwargs)
+            r.get_type()
             if method:
                 r.get_method = types.MethodType(lambda _: method, r)
             return r
ReenigneArcher commented 1 year ago

Unfortunately, I get the same result.

dirkf commented 1 year ago

If you can't connect to the server where the Plex framework is running to use pdb, maybe you can hack some print() statements into the code and upload that in place of the production code?

You could also try patching the failing line in urllib2:

     def get_type(self):
-        if self.type is None:
+        if getattr(self, 'type', None) is None
             self.type, self.__r_type = splittype(self.__original)
             if self.type is None:
                 raise ValueError, "unknown url type: %s" % self.__original
         return self.type

This would have been safer in any case, even if it's difficult to see how get_type() could possibly be called before __init__() has completed.

ReenigneArcher commented 1 year ago

maybe you can hack some print() statements into the code and upload that in place of the production code?

I guess I could, but I have no idea where it would actually print to. There's no console or anything like that... I guess I could make it write some output to a specific file instead.

I will try to get some additional output from the framework. I'm a bit confused as to what is wanted and where though.

After this line? https://github.com/squaresmile/Plex-Plug-Ins/blob/fc4ab34d4cb995668abd84b304b57c5bf13cb69d/Framework.bundle/Contents/Resources/Versions/0/Python/PMS/HTTP.py#L138

print(type(request))
print(dir(request))
dirkf commented 1 year ago

I was thinking, just before https://github.com/python/cpython/blob/8d21aa21f2cbc6d50aab3f420bb23be1d081dac4/Lib/urllib2.py#L421. At that point the variable is req rather than request; also print(req). The backtrace doesn't show HTTP.py. So actually it's the Py2.7 urllib2 that needs to be hacked: I suppose you have authority to do that. Maybe save a copy of the urllib2.py on the server and edit the original file with either the patch above or the diagnostics?

What the BT does show is:

  1. execute(self.init_code) in sandbox.py:256 launches the plugin
  2. during this, cls._shared_instance._push_agent_info() in agentkit.py seems to be getting some information by HTTP packed_result = self._core.networking.http_request(url, cacheTime=0, timeout=None, immediate=True).content; no explicit opener is specified, so a default created by urllib2.build_opener()` is used
  3. immediate was set, so self.load() is called and this calls the open() of the opener
  4. this calls OpenerDirector.open() from urllib2, where the error occurs on calling get_type() on its req.

As some background, urllib2 "opens" a URL by calling a series of functions within the "opener", each dealing with a specific aspect of the opening task (proxies, cookies, ...) including one (that fails here) to open the network connection itself.

ReenigneArcher commented 1 year ago

Okay, I haven't got to this yet as it's not so easy to implement in my CI system.

I looked into how plugins are loaded a little bit more, and it's quite interesting. They are loaded using RestrictedPython and xpython. I can't find much info about xptyhon but I think it's this https://pypi.org/project/x-python/

I wonder if this has something to do with the errors. https://github.com/squaresmile/Plex-Plug-Ins/blob/master/Framework.bundle/Contents/Resources/Versions/2/Python/Framework/code/loader.py

dirkf commented 1 year ago

xpython seems irrelevant: https://github.com/squaresmile/Plex-Plug-Ins/blob/master/Framework.bundle/Contents/Resources/Platforms/Shared/Libraries/xpython.py

If it's using RestrictedPython, the version must be out of date, since v6 (2022) unsupported Python 2.7. But this is hardly the place to worry about that.

If the sandbox policy messes with getattr that may be relevant.

https://github.com/squaresmile/Plex-Plug-Ins/blob/master/Framework.bundle/Contents/Resources/Versions/2/Python/Framework/code/loader.py#L30: WTF?

exec(code) in self.__dict__

dirkf commented 1 year ago

@ReenigneArcher, if you could test this replacement code in compat.py and report back, I'll try to work a similar fix into the system:

--- old/youtube_dl/compat.py
+++ new/youtube_dl/compat.py

 # Also fix up lack of method arg in old Pythons
 try:
-    _req = compat_urllib_request.Request
-    _req('http://127.0.0.1', method='GET')
+    type(compat_urllib_request.Request('http://127.0.0.1', method='GET'))
 except TypeError:
-    class _request(object):
-        def __new__(cls, url, *args, **kwargs):
-            method = kwargs.pop('method', None)
-            r = _req(url, *args, **kwargs)
-            if method:
-                r.get_method = types.MethodType(lambda _: method, r)
-            return r
-
-    compat_urllib_request.Request = _request
-
+    def _add_init_method_arg(cls):
+
+        init = cls.__init__
+
+        def wrapped_init(self, *args, **kwargs):
+            method = kwargs.pop('method', 'GET')
+            init(self, *args, **kwargs)
+            if self.has_data() and method == 'GET':
+                method = 'POST'
+            self.get_method = types.MethodType(lambda _: method, self)
+
+        cls.__init__ = wrapped_init
+
+    _add_init_method_arg(compat_urllib_request.Request)
+    del _add_init_method_arg
ReenigneArcher commented 1 year ago

@dirkf this change works!

dirkf commented 1 year ago

Great, I'll put it through full testing.

dirkf commented 10 months ago

PR #32695 contains a similar fix, as in the above commit, and either could be tested.

ReenigneArcher commented 10 months ago

This is not easy for me to test since it exists in a different repo and I am now using this as a submodule (due to no PyPi package) in my project... if you could make a branch in this repo, I could test it fairly easily.

dirkf commented 10 months ago

Maybe just apply commit 0c75367?

ReenigneArcher commented 9 months ago

@dirkf it looks like applying this as a patch works fine. I didn't change the commit of the submodule, but I can if needed. Currently it's on be008e657d79832642e2158557c899249c9e31cd

https://github.com/LizardByte/Themerr-plex/pull/336/files

dirkf commented 9 months ago

Fixed in #32695: thanks for the test.