pytube / pytube

A lightweight, dependency-free Python library (and command-line utility) for downloading YouTube Videos.
https://pytube.io
The Unlicense

pytube.exceptions.RegexMatchError: get_transform_object: could not find match for var for={(.*?)}; #1728

Open HajoBrandt opened 1 year ago

HajoBrandt commented 1 year ago

When I download a video with the PyTube library using this code: yt.streams.get_highest_resolution().download("PATH", f"PATH.mp4"), I get the following error:

raise RegexMatchError(caller="get_transform_object", pattern=pattern)
pytube.exceptions.RegexMatchError: get_transform_object: could not find match for var for={(.*?)};

I've seen a lot of fixes on Stack Overflow and in the Pytube git repo, but they seem to touch different parts of cipher.py. I would like to know how I could alter the get_transform_object function in cipher.py so that the regex check matches.
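
A minimal sketch of why this particular pattern cannot match: the error message shows that the name substituted into the pattern is for, which is a reserved word in JavaScript and so can never appear as var for={...};. This suggests the wrong name was picked up before get_transform_object was called. The sample JavaScript and the object name Q below are made up for illustration and are not taken from YouTube's base.js.

```python
import re


def build_pattern(var: str) -> str:
    # Same pattern construction as get_transform_object in this thread.
    return r"var %s={(.*?)};" % re.escape(var)


# Synthetic stand-in for base.js; the object name "Q" is hypothetical.
SAMPLE_JS = (
    "var Q={sw:function(a,b){a.splice(0,b)},\n"
    "rv:function(a){a.reverse()}};"
)


def find_transform_object(js: str, var: str):
    # re.DOTALL lets "." span the newline inside the object literal.
    return re.search(build_pattern(var), js, flags=re.DOTALL)


good = find_transform_object(SAMPLE_JS, "Q")    # a real variable name
bad = find_transform_object(SAMPLE_JS, "for")   # the name from the error

print(build_pattern("for"))
print("match for Q:", good is not None)
print("match for 'for':", bad is not None)
```

With a valid name the search succeeds; with for it returns None, which is the condition that makes the library raise RegexMatchError.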

github-actions[bot] commented 1 year ago

Thank you for contributing to PyTube. Please remember to reference Contributing.md

kar3399 commented 1 year ago

I'm getting the same error.

untilhamza commented 1 year ago

I'm getting the same error too!

ejrtks1020 commented 1 year ago

I'm getting the same error

giulianoaccorsi commented 1 year ago

I'm getting the same error too

ricardofriba commented 1 year ago

same here

seamcarving commented 1 year ago

I'm getting the same error

yiguid commented 1 year ago

same here...

DavidOsparks commented 1 year ago

Same here +1

ethanjoby commented 1 year ago

same here

hp561 commented 1 year ago

Sammmeeee! I hope we can get this fixed 🙏🏽

luzhonghao1989 commented 1 year ago

same here

untilhamza commented 1 year ago

Here is a quick fix in the meantime:

https://stackoverflow.com/a/76718022/13889098

-> in file .venv/lib/python3.10/site-packages/pytube/cipher.py -> find the method get_transform_object

def get_transform_object(js: str, var: str) -> List[str]:
    pattern = r"var %s={(.*?)};" % re.escape(var)
    logger.debug("getting transform object")
    regex = re.compile(pattern, flags=re.DOTALL)
    transform_match = regex.search(js)

    if not transform_match:
        # i commented out the line raising the error
        # raise RegexMatchError(caller="get_transform_object", pattern=pattern)
        return []  # Return an empty list if no match is found

    return transform_match.group(1).replace("\n", " ").split(", ")

awmie commented 1 year ago

Here is a quick fix in the meantime:

https://stackoverflow.com/a/76718022/13889098

-> in file .venv/lib/python3.10/site-packages/pytube/cipher.py -> find the method get_transform_object

def get_transform_object(js: str, var: str) -> List[str]:
    pattern = r"var %s={(.*?)};" % re.escape(var)
    logger.debug("getting transform object")
    regex = re.compile(pattern, flags=re.DOTALL)
    transform_match = regex.search(js)

    if not transform_match:
        # i commented out the line raising the error
        # raise RegexMatchError(caller="get_transform_object", pattern=pattern)
        logger.error(f"No match found for pattern: {pattern}")
        return []  # Return an empty list if no match is found

    return transform_match.group(1).replace("\n", " ").split(", ")

working!! 👍
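
For anyone who wants to see what the patched function does without editing site-packages first, here is a self-contained sketch of the same logic. It only mirrors the code posted above; the sample inputs are made up. Note that returning an empty list hides the failed match rather than repairing it, so whatever consumes the result must be able to cope with an empty transform object.

```python
import logging
import re
from typing import List

logger = logging.getLogger(__name__)


def get_transform_object(js: str, var: str) -> List[str]:
    """Standalone copy of the patched function from this thread."""
    pattern = r"var %s={(.*?)};" % re.escape(var)
    regex = re.compile(pattern, flags=re.DOTALL)
    transform_match = regex.search(js)

    if not transform_match:
        # Patched behaviour: log and return [] instead of raising.
        logger.error("No match found for pattern: %s", pattern)
        return []

    return transform_match.group(1).replace("\n", " ").split(", ")


# Hypothetical inputs, only to exercise both branches.
found = get_transform_object("var Q={a:1, b:2};", "Q")
missing = get_transform_object("var Q={a:1, b:2};", "for")
print(found, missing)
```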

luzhonghao1989 commented 1 year ago

working!! 👍

Anto5040 commented 1 year ago

Working with the bug fix!

sint18 commented 1 year ago

working now!! Thanks.

ricardofriba commented 1 year ago

Now I have this:

'NoneType' object has no attribute 'group'

ricardofriba commented 1 year ago

After upgrading pytube and applying this fix, it is working now.

thanks guys

NannoSilver commented 1 year ago

Working, but the error message is still there for me.

LutziGoz commented 1 year ago

Working, but the error message is still there for me.

If you want to get rid of the logged error message, just remove this line (line 209): logger.error(f"No match found for pattern: {pattern}")

YuriiMaiboroda commented 1 year ago

In the get_transform_plan function, the following restriction needs to be added to the beginning of the pattern: (?:^|[^\w$])

pattern = r"(?:^|[^\w$])%s=function\(\w\){[a-z=\.\(\"\)]*;(.*);(?:.+)}" % name
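
A rough illustration of what that prefix changes, assuming the previous pattern was the same expression without it. The JavaScript below is synthetic, and the names zXa, Xa and Q are hypothetical. Without the boundary, a search for Xa can land inside the longer identifier zXa; with it, the name must start the string or follow a character that is not a word character or $.

```python
import re

# Synthetic JavaScript: "zXa" merely ends with the name we want, "Xa".
SAMPLE_JS = (
    'zXa=function(a){a=a.split("");a.reverse();return a.join("")};\n'
    'Xa=function(a){a=a.split("");Q.sw(a,3);return a.join("")};'
)

name = re.escape("Xa")  # escaping matters for names that contain "$"

# Pattern without the boundary (assumed to be the earlier form).
unanchored = r"%s=function\(\w\){[a-z=\.\(\"\)]*;(.*);(?:.+)}" % name
# Pattern with the boundary suggested in this comment.
anchored = r"(?:^|[^\w$])%s=function\(\w\){[a-z=\.\(\"\)]*;(.*);(?:.+)}" % name

old_plan = re.search(unanchored, SAMPLE_JS).group(1)
new_plan = re.search(anchored, SAMPLE_JS).group(1)

print("without boundary:", old_plan)
print("with boundary:   ", new_plan)
```

The unanchored search returns the body of zXa, while the anchored one returns the body of the function actually named Xa.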

It is also necessary to correct the patterns in get_initial_function_name:

def get_initial_function_name(js: str) -> str:
    """Extract the name of the function responsible for computing the signature.
    :param str js:
        The contents of the base.js asset file.
    :rtype: str
    :returns:
        Function name from regex match
    """

    function_patterns = [
        r"[a-zA-Z0-9$]+\s*&&\s*[a-zA-Z0-9$]+\.set\([^,]+\s*,\s*encodeURIComponent\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\(",  # noqa: E501
        r'(?P<sig>[a-zA-Z0-9\$]+)\s*=\s*function\(\s*(?P<arg>\w+)\s*\)\s*{\s*(?P=arg)\s*=\s*(?P=arg)\.split\(\s*""\s*\)',  # noqa: E501
        r'(?P<quotes>["\'])signature(?P=quotes)\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(',
        r"\.sig\s*\|\|\s*(?P<sig>[a-zA-Z0-9$]+)\s*\(",
        r"yt\.akamaized\.net/\)\s*\|\|\s*.*?\s*[a-zA-Z0-9$]+\s*&&\s*[a-zA-Z0-9$]+\.set\([^,]+\s*,\s*(?:encodeURIComponent\s*\()?\s*(?P<sig>[a-zA-Z0-9$]+)\(",  # noqa: E501
        r"[a-zA-Z0-9$]+\s*&&\s*[a-zA-Z0-9$]+\.set\([^,]+\s*,\s*(?:\([^)]*\)\s*\(\s*)?(?P<sig>[a-zA-Z0-9$]+)\(",  # noqa: E501
    ]
    logger.debug("finding initial function name")
    for pattern in function_patterns:
        regex = re.compile(pattern)
        function_match = regex.search(js)
        if function_match:
            logger.debug("finished regex search, matched: %s", pattern)
            return function_match.group("sig")

    raise RegexMatchError(
        caller="get_initial_function_name", pattern="multiple"
    )

edreams commented 1 year ago

There is a bug in the latest version of PyTube that causes the get_transform_object() function to raise a RegexMatchError exception when it cannot find a match for the var parameter in the JavaScript code. In the meantime, if you are experiencing this error, you can try the following workaround:

1. Open the file pytube/cipher.py in your favorite text editor.
2. Find the function get_transform_object().
3. Comment out the line that raises the RegexMatchError exception.
4. Save the file.

Once you have done this, you should be able to run your code without any errors. (Only this message appears: No match found for pattern: var for={(.*?)}; but the code works, at least for now.)

vim langchainenv/lib/python3.9/site-packages/pytube/cipher.py

Here is the code with the line that raises the RegexMatchError exception commented out:

def get_transform_object(js: str, var: str) -> List[str]:
    """Extract the "transform object"."""
    pattern = r"var %s={(.*?)};" % re.escape(var)
    logger.debug("getting transform object")
    regex = re.compile(pattern, flags=re.DOTALL)
    transform_match = regex.search(js)
    if not transform_match:
        # This line raises the RegexMatchError exception
        #raise RegexMatchError(caller="get_transform_object", pattern=pattern)
        logger.error(f"No match found for pattern: {pattern}")
        return []

    return transform_match.group(1).replace("\n", " ").split(", ")
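
The path to cipher.py differs between environments (the comments above mention both .venv/lib/python3.10 and langchainenv/lib/python3.9), so it may help to ask Python where the module lives. This is a small sketch using only the standard library; locate_module_file is a helper name made up for this example and is not part of pytube.

```python
import importlib.util
from typing import Optional


def locate_module_file(module_name: str) -> Optional[str]:
    """Return the file path of an importable module, or None if not found.

    Note: for a dotted name, find_spec imports the parent package first.
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # The parent package (for example pytube) is not installed here.
        return None
    return spec.origin if spec else None


# Prints the path of the file to edit, or None if pytube is not installed
# in the interpreter that runs this snippet.
print(locate_module_file("pytube.cipher"))
```

Run it with the same interpreter (or virtual environment) that runs your download script, otherwise it may point at a different copy of the library.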

bobpeulen commented 1 year ago

Same error. The fix helped!