linto-ai / whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence
GNU Affero General Public License v3.0

Make Whisper requirement more flexible to be able to use a specific Whisper version (as some breakages were introduced in 20230306) #48

kamranjon closed 1 year ago

kamranjon commented 1 year ago

The latest changes to whisper (adding word-level timestamps) have added a dynamic requirement (triton) that breaks if you don't have a specific environment. Could we change the requirements.txt in whisper-timestamped to target a specific whisper version, preferably one from before these latest whisper changes (maybe the last stable release on Jan 24)? That would be much more stable and would not result in breakages.

Jeronymous commented 1 year ago

Thanks for the report. Indeed, targeting a specific whisper version would be more future-proof. However, whisper version 20230306 would be a better target than the previous version 20230124, because it includes some useful bug fixes (avoiding an infinite loop, for instance).

I'm currently testing and for now I don't see any breakage with the new version 20230306.

Can you please clarify what you mean by "breaks if you don't have a specific environment"? Is there a specific error message during setup?

kamranjon commented 1 year ago

@Jeronymous this is the exact error I get:

ERROR: Could not find a version that satisfies the requirement triton>=2.0.0.dev20221202 (from openai-whisper) (from versions: none)
ERROR: No matching distribution found for triton>=2.0.0.dev20221202

Even just pinning whisper to git+https://github.com/openai/whisper@3e1780f - the commit right before this change would be great.

kamranjon commented 1 year ago

If you are testing on macOS you will not see this error; it is a result of this portion of code that was added:

import sys

requirements = []
if sys.platform.startswith("linux"):
    triton_requirement = "triton>=2.0.0.dev20221202"
    try:
        import re
        import subprocess
        version_line = subprocess.check_output(["nvcc", "--version"]).strip().split(b"\n")[-1]
        major, minor = re.findall(rb"([\d]+)\.([\d]+)", version_line)[0]
        if (int(major), int(minor)) < (11, 4):
            # the last version supporting CUDA < 11.4
            triton_requirement = "triton==2.0.0.dev20221011"
    except (IndexError, OSError, subprocess.SubprocessError):
        pass
    requirements.append(triton_requirement)

I'm a bit surprised that this made it into main. I wonder if they removed this version? They are now on the official 2.0.0 version as of 4 days ago: https://github.com/openai/triton/releases/tag/v2.0.0. I think maybe they just forgot to update the dependency before pushing?
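
The version check in the quoted snippet can be tried in isolation. The following is a minimal sketch, not Whisper's actual code: the function name `pick_triton_requirement` is hypothetical, and it takes the `nvcc --version` output as an argument instead of running `nvcc`. It uses the same parsing as the snippet (last output line, first `major.minor` match).

```python
import re

DEFAULT_TRITON = "triton>=2.0.0.dev20221202"
OLD_CUDA_TRITON = "triton==2.0.0.dev20221011"  # pin used for CUDA < 11.4 in the snippet


def pick_triton_requirement(nvcc_output: bytes) -> str:
    """Return the triton requirement the quoted logic would pick.

    Hypothetical helper: mirrors the parsing in the snippet above, but
    receives the `nvcc --version` output instead of calling nvcc.
    """
    try:
        # Same parsing as the snippet: last line, first "major.minor" match.
        version_line = nvcc_output.strip().split(b"\n")[-1]
        major, minor = re.findall(rb"([\d]+)\.([\d]+)", version_line)[0]
    except IndexError:
        # No version found: keep the default requirement.
        return DEFAULT_TRITON
    if (int(major), int(minor)) < (11, 4):
        return OLD_CUDA_TRITON
    return DEFAULT_TRITON
```

Whatever the check finds, the snippet still appends some triton requirement on every Linux platform, which is why a missing `nvcc` or a newer CUDA version does not avoid the error reported above.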

kamranjon commented 1 year ago

And for what it's worth, I think this is very platform-specific: if you are using an ARM image of Linux, pip will not be able to find triton as a dependency, so the install fails. Based on this discussion, and from what I can tell, pip does not appear to show versions that are unavailable for a specific platform: "Versions of Triton past 0.3.0 only have manylinux2014 x86_64 wheels on PyPI".
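
One way a setup script could avoid this is to check the CPU architecture as well as the operating system. This is only a sketch under the assumption stated above (triton wheels exist only for x86_64 Linux); the function `triton_requirements` is hypothetical and is not what Whisper's setup code did at the time of this thread.

```python
import platform
import sys


def triton_requirements(sys_platform: str, machine: str) -> list:
    """Hypothetical guard: request triton only on x86_64 Linux.

    Assumes, as reported in this thread, that PyPI only carries
    manylinux x86_64 wheels for recent triton versions.
    """
    if sys_platform.startswith("linux") and machine == "x86_64":
        return ["triton>=2.0.0.dev20221202"]
    # ARM Linux, macOS, Windows: skip triton instead of failing the install.
    return []


if __name__ == "__main__":
    # Show what the guard would request on the current machine.
    print(triton_requirements(sys.platform, platform.machine()))
```

With this guard, an ARM Linux image would install without triton instead of stopping on "No matching distribution found".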

Jeronymous commented 1 year ago

"Even just pinning whisper to git+https://github.com/openai/whisper@3e1780f - the commit right before this change would be great."

This would make sense specifically for ARM Linux, until the bug is fixed on the Whisper side.

I see that you reported the issue in detail: https://github.com/openai/whisper/discussions/1048. Let's hope that it gets solved soon.

For now, I see nothing that prevents you from installing whatever version of whisper works for you before installing whisper-timestamped.
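
The install order suggested here can be sketched as follows. The helper `install_commands` is hypothetical, the whisper pin is the commit mentioned earlier in this thread, and the whisper-timestamped install URL is an assumption. Whether pip keeps the already-installed whisper depends on whisper-timestamped's own requirement accepting that version.

```python
import sys

# Commit named earlier in this thread (the one right before the triton change).
WHISPER_PIN = "git+https://github.com/openai/whisper@3e1780f"
# Assumed install URL for whisper-timestamped.
WHISPER_TIMESTAMPED = "git+https://github.com/linto-ai/whisper-timestamped"


def install_commands(whisper_spec: str = WHISPER_PIN) -> list:
    """Build the two pip commands in the order suggested above.

    Hypothetical helper: it only builds the commands and does not run them.
    """
    pip = [sys.executable, "-m", "pip", "install"]
    return [
        pip + [whisper_spec],         # 1. a whisper version that works for you
        pip + [WHISPER_TIMESTAMPED],  # 2. then whisper-timestamped
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them by hand.
    for cmd in install_commands():
        print(" ".join(cmd))
```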

kamranjon commented 1 year ago

@Jeronymous thank you for making the requirements.txt more flexible! This makes it easy to specify a whisper version that works with our environment.

Jeronymous commented 1 year ago

OK so in the end: