Closed pszemraj closed 1 year ago
Currently, the "CLI" works while installing the Python package does not, mostly because this is only implemented in the transformers dev release, which is not yet available as a regular pip release:
Obtaining file:///C:/Users/peter/code-dev-22/vid2cleantxt
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [3 lines of output]
C:\Users\peter\miniconda3\envs\asr\lib\site-packages\setuptools\installer.py:27: SetuptoolsDeprecationWarning: setuptools.installer is deprecated. Requirements should be satisfied by a PEP 517 installer.
warnings.warn(
error in vid2cleantxt setup command: 'install_requires' must be a string or list of strings containing valid project/version requirement specifiers; Parse error at "'+https:/'": Expected stringEnd
[end of output]
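The parse error at '+https:/' suggests the requirement string started with a bare git+https URL, which setuptools does not accept as a requirement specifier. A minimal sketch of the PEP 508 direct-reference form follows. The requirement strings and helper names below are assumptions for illustration, not the actual contents of this repo's setup.py.

```python
from typing import List

# Assumed form of the failing entry: a bare VCS URL is not a valid
# requirement specifier, so setuptools stops parsing right after "git".
BROKEN = "git+https://github.com/huggingface/transformers.git"

# PEP 508 direct reference: "<name> @ <url>". The exact URL is an assumption.
DEV_TRANSFORMERS = "transformers @ git+https://github.com/huggingface/transformers.git"


def is_direct_reference(spec: str) -> bool:
    """Rough check that a spec has the '<name> @ <url>' shape (hypothetical helper)."""
    name, sep, url = spec.partition(" @ ")
    return bool(sep) and name.strip().isidentifier() and url.startswith(("git+https://", "https://"))


def build_install_requires(use_dev_transformers: bool = True) -> List[str]:
    """Return a list that could be passed to setup(install_requires=...)."""
    if use_dev_transformers:
        return [DEV_TRANSFORMERS]
    # Once a regular release ships the feature, a plain name is enough.
    return ["transformers"]


print(build_install_requires())
```

Note that direct references like this are meant for local or VCS installs and may not be accepted when uploading to PyPI, so switching back to a plain requirement after the next transformers release seems like the simpler path.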
I will post a notebook to illustrate later, but this might stay a draft until the next production release of transformers, which I guess is fine since most of the implementation is done now (unless it changes).
here's a notebook illustrating
ok so CPU implementation seems ok, need to double check some cuda things for GPU
text_output, metadata_output = vid2cleantxt.transcribe.transcribe_dir(
input_dir=".",
model_id="openai/whisper-small.en",
# chunk_length=30,
# above are defaults to show important args
)
metadata_output
results in errors
Loading models @ Oct-11-2022_-00-17-17 - may take some time...
if RT seems excessive, try --verbose flag or checking logfile
Downloading: 100%
185k/185k [00:00<00:00, 878kB/s]
Downloading: 100%
810/810 [00:00<00:00, 8.46kB/s]
Downloading: 100%
999k/999k [00:00<00:00, 821kB/s]
Downloading: 100%
456k/456k [00:00<00:00, 4.70MB/s]
Downloading: 100%
52.7k/52.7k [00:00<00:00, 1.23MB/s]
Downloading: 100%
2.08k/2.08k [00:00<00:00, 55.8kB/s]
Downloading: 100%
1.72k/1.72k [00:00<00:00, 56.1kB/s]
Downloading: 100%
1.78k/1.78k [00:00<00:00, 63.6kB/s]
Downloading: 100%
967M/967M [00:16<00:00, 54.5MB/s]
Downloading: 100%
436M/436M [00:34<00:00, 50.2MB/s]
WARNING:root:Failed loading NeuSpell spellchecker, reverting to basic spellchecker
WARNING:root:invalid load key, '<'.
transcribing...: 100%
1/1 [00:16<00:00, 16.63s/it]
Creating .wav audio clips: 100%
8/8 [00:00<00:00, 97.93it/s]
Transcribing video: 100%
8/8 [00:09<00:00, 1.02s/it]
ERROR:root:Error transcribing chunk president_20_kennedy_27_s_20196220_speech_20_on_20_the_20_us_20_space_20_program_2020_c_span_clipaudio_0.wav in President20Kennedy27s20196220Speech20on20the20US20Space20Program2020CSPAN20Classroom.mp4 @ Oct-11-2022_-00
ERROR:root:Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
/content/vid2cleantxt/vid2cleantxt/transcribe.py:303: UserWarning: Error transcribing chunk - see log for details
warnings.warn("Error transcribing chunk - see log for details")
ERROR:root:Error transcribing chunk president_20_kennedy_27_s_20196220_speech_20_on_20_the_20_us_20_space_20_program_2020_c_span_clipaudio_1.wav in President20Kennedy27s20196220Speech20on20the20US20Space20Program2020CSPAN20Classroom.mp4 @ Oct-11-2022_-00
ERROR:root:Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
ERROR:root:Error transcribing chunk president_20_kennedy_27_s_20196220_speech_20_on_20_the_20_us_20_space_20_program_2020_c_span_clipaudio_2.wav in President20Kennedy27s20196220Speech20on20the20US20Space20Program2020CSPAN20Classroom.mp4 @ Oct-11-2022_-00
ERROR:root:Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
ERROR:root:Error transcribing chunk president_20_kennedy_27_s_20196220_speech_20_on_20_the_20_us_20_space_20_program_2020_c_span_clipaudio_3.wav in President20Kennedy27s20196220Speech20on20the20US20Space20Program2020CSPAN20Classroom.mp4 @ Oct-11-2022_-00
ERROR:root:Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
ERROR:root:Error transcribing chunk president_20_kennedy_27_s_20196220_speech_20_on_20_the_20_us_20_space_20_program_2020_c_span_clipaudio_4.wav in President20Kennedy27s20196220Speech20on20the20US20Space20Program2020CSPAN20Classroom.mp4 @ Oct-11-2022_-00
ERROR:root:Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
/content/vid2cleantxt/vid2cleantxt/transcribe.py:303: UserWarning: Error transcribing chunk - see log for details
warnings.warn("Error transcribing chunk - see log for details")
ERROR:root:Error transcribing chunk president_20_kennedy_27_s_20196220_speech_20_on_20_the_20_us_20_space_20_program_2020_c_span_clipaudio_5.wav in President20Kennedy27s20196220Speech20on20the20US20Space20Program2020CSPAN20Classroom.mp4 @ Oct-11-2022_-00
ERROR:root:Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
ERROR:root:Error transcribing chunk president_20_kennedy_27_s_20196220_speech_20_on_20_the_20_us_20_space_20_program_2020_c_span_clipaudio_6.wav in President20Kennedy27s20196220Speech20on20the20US20Space20Program2020CSPAN20Classroom.mp4 @ Oct-11-2022_-00
ERROR:root:Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
ERROR:root:Error transcribing chunk president_20_kennedy_27_s_20196220_speech_20_on_20_the_20_us_20_space_20_program_2020_c_span_clipaudio_7.wav in President20Kennedy27s20196220Speech20on20the20US20Space20Program2020CSPAN20Classroom.mp4 @ Oct-11-2022_-00
ERROR:root:Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
SC_pipeline - transcribed audio: 100%
1/1 [00:00<00:00, 10.98it/s]
/content/vid2cleantxt/v2clntxt_transc_metadata
Okay, things work on both sides now that I realized I had omitted sending inputs to the GPU (needed input_features = input_features.to(device)). Fixed in 05d3454587b1b2a7da0655e0def94cdd0d7979aa above.
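For anyone hitting the same "Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor)" error, here is a minimal sketch of the pattern. It uses a small Conv1d layer as a stand-in for the model, so the layer, tensor shapes, and helper names are assumptions and not the code in transcribe.py.

```python
import torch


def pick_device() -> torch.device:
    """Use the GPU when one is available, otherwise fall back to CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


def run_on_model_device(model: torch.nn.Module, input_features: torch.Tensor) -> torch.Tensor:
    """Move the inputs to wherever the model weights live before the forward pass."""
    device = next(model.parameters()).device
    # Without this line, CPU inputs meet CUDA weights and the forward pass raises.
    input_features = input_features.to(device)
    with torch.no_grad():
        return model(input_features)


device = pick_device()
# Stand-in for the real model: a single conv layer placed on the chosen device.
model = torch.nn.Conv1d(in_channels=80, out_channels=8, kernel_size=3, padding=1).to(device)
# Features are typically created on CPU by the preprocessing step.
features = torch.randn(1, 80, 100)
out = run_on_model_device(model, features)
print(out.shape, out.device)
```

Reading the device from the model parameters, rather than passing it around separately, keeps the inputs and weights in sync even if the model is moved later.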
GPU notebook and tests work: see here. CPU notebook and tests work, linked here.
give it some tests locally or via CLI to get acquainted and stress test and then I think it's good to merge?
Thanks! The code looks fine and the notebooks work. One thing that might be nice would be adding an audio output to the notebooks. We should also improve the punctuation.
Ok @JonathanLehner, I made some much-needed changes to reduce verbosity. Give it a look and merge, please. If all the conversations are resolved, it should be possible.
This PR adds integration for OpenAI's new whisper model, drastically increasing the quality of the output transcribed docs.
Integration in vid2cleantxt will be through huggingface transformers to avoid adding extra dependencies. See an example model card for current stage implementations and API.