huggingface / dataspeech

MIT License

UnboundLocalError: cannot access local variable 't' where it is not associated with a value """ #17

Closed anioji closed 6 months ago

anioji commented 6 months ago

What I did

Hello. I tried to annotate my own dataset and got an error that I don't understand. I'm a newbie, and I can't work out what happened or why.

I am attaching all the materials that I have.

My CSV schema:

audio                    text                  speeker_id
./audio/audio_427.wav    Текст на кириллице    1111

I load the CSV and cast it as described in the documentation, then push it to the Hugging Face Hub. I start dataspeech with the arguments below. It loaded the dataset, started processing, and then crashed.

How I group the dataset

python group_dataset.py from_audio to_csv

Output: it saves datasets.csv:

./audio/audio_427.wav, а затем базальта!. ,1111
./audio/audio_231.wav, razus!. ,1111
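
For reference, a minimal sketch of producing a CSV in this shape with the standard library. The file name and column names are taken from the schema above; the actual group_dataset.py logic is not shown in this issue, so this is only an illustration:

```python
import csv

# Illustrative only: writes rows in the same shape as the datasets.csv above.
rows = [
    ("./audio/audio_427.wav", "а затем базальта!.", "1111"),
    ("./audio/audio_231.wav", "razus!.", "1111"),
]
with open("datasets.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["audio", "text", "speeker_id"])  # columns from the schema above
    writer.writerows(rows)
```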

Cast and upload the dataset to the Hub

python group_dataset.py from_csv cast_audio push_to_hub
# In short it does this:
from datasets import Audio, Dataset

df = Dataset.from_csv("./datasets.csv")
df = df.cast_column("audio", Audio(sampling_rate=32000))
df.push_to_hub(repo_id="", token="")

Start dataspeech

python main.py "Anioji/testra" \
--configuration "default" \
--output_dir /root/dataspeech/tmp_stone_base/ \
--text_column_name "text_original" \
--audio_column_name "audio" \
--cpu_num_workers 4 \
--num_workers_per_gpu 4 \
--rename_column

Traceback

/root/dataspeech/venv/lib/python3.11/site-packages/pyannote/audio/core/io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
WARNING - torchvision is not available - cannot save figures
Compute speaking rate
Compute snr and reverb
Map (num_proc=4):   0%|                                                  | 0/534 [00:00<?, ? examples/s]/root/dataspeech/venv/lib/python3.11/site-packages/pyannote/audio/core/io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
/root/dataspeech/venv/lib/python3.11/site-packages/pyannote/audio/core/io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
WARNING - torchvision is not available - cannot save figures
WARNING - torchvision is not available - cannot save figures
INFO - Lightning automatically upgraded your loaded checkpoint from v1.6.5 to v2.2.2. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint ../.cache/huggingface/hub/models--ylacombe--brouhaha-best/snapshots/99bf97b13fd4dda2434a6f7c50855933076f2937/best.ckpt`
Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.12.1+cu102, yours is 2.2.2+cu121. Bad things might happen unless you revert torch to 1.x.
Using default parameters optimized on Brouhaha
Map (num_proc=4):   3%|█▏                                       | 16/534 [00:08<04:39,  1.85 examples/s]Using default parameters optimized on Brouhaha
Map (num_proc=4):   6%|██▍                                      | 32/534 [00:09<02:00,  4.16 examples/s]Using default parameters optimized on Brouhaha
Map (num_proc=4):   9%|███▋                                     | 48/534 [00:09<01:10,  6.91 examples/s]Using default parameters optimized on Brouhaha
Map (num_proc=4):  12%|████▉                                    | 64/534 [00:10<00:46, 10.02 examples/s]Using default parameters optimized on Brouhaha
Map (num_proc=4):  15%|██████▏                                  | 80/534 [00:10<00:35, 12.97 examples/s]Using default parameters optimized on Brouhaha
Map (num_proc=4):  18%|███████▎                                 | 96/534 [00:11<00:28, 15.57 examples/s]Using default parameters optimized on Brouhaha
Map (num_proc=4):  18%|███████▎                                 | 96/534 [00:12<00:57,  7.58 examples/s]
multiprocess.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/root/dataspeech/venv/lib/python3.11/site-packages/multiprocess/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/root/dataspeech/venv/lib/python3.11/site-packages/datasets/utils/py_utils.py", line 675, in _write_generator_to_queue
    for i, result in enumerate(func(**kwargs)):
  File "/root/dataspeech/venv/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 3547, in _map_single
    batch = apply_function_on_filtered_inputs(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/dataspeech/venv/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 3416, in apply_function_on_filtered_inputs
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/dataspeech/dataspeech/gpu_enrichments/snr_and_reverb.py", line 32, in snr_apply
    res = pipeline({"sample_rate": sample["sampling_rate"],
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/dataspeech/venv/lib/python3.11/site-packages/pyannote/audio/core/pipeline.py", line 325, in __call__
    return self.apply(file, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/dataspeech/venv/lib/python3.11/site-packages/brouhaha/pipeline.py", line 146, in apply
    speech: Annotation = self._binarize(speech_seg)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/dataspeech/venv/lib/python3.11/site-packages/pyannote/audio/utils/signal.py", line 303, in __call__
    region = Segment(start - self.pad_onset, t + self.pad_offset)
                                             ^
UnboundLocalError: cannot access local variable 't' where it is not associated with a value
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/dataspeech/main.py", line 49, in <module>
    snr_dataset = dataset.map(
                  ^^^^^^^^^^^^
  File "/root/dataspeech/venv/lib/python3.11/site-packages/datasets/dataset_dict.py", line 869, in map
    {
  File "/root/dataspeech/venv/lib/python3.11/site-packages/datasets/dataset_dict.py", line 870, in <dictcomp>
    k: dataset.map(
       ^^^^^^^^^^^^
  File "/root/dataspeech/venv/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 602, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/dataspeech/venv/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 567, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/dataspeech/venv/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 3248, in map
    for rank, done, content in iflatmap_unordered(
  File "/root/dataspeech/venv/lib/python3.11/site-packages/datasets/utils/py_utils.py", line 715, in iflatmap_unordered
    [async_result.get(timeout=0.05) for async_result in async_results]
  File "/root/dataspeech/venv/lib/python3.11/site-packages/datasets/utils/py_utils.py", line 715, in <listcomp>
    [async_result.get(timeout=0.05) for async_result in async_results]
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/dataspeech/venv/lib/python3.11/site-packages/multiprocess/pool.py", line 774, in get
    raise self._value
UnboundLocalError: cannot access local variable 't' where it is not associated with a value
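
For context, the `signal.py` line in the traceback uses a loop variable `t` after its loop; if the loop body never runs (e.g. no speech frames survive binarization), `t` is never bound. A minimal, hypothetical sketch of that failure mode (not the actual pyannote code):

```python
# Hypothetical sketch of the failure mode behind the traceback above:
# the binarization loop assigns `t` once per frame, then uses `t` after
# the loop. If the input yields no frames, `t` is never assigned.
def binarize_like(frames, pad_offset=0.0):
    for t, is_active in frames:
        pass  # the real code updates segment state here
    # mirrors: Segment(start - pad_onset, t + pad_offset)
    return t + pad_offset  # UnboundLocalError when `frames` is empty

print(binarize_like([(0.5, True)]))  # -> 0.5
try:
    binarize_like([])
except UnboundLocalError as exc:
    print("reproduced:", exc)
```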
anioji commented 6 months ago

The solution was to set TMPDIR="/some/path" before running pip install -r requirements.txt:

TMPDIR="/root/tmp" pip install -r requirements.txt

But this did not solve the problems caused by the major version differences between the PyTorch stack this project expects and what is installed on my server.

The versions are not pinned anywhere. That's why it first complained about torchvision, and now it complains about pyannote and pytorch_lightning:

INFO - Lightning automatically upgraded your loaded checkpoint from v1.6.5 to v2.2.4. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint ../.cache/huggingface/hub/models--ylacombe--brouhaha-best/snapshots/99bf97b13fd4dda2434a6f7c50855933076f2937/best.ckpt`
Model was trained with pyannote.audio 0.0.1, yours is 3.2.0. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.12.1+cu102, yours is 2.3.0+cu121. Bad things might happen unless you revert torch to 1.x.
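
For what it's worth, the warnings name the versions the checkpoint was trained with (torch 1.12.1, a pre-3.x pyannote.audio). A requirements pin along these lines might silence them, though I have not verified that dataspeech actually works with this set:

```
# Hypothetical pins based only on the warnings above -- untested with dataspeech
torch==1.12.1
pyannote.audio<3.0
```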
anioji commented 6 months ago

@ylacombe could you pin the current working package versions for dataspeech?