huggingface / dataspeech


RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (472, 472) at dimension 1 of input [1, 400] #25

Closed · anioji closed this issue 6 months ago

anioji commented 6 months ago

The behavior gets stranger every month, and at the same time it is not at all obvious or even understandable to me personally.

I have now reached the dataset-annotation stage. A test dataset of 10 audio recordings passed all the stages, even on CPU. But now annotation fails on data that previously passed, and the point at which it crashes keeps changing: first it failed at recording 74, then at 168, and now it won't get past 4.

If this is a problem with the dataset itself, then I would like to know the criteria it has to meet.

The text was extracted using Whisper, and the casting was done using huggingface/datasets.

(venv) lab@lab-ub ~/dataspeech (main)> cat start.sh
python main.py "Anioji/testra" \
  --configuration "default" \
  --text_column_name "text_original" \
  --audio_column_name "audio" \
  --cpu_num_workers 8 \
  --rename_column \
  --num_workers_per_gpu_for_pitch 2 \
  --num_workers_per_gpu_for_snr 2 \
  --repo_id "Anioji/test-rb"

(venv) lab@lab-ub ~/dataspeech (main)> bash start.sh
Compute pitch
Map (num_proc=2):   0%|                                                  | 0/528 [00:00<?, ? examples/s][W NNPACK.cpp:61] Could not initialize NNPACK! Reason: Unsupported hardware.
Map (num_proc=2):   1%|▎                                         | 4/528 [00:08<19:12,  2.20s/ examples]
multiprocess.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/lab/dataspeech/venv/lib/python3.10/site-packages/multiprocess/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/lab/dataspeech/venv/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 678, in _write_generator_to_queue
    for i, result in enumerate(func(**kwargs)):
  File "/home/lab/dataspeech/venv/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3547, in _map_single
    batch = apply_function_on_filtered_inputs(
  File "/home/lab/dataspeech/venv/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3416, in apply_function_on_filtered_inputs
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
  File "/home/lab/dataspeech/dataspeech/gpu_enrichments/pitch.py", line 29, in pitch_apply
    pitch, periodicity = penn.from_audio(
  File "/home/lab/dataspeech/venv/lib/python3.10/site-packages/penn/core.py", line 57, in from_audio
    for frames in preprocess(
  File "/home/lab/dataspeech/venv/lib/python3.10/site-packages/penn/core.py", line 510, in preprocess
    audio = torch.nn.functional.pad(
  File "/home/lab/dataspeech/venv/lib/python3.10/site-packages/torch/nn/functional.py", line 4522, in pad
    return torch._C._nn.pad(input, pad, mode, value)
RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (472, 472) at dimension 1 of input [1, 400]
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/lab/dataspeech/main.py", line 41, in <module>
    pitch_dataset = dataset.map(
  File "/home/lab/dataspeech/venv/lib/python3.10/site-packages/datasets/dataset_dict.py", line 869, in map
    {
  File "/home/lab/dataspeech/venv/lib/python3.10/site-packages/datasets/dataset_dict.py", line 870, in <dictcomp>
    k: dataset.map(
  File "/home/lab/dataspeech/venv/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 602, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/lab/dataspeech/venv/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 567, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/lab/dataspeech/venv/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3248, in map
    for rank, done, content in iflatmap_unordered(
  File "/home/lab/dataspeech/venv/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 718, in iflatmap_unordered
    [async_result.get(timeout=0.05) for async_result in async_results]
  File "/home/lab/dataspeech/venv/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 718, in <listcomp>
    [async_result.get(timeout=0.05) for async_result in async_results]
  File "/home/lab/dataspeech/venv/lib/python3.10/site-packages/multiprocess/pool.py", line 774, in get
    raise self._value
RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (472, 472) at dimension 1 of input [1, 400]
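
That is the error PyTorch raises for reflect-style padding, which requires each pad amount to be strictly smaller than the input length along the padded dimension. The pitch step appears to pad each clip by roughly half of penn's analysis window (472 samples on each side here), so a clip of only 400 samples cannot be padded. A minimal sketch reproducing the failure (the reflect mode is an assumption inferred from the error message; the sizes come from the traceback):

import torch
import torch.nn.functional as F

# Reflect padding requires each pad amount to be smaller than the input
# size along that dimension, so a 400-sample clip cannot take a 472-sample pad.
audio = torch.zeros(1, 400)
try:
    F.pad(audio, (472, 472), mode="reflect")
except RuntimeError as e:
    print(e)  # "Padding size should be less than the corresponding input dimension..."

# A longer clip goes through fine: 1600 + 472 + 472 = 2544 samples out.
longer = torch.zeros(1, 1600)
print(F.pad(longer, (472, 472), mode="reflect").shape)  # torch.Size([1, 2544])

If the clips are sampled at 16 kHz, 400 samples is only about 25 ms, far below even a 1-second phrase, which points to a near-empty or truncated clip somewhere in the dataset.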

(venv) lab@lab-ub ~/dataspeech (main)> fastfetch
Host: KVM/QEMU Standard PC (i440FX + PIIX, 1996) (pc-i4)
CPU: AMD Ryzen 7 3700X (8) @ 3.59 GHz
GPU: NVIDIA GeForce GTX 1070 [Discrete]
Memory: 534.50 MiB / 7.75 GiB (7%)
Disk: 41.03 GiB / 117.56 GiB (35%) - ext4
OS: Ubuntu jammy 22.04 x86_64
Kernel: Linux 5.15.0-107-generic
Packages: 855 (dpkg)

(venv) lab@lab-ub ~/dataspeech (main)> nvidia-smi
Wed May 29 07:36:24 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02              Driver Version: 555.42.02      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1070        Off |   00000000:00:10.0 Off |                  N/A |
|  0%   49C    P0             39W /  170W |       2MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
(venv) lab@lab-ub ~/dataspeech (main)> 
ylacombe commented 6 months ago

Hey @anioji, thanks for opening the issue. Could you share the dataset you're working on? I've searched for Anioji/testra but it seems to be private.

Also, you might want to double-check (see the sketch after this list):

  1. that your audio files are 2 to 30 seconds long
  2. that they are mono-channel (I haven't tested the code on stereo, but it might work)
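
A quick way to run both checks with huggingface/datasets, using the repo, config, and column names from the start.sh above (a sketch only; I couldn't test it against the private dataset):

import numpy as np
from datasets import load_dataset, Audio

ds = load_dataset("Anioji/testra", "default", split="train")  # private repo: needs auth
ds = ds.cast_column("audio", Audio())  # make sure clips decode to arrays

def check(example):
    audio = example["audio"]
    arr = np.asarray(audio["array"])
    # Channel layout varies by decoder, so when the array is 2-D take the
    # longest axis as the sample count.
    n_samples = arr.shape[0] if arr.ndim == 1 else max(arr.shape)
    return {
        "duration_s": n_samples / audio["sampling_rate"],
        "is_mono": arr.ndim == 1,
    }

flagged = ds.map(check).filter(
    lambda x: x["duration_s"] < 2.0 or x["duration_s"] > 30.0 or not x["is_mono"]
)
print(f"{len(flagged)} clips fall outside the 2-30 s mono criteria")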

The behavior gets stranger every month, and at the same time it is not at all obvious or even understandable to me personally.

I don't understand what you mean there!

anioji commented 6 months ago

Hey @anioji, thanks for opening the issue. Could you share the dataset you're working on? I've searched for Anioji/testra but it seems to be private.

I can provide a read token if that would be useful.

Stereo, maybe. The WAV files have two channels and are pseudo-stereo. I can convert everything to mono, but I don't think that's the issue.

Most likely, though, the problem really is the length of the audio recordings: they are short phrases of 1-10 seconds.
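
As for down-mixing, a minimal sketch with soundfile (file names are illustrative; averaging the two channels is one common way to collapse pseudo-stereo):

import soundfile as sf

data, sr = sf.read("clip.wav")    # pseudo-stereo WAV decodes to (num_samples, 2)
if data.ndim == 2:
    data = data.mean(axis=1)      # average the channels down to mono
sf.write("clip_mono.wav", data, sr)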

The behavior gets stranger every month, and at the same time it is not at all obvious or even understandable to me personally.

I'm not the sharpest, and I'm an expressive person. For a month now I've been getting intermittent errors that get resolved in ways unknown to me.

Sorry and thanks