huggingface / dataspeech

MIT License

Update pitch.py #12

Closed: yaoqih closed this 7 months ago

yaoqih commented 7 months ago

I got an error when processing my dataset, so I tried setting batched=False in the dataset.map call at line 46 of main.py. That produced a different error:

multiprocess.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/huyaoqi/anaconda3/lib/python3.9/site-packages/multiprocess/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/huyaoqi/anaconda3/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 675, in _write_generator_to_queue
    for i, result in enumerate(func(**kwargs)):
  File "/home/huyaoqi/anaconda3/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3517, in _map_single
    example = apply_function_on_filtered_inputs(example, i, offset=offset)
  File "/home/huyaoqi/anaconda3/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3416, in apply_function_on_filtered_inputs
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
  File "/root/dubbing/dataspeech/dataspeech/gpu_enrichments/pitch.py", line 49, in pitch_apply
    torch.tensor(sample["array"][None, :]).float(),
UnboundLocalError: local variable 'sample' referenced before assignment
"""
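This is standard Python scoping behaviour. Because `sample` is assigned somewhere in the function body, Python treats it as a local variable, and reading it on a code path that never assigns it raises `UnboundLocalError`. Below is a minimal sketch of that failure mode. It assumes (hypothetically) that `pitch_apply` only binds `sample` in its batched branch; the function name `buggy_pitch_apply` and the data layout are illustrative, not the actual dataspeech code.

```python
def buggy_pitch_apply(batch, batched=False):
    """Hypothetical stand-in for pitch_apply; not the real dataspeech code."""
    if batched:
        # Only the batched path binds the local name `sample`.
        sample = batch["audio"][0]
    # Unbatched calls reach this line with `sample` never assigned,
    # so Python raises UnboundLocalError instead of NameError.
    return len(sample["array"])


example = {"audio": {"array": [0.0, 0.1, 0.2]}}
try:
    buggy_pitch_apply(example, batched=False)
except UnboundLocalError as err:
    # The exact message wording differs between Python versions.
    print(f"UnboundLocalError: {err}")
```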

So I think sample is never defined when batched=False. I added the missing definition and it works.
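One way to make a map function safe for both modes is to bind the per-sample variable on every code path. The sketch below is a general pattern, not the patch merged in this PR. It assumes the usual `datasets` audio layout, where a batched call passes a list of audio dicts and an unbatched call passes a single dict. `fixed_pitch_apply` is a hypothetical name, and `len(sample["array"])` stands in for the real pitch computation.

```python
def fixed_pitch_apply(batch, batched=False):
    """Hypothetical sketch: bind `sample` on both code paths."""
    if batched:
        # Assumed batched layout: batch["audio"] is a list of audio dicts.
        samples = batch["audio"]
    else:
        # Assumed unbatched layout: batch["audio"] is one audio dict, so wrap it.
        samples = [batch["audio"]]

    results = []
    for sample in samples:
        # Placeholder for the per-sample pitch work on sample["array"].
        results.append(len(sample["array"]))

    # Return a list for batched calls and a single value otherwise.
    return results if batched else results[0]


print(fixed_pitch_apply({"audio": {"array": [0.0, 0.1, 0.2]}}))
print(fixed_pitch_apply({"audio": [{"array": [1]}, {"array": [1, 2]}]}, batched=True))
```

Normalising both cases to a list keeps a single code path for the per-sample work. That way, a branch cannot skip the assignment the way the unbatched path did in the traceback above.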

ylacombe commented 7 months ago

Thanks for the patch!