xinjli / allosaurus

Allosaurus is a pretrained universal phone recognizer for more than 2000 languages
GNU General Public License v3.0

assert wave_path.exists() #27

Closed jesserme closed 3 years ago

jesserme commented 3 years ago

Hi, so I'm running into what should be a simple problem, but I simply can't figure out what I'm doing wrong.

I run the following command:

python -m allosaurus.bin.prep_feat --path='C:\Users\maria\Allo\train'

I wanted to test with just a few samples to make sure I have everything working before using the complete dataset, but I'm stuck on this stage.

The wave txt file for the train directory contains

utt_1 C:\Users\maria\Allo\koyi\crdo-KKT_CONVERSATION_CONVERSATIONs5.wav
utt_2 C:\Users\maria\Allo\koyi\crdo-KKT_CONVERSATION_CONVERSATIONs6.wav
utt_3 C:\Users\maria\Allo\koyi\crdo-KKT_CONVERSATION_CONVERSATIONs7.wav
utt_4 C:\Users\maria\Allo\koyi\crdo-KKT_CONVERSATION_CONVERSATIONs8.wav
utt_5 C:\Users\maria\Allo\koyi\crdo-KKT_CONVERSATION_CONVERSATIONs9.wav
utt_6 C:\Users\maria\Allo\koyi\crdo-KKT_CONVERSATION_CONVERSATIONs10.wav
utt_7 C:\Users\maria\Allo\koyi\crdo-KKT_CONVERSATION_CONVERSATIONs12.wav
utt_8 C:\Users\maria\Allo\koyi\crdo-KKT_CONVERSATION_CONVERSATIONs14.wav
utt_9 C:\Users\maria\Allo\koyi\crdo-KKT_CONVERSATION_CONVERSATIONs15.wav
utt_10 C:\Users\maria\Allo\koyi\crdo-KKT_CONVERSATION_CONVERSATIONs16.wav

but I keep getting the error

Traceback (most recent call last):
  File "C:\Users\maria\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\maria\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\maria\AppData\Local\Programs\Python\Python39\lib\site-packages\allosaurus\bin\prep_feat.py", line 57, in <module>
    assert wave_path.exists(), "the path directory should contain a wave file, please check README.md for details"
AssertionError: the path directory should contain a wave file, please check README.md for details

Is my wave file just formatted incorrectly and that's why I keep getting an error about no wav files existing? Is my command line argument the reason? Thank you for any help.

xinjli commented 3 years ago

Hi,

I guess this is because your Windows path format is wrong.

Try --path='C:\\Users\\maria\\Allo\\train' or --path='C:/Users/maria/Allo/train' instead, and then change the paths inside your wave txt file to the same format.
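As a quick way to compare the two path spellings, here is a minimal sketch using Python's standard pathlib. It only shows how pathlib itself interprets the separators; whether allosaurus handles the --path argument the same way is an assumption, not something confirmed in this thread.

```python
from pathlib import PureWindowsPath

# The directory from the command above, written with backslashes
# and with forward slashes. PureWindowsPath applies Windows path
# rules on any operating system, so this runs on Linux or macOS too.
backslashes = PureWindowsPath(r"C:\Users\maria\Allo\train")
forward = PureWindowsPath("C:/Users/maria/Allo/train")

# pathlib treats both separators as equivalent on Windows.
same_path = backslashes == forward
print(same_path)

# The file the assertion looks for is named "wave" inside that directory.
wave_path = forward / "wave"
print(wave_path.name)
```

If both spellings compare equal, the separator style is probably not what makes the assertion fail, and the name of the file inside the directory is worth checking next.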

jesserme commented 3 years ago

Okay so I tried the second one and still get

C:\Users\maria>python -m allosaurus.bin.prep_feat --path='C:/Users/maria/Allo/train'
Traceback (most recent call last):
  File "C:\Users\maria\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\maria\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\maria\AppData\Local\Programs\Python\Python39\lib\site-packages\allosaurus\bin\prep_feat.py", line 57, in <module>
    assert wave_path.exists(), "the path directory should contain a wave file, please check README.md for details"
AssertionError: the path directory should contain a wave file, please check README.md for details

I changed my wave txt file to use the same format:

utt_1 'C:/Users/maria/Allo/koyi/crdo-KKT_CONVERSATION_CONVERSATIONs5.wav'
utt_2 'C:/Users/maria/Allo/koyi/crdo-KKT_CONVERSATION_CONVERSATIONs6.wav'
utt_3 'C:/Users/maria/Allo/koyi/crdo-KKT_CONVERSATION_CONVERSATIONs7.wav'
utt_4 'C:/Users/maria/Allo/koyi/crdo-KKT_CONVERSATION_CONVERSATIONs8.wav'
utt_5 'C:/Users/maria/Allo/koyi/crdo-KKT_CONVERSATION_CONVERSATIONs9.wav'
utt_6 'C:/Users/maria/Allo/koyi/crdo-KKT_CONVERSATION_CONVERSATIONs10.wav'
utt_7 'C:/Users/maria/Allo/koyi/crdo-KKT_CONVERSATION_CONVERSATIONs12.wav'
utt_8 'C:/Users/maria/Allo/koyi/crdo-KKT_CONVERSATION_CONVERSATIONs14.wav'
utt_9 'C:/Users/maria/Allo/koyi/crdo-KKT_CONVERSATION_CONVERSATIONs15.wav'
utt_10 'C:/Users/maria/Allo/koyi/crdo-KKT_CONVERSATION_CONVERSATIONs16.wav'

I was going through the readme and ran the sample wav successfully so I know some things work, but I guess I'm just confused why it doesn't seem to want to find my own data that I want to test it on.

jesserme commented 3 years ago

I'm pretty confused and tried running it on a Linux VM to see if using Windows was the problem. I did a test with one of my wav files

python -m allosaurus.run -i Allo/koyi/crdo-KKT_CONVERSATION_CONVERSATIONs5.wav

And it produced the transcription output. I'm guessing maybe the problem is with how I'm formatting my wave txt file or something because it seems like Allosaurus is able to process/find my files otherwise. I'm just not sure what I'm doing wrong.
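To rule out formatting problems in the wave file itself, a small checker along these lines may help. It is only a sketch: check_wave_entries is a hypothetical helper, not part of allosaurus, and it assumes each line has the form "utt_id path" as in the samples above. It also assumes the file is read literally, so quotes around a path would become part of the path.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def check_wave_entries(
    lines: Iterable[str],
    exists: Callable[[str], bool] = lambda p: Path(p).exists(),
) -> List[str]:
    """Return a list of human-readable problems found in wave file lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            problems.append(f"line {number}: expected '<utt_id> <path>'")
            continue
        utt_id, audio_path = parts
        if audio_path[0] in "'\"":
            # Assumption: quotes are not stripped when the file is read.
            problems.append(f"line {number}: path for {utt_id} is wrapped in quotes")
            continue
        if not exists(audio_path):
            problems.append(f"line {number}: file not found for {utt_id}: {audio_path}")
    return problems


if __name__ == "__main__":
    # Example run on the file from this thread, if it is present.
    wave_file = Path("C:/Users/maria/Allo/train/wave")
    if wave_file.exists():
        for problem in check_wave_entries(wave_file.read_text().splitlines()):
            print(problem)
```

An empty result means every listed audio file was found under the assumptions above, which would point back at the wave file's own name or location.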

xinjli commented 3 years ago

Hmm, another possibility is that your wave file has an extension. It should be named "wave" with no extension at all, not something like "wave.txt".
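Windows Explorer hides known file extensions by default, so a file shown as "wave" may really be "wave.txt". The sketch below looks for that case and renames the file. find_misnamed_wave is a hypothetical helper, not part of allosaurus, and the demo runs in a temporary directory rather than on real data.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_misnamed_wave(data_dir: Path) -> Optional[Path]:
    """Return a file such as 'wave.txt' if 'wave' itself is missing, else None."""
    if (data_dir / "wave").exists():
        return None  # correctly named file is already there
    candidates = sorted(p for p in data_dir.glob("wave.*") if p.is_file())
    return candidates[0] if candidates else None


# Demo in a throwaway directory that stands in for the train directory.
with tempfile.TemporaryDirectory() as tmp:
    train_dir = Path(tmp)
    (train_dir / "wave.txt").write_text("utt_1 sample.wav\n")

    misnamed = find_misnamed_wave(train_dir)
    found_name = misnamed.name if misnamed else None
    print("misnamed file:", found_name)

    if misnamed is not None:
        # Drop the extension so the file is named exactly "wave".
        misnamed.rename(misnamed.with_name("wave"))

    fixed = (train_dir / "wave").exists()
    print("wave exists after rename:", fixed)
```

On a real dataset, the same function could be pointed at the directory passed to --path before running prep_feat.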

jesserme commented 3 years ago

Thank you, that was it. I wouldn't have guessed that the .txt was causing issues this whole time.