saurabhshri / CCAligner

🔮 Word by word audio subtitle synchronisation tool and API. Developed under GSoC 2017 with CCExtractor.

Can't create vocabulary #53

Closed ghost closed 6 years ago

ghost commented 6 years ago

I have tried many times with different parameters and several test files, but each run ends with an error.

Here is the error shown on the screen: grammar_tools.cpp (403) : generate | Something went wrong while creating vocabulary!

PS: test.wav is just white noise, and test.srt contains only a few test subtitles. test.zip

ghost commented 6 years ago

The parameters are "--wav test.wav --srt test.srt".

harrynull commented 6 years ago

Could you give us the output before the error? Also, please check if you have all dependencies installed.

ghost commented 6 years ago

/usr/bin/valgrind --tool=memcheck --xml=yes --xml-file=/tmp/valgrind --gen-suppressions=all --leak-check=full --leak-resolution=med --track-origins=yes /home/retrospect/Documents/code-in/CCAligner/src/cmake-build-debug/ccaligner -wav test.wav -srt test.srt



CCAligner 0.03 Alpha [Shubham]
Word by Word Audio-Subtitle Synchronization
Saurabh Shrivastava | saurabh.shrivastava54@gmail.com
https://github.com/saurabhshri/CCAligner

[12-14 10:53:14][Debug] Initialising Aligner using PocketSphinx
[12-14 10:53:14][Debug] Audio Filename: test.wav Subtitle filename: test.srt
[12-14 10:53:14][Info] Reading and decoding audio samples...
[12-14 10:53:14][Debug] Begin reading WAV file
[12-14 10:53:14][Debug] Opening mode chosen: readFile, proceeding
[12-14 10:53:14][Debug] Trying to read from file : test.wav
[12-14 10:53:14][Debug] Reading file data
[12-14 10:53:15][Debug] File data read and stored in buffer
[12-14 10:53:15][Debug] Processing data and extracting samples
[12-14 10:53:15][Debug] Checking chunkID, should be RIFF
[12-14 10:53:15][Debug] Wave File chunkID verification successful
[12-14 10:53:15][Debug] Begin decoding wave file
[12-14 10:53:15][Debug] File format is identified as WAV
[12-14 10:53:15][Debug] Finding FMT and DATA subchunks
[12-14 10:53:15][Debug] FMT index : 12 , DATA index : 88
[12-14 10:53:15][Debug] PCM : True
[12-14 10:53:15][Debug] MONO : True
[12-14 10:53:15][Debug] Sample Rate 16KHz : True
[12-14 10:53:15][Debug] BitRate 16 bits/sec : True
[12-14 10:53:15][Debug] Number of samples : 64000
[12-14 10:53:15][Debug] Reading samples
[12-14 10:53:15][Debug] Successfully decoded
[12-14 10:53:15][Debug] File decoded successfully
[12-14 10:53:15][Debug] Generating Grammar based on subtitles, Grammar Name: 6
[12-14 10:53:16][Info] Generating language model and grammar files...
[12-14 10:53:16][Info] Note: You have chosen to generate a dictionary. Based on your TensorFlow configuration,
[12-14 10:53:16][Info] this may take some time, please be patient. For alternatives, see docs.
[12-14 10:53:16][Debug] Creating temporary directories at tempFiles/
[12-14 10:53:16][Debug] Directories created successfully!
[12-14 10:53:16][Info] Creating Corpus : tempFiles/corpus/corpus.txt
[12-14 10:53:16][Info] Creating Phonetic Corpus : tempFiles/corpus/phoneticCorpus.txt
[12-14 10:53:20][Debug] Creating vocabulary...
[12-14 10:53:20][Debug] Vocabulary created!
[12-14 10:53:20][Info] Creating the Dictionary, this might take a little time depending on your TensorFlow configuration : tempFiles/dict/complete.dict
Traceback (most recent call last):
  File "/usr/local/bin/g2p-seq2seq", line 11, in <module>
    load_entry_point('g2p-seq2seq==5.0.0a0', 'console_scripts', 'g2p-seq2seq')()
  File "build/bdist.linux-x86_64/egg/g2p_seq2seq/app.py", line 82, in main
  File "build/bdist.linux-x86_64/egg/g2p_seq2seq/g2p.py", line 72, in load_decode_model
RuntimeError: Model not found in g2p-seq2seq-cmudict/
[12-14 10:53:21][Fatal] /home/retrospect/Documents/code-in/CCAligner/src/lib_ccaligner/grammar_tools.cpp (191) : GenerateDict | Something went wrong while creating dictionary!
terminate called after throwing an instance of 'UnknownError'
  what(): [12-14 10:53:21][Fatal] /home/retrospect/Documents/code-in/CCAligner/src/lib_ccaligner/grammar_tools.cpp (191) : GenerateDict | Something went wrong while creating dictionary!

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

ghost commented 6 years ago

I have now installed all the dependencies, but I get a different error.

harrynull commented 6 years ago

RuntimeError: Model not found in g2p-seq2seq-cmudict/

It says the model was not found. Please check that you have completed the following step:

Make sure the model folder and g2p-seq2seq-cmudict are in the directory where you are compiling CCAligner.

The model folder and g2p-seq2seq-cmudict are in install/ and you need to copy them manually to your program's working directory. You also need quick_lm.pl to be available.
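The manual copy described above could be sketched as a small pre-flight script. This is only an illustration, not part of CCAligner: the helper names (`missing_folders`, `copy_missing`, `quick_lm_available`) are hypothetical, the folder names and `install/` come from this thread, and treating `quick_lm.pl` as "available" when it sits in the working directory or on `PATH` is an assumption.

```python
import shutil
from pathlib import Path

# Folder names taken from this thread; both are assumed to be directories.
REQUIRED_FOLDERS = ("model", "g2p-seq2seq-cmudict")


def missing_folders(workdir):
    """Return the required folders that are absent from workdir."""
    workdir = Path(workdir)
    return [name for name in REQUIRED_FOLDERS if not (workdir / name).is_dir()]


def copy_missing(install_dir, workdir):
    """Copy absent folders from install_dir into workdir; return what was copied."""
    install_dir, workdir = Path(install_dir), Path(workdir)
    copied = []
    for name in missing_folders(workdir):
        src = install_dir / name
        if not src.is_dir():
            continue  # not present in install_dir; leave it for the user to locate
        shutil.copytree(src, workdir / name)
        copied.append(name)
    return copied


def quick_lm_available(workdir):
    """Assumption: quick_lm.pl counts as available if it is in workdir or on PATH."""
    in_workdir = (Path(workdir) / "quick_lm.pl").is_file()
    return in_workdir or shutil.which("quick_lm.pl") is not None


if __name__ == "__main__":
    # Hypothetical usage: run from the directory where ccaligner is launched,
    # with the repository's install/ folder alongside it.
    print("copied:", copy_missing("install", "."))
    print("still missing:", missing_folders("."))
    print("quick_lm.pl available:", quick_lm_available("."))
```

If `still missing` is non-empty after the copy, the `RuntimeError: Model not found in g2p-seq2seq-cmudict/` shown earlier is the likely outcome, so it is worth resolving before running the aligner again.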

ghost commented 6 years ago

Thank you! It works.