readbeyond / aeneas

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
http://www.readbeyond.it/aeneas/
GNU Affero General Public License v3.0

Bug in macOS tts wrapper #203

Closed danielbair closed 6 years ago

danielbair commented 6 years ago

I might have found a bug in the macOS TTS wrapper. When I run:

    python -m aeneas.tools.synthesize_text -r="tts=macos" list "Hello.|My name is Alex." eng -v test.wav

the resulting WAV file plays back at double the rate it should. Whereas if I run:

    say -o temp.wav --data-format LEF32@22050 "Hello. My name is Alex."

the resulting file plays back at a normal rate. So then I ran:

    ffmpeg -v 24 -i temp.wav -ac 1 -ar 16000 -y -map_metadata -1 -flags +bitexact -f wav tmp.wav

and that file also plays back at a normal rate. (I took all of these commands from the verbose output of aeneas.tools.synthesize_text.)
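To compare the files by something more objective than ear, the sample rate and duration stored in each WAV header can be read directly. The following is a minimal sketch using Python's standard `wave` module; the helper name `wav_info` is made up for illustration and is not part of aeneas. It assumes Python 3 and a plain PCM file such as test.wav or the ffmpeg-converted tmp.wav. The float32 temp.wav written by `say` is probably not readable by `wave`, which only handles integer PCM.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, n_frames, duration_s) from a PCM WAV header.

    Hypothetical helper for this thread, not an aeneas function.
    """
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()   # sample rate declared in the header
        frames = wav_file.getnframes()   # number of sample frames stored
    # Players derive the playback duration from these two header values.
    return rate, frames, frames / float(rate)
```

Calling `wav_info("test.wav")` and `wav_info("tmp.wav")` and comparing the reported rates and durations should show whether the two files disagree about the sample rate for the same speech.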

Here is the complete verbose output from the original command: [DEBU] 2018-04-09 07:09:26.853716 CLI: Running aeneas 1.7.3 [DEBU] 2018-04-09 07:09:26.853806 CLI: Formal arguments: [u'/usr/local/lib/python2.7/site-packages/aeneas/tools/synthesize_text.py', u'-r=tts=macos', u'list', u'Hello.|My name is Alex.', u'eng', u'-vv', u'test.wav'] [DEBU] 2018-04-09 07:09:26.853837 CLI: Actual arguments: [u'list', u'Hello.|My name is Alex.', u'eng', u'test.wav'] [DEBU] 2018-04-09 07:09:26.853923 CLI: Runtime configuration: 'aba_no_zero_duration=0.001|aba_nonspeech_tolerance=0.080|allow_unlisted_languages=False|c_extensions=True|cdtw=True|cew=True|cew_subprocess_enabled=False|cew_subprocess_path=python|cfw=True|cmfcc=True|downloader_retry_attempts=5|downloader_sleep=1.000|dtw_algorithm=stripe|dtw_margin=60.000|dtw_margin_l1=60.000|dtw_margin_l2=30.000|dtw_margin_l3=10.000|ffmpeg_path=ffmpeg|ffmpeg_sample_rate=16000|ffprobe_path=ffprobe|job_max_tasks=0|mfcc_emphasis_factor=0.97|mfcc_fft_order=512|mfcc_filters=40|mfcc_lower_frequency=133.3333|mfcc_mask_extend_speech_after=0|mfcc_mask_extend_speech_before=0|mfcc_mask_log_energy_threshold=0.699|mfcc_mask_min_nonspeech_length=1|mfcc_mask_nonspeech=False|mfcc_mask_nonspeech_l1=False|mfcc_mask_nonspeech_l2=False|mfcc_mask_nonspeech_l3=False|mfcc_size=13|mfcc_upper_frequency=6855.4976|mfcc_window_length=0.100|mfcc_window_length_l1=0.100|mfcc_window_length_l2=0.050|mfcc_window_length_l3=0.020|mfcc_window_shift=0.040|mfcc_window_shift_l1=0.040|mfcc_window_shift_l2=0.020|mfcc_window_shift_l3=0.005|safety_checks=True|task_max_audio_length=0|task_max_text_length=0|tts=macos|tts_api_retry_attempts=5|tts_api_sleep=1.000|tts_cache=False|tts_l1=espeak|tts_l2=espeak|tts_l3=espeak|vad_extend_speech_after=0.000|vad_extend_speech_before=0.000|vad_log_energy_threshold=0.699|vad_min_nonspeech_length=0.200' [DEBU] 2018-04-09 07:09:26.854580 TextFile: Reading text fragments from list [DEBU] 2018-04-09 07:09:26.854621 TextFile: Parsing fragments from 
plain text format [DEBU] 2018-04-09 07:09:26.854654 TextFile: Creating TextFragment objects [DEBU] 2018-04-09 07:09:26.854706 TextFile: Created TextFilter object [DEBU] 2018-04-09 07:09:26.854734 TextFilter: Applying regex: '[u'Hello.']' => '[u'Hello.']' [DEBU] 2018-04-09 07:09:26.854863 TextFilter: Applying regex: '[u'My name is Alex.']' => '[u'My name is Alex.']' [DEBU] 2018-04-09 07:09:26.854953 TextFile: Setting language: 'eng' [INFO] 2018-04-09 07:09:26.854982 CLI: Read input text with 2 fragments [INFO] Read input text with 2 fragments [INFO] 2018-04-09 07:09:26.855204 CLI: Synthesizing 2 fragments [INFO] Synthesizing 2 fragments [DEBU] 2018-04-09 07:09:26.855234 Synthesizer: Selecting TTS engine... [DEBU] 2018-04-09 07:09:26.855257 Synthesizer: TTS engine: macOS [DEBU] 2018-04-09 07:09:26.855279 MacOSTTSWrapper: No tts_path specified in rconf, setting default TTS path [DEBU] 2018-04-09 07:09:26.855313 MacOSTTSWrapper: TTS path is None [DEBU] 2018-04-09 07:09:26.855334 MacOSTTSWrapper: TTS cache? False [DEBU] 2018-04-09 07:09:26.855349 MacOSTTSWrapper: Has Python call? False [DEBU] 2018-04-09 07:09:26.855364 MacOSTTSWrapper: Has C extension call? False [DEBU] 2018-04-09 07:09:26.855378 MacOSTTSWrapper: Has subprocess call? True [DEBU] 2018-04-09 07:09:26.855394 MacOSTTSWrapper: Subprocess arguments: [u'say', u'-v', 'VOICE_CODE_STRING', u'-o', 'WAVE_PATH', 'TEXT_STDIN', u'--data-format', u'LEF32@22050'] [DEBU] 2018-04-09 07:09:26.855557 Synthesizer: Selecting TTS engine... done [DEBU] 2018-04-09 07:09:26.855902 Synthesizer: Synthesizing text... [DEBU] 2018-04-09 07:09:26.856134 MacOSTTSWrapper: Calling TTS engine via C extension or subprocess [DEBU] 2018-04-09 07:09:26.856188 MacOSTTSWrapper: C extension '' not recognized [DEBU] 2018-04-09 07:09:26.856214 MacOSTTSWrapper: Running the pure Python code [DEBU] 2018-04-09 07:09:26.856230 MacOSTTSWrapper: Synthesizing multiple via subprocess... 
[DEBU] 2018-04-09 07:09:26.856248 MacOSTTSWrapper: Calling TTS engine using multiple generic function... [DEBU] 2018-04-09 07:09:26.856262 MacOSTTSWrapper: Determining codec and sample rate... [DEBU] 2018-04-09 07:09:26.856276 MacOSTTSWrapper: Reading codec and sample rate from OUTPUT_AUDIO_FORMAT [DEBU] 2018-04-09 07:09:26.856289 MacOSTTSWrapper: Determining codec and sample rate... done [DEBU] 2018-04-09 07:09:26.856302 MacOSTTSWrapper: codec: pcm_s16le [DEBU] 2018-04-09 07:09:26.856316 MacOSTTSWrapper: sample rate: 22050 [DEBU] 2018-04-09 07:09:26.856379 MacOSTTSWrapper: Examining fragment 0 (no cache)... [DEBU] 2018-04-09 07:09:26.856412 MacOSTTSWrapper: Language to voice code: 'eng' => 'Alex' [DEBU] 2018-04-09 07:09:26.856428 MacOSTTSWrapper: Calling helper function [DEBU] 2018-04-09 07:09:26.856445 MacOSTTSWrapper: Synthesizer helper called with output_file_path=None => creating temporary output file [DEBU] 2018-04-09 07:09:26.857507 MacOSTTSWrapper: Temporary output file path is '/tmp/tmpiRbetw.wav' [DEBU] 2018-04-09 07:09:26.857544 MacOSTTSWrapper: TTS engine reads text from stdin [DEBU] 2018-04-09 07:09:26.857564 MacOSTTSWrapper: Creating arguments list... [DEBU] 2018-04-09 07:09:26.857596 MacOSTTSWrapper: Creating arguments list... done [DEBU] 2018-04-09 07:09:26.857613 MacOSTTSWrapper: Calling TTS engine... [DEBU] 2018-04-09 07:09:26.857628 MacOSTTSWrapper: Calling with arguments '[u'say', u'-v', 'Alex', u'-o', u'/tmp/tmpiRbetw.wav', u'--data-format', u'LEF32@22050']' [DEBU] 2018-04-09 07:09:26.857651 MacOSTTSWrapper: Calling with text 'Hello.' [DEBU] 2018-04-09 07:09:26.862810 MacOSTTSWrapper: Passing text via stdin... [DEBU] 2018-04-09 07:09:26.917161 MacOSTTSWrapper: Passing text via stdin... done [DEBU] 2018-04-09 07:09:26.917244 MacOSTTSWrapper: TTS engine wrote audio data to file [DEBU] 2018-04-09 07:09:26.917278 MacOSTTSWrapper: Calling TTS ... done [DEBU] 2018-04-09 07:09:26.917421 MacOSTTSWrapper: Reading audio data... 
[DEBU] 2018-04-09 07:09:26.917515 AudioFile: Loading audio data... [DEBU] 2018-04-09 07:09:26.917635 AudioFile: self.file_format is None or not good => converting self.file_path [DEBU] 2018-04-09 07:09:26.917982 AudioFile: Temporary PCM16 mono WAVE file: '/tmp/tmpyqcjZz.wav' [DEBU] 2018-04-09 07:09:26.918024 AudioFile: Converting audio file to mono... [DEBU] 2018-04-09 07:09:26.918306 FFMPEGWrapper: Calling with arguments '['ffmpeg', '-i', u'/tmp/tmpiRbetw.wav', '-ac', '1', '-ar', '16000', '-y', '-map_metadata', '-1', '-flags', '+bitexact', '-f', 'wav', u'/tmp/tmpyqcjZz.wav']' [DEBU] 2018-04-09 07:09:26.973647 FFMPEGWrapper: Call completed [DEBU] 2018-04-09 07:09:26.973879 FFMPEGWrapper: Returning output file path '/tmp/tmpyqcjZz.wav' [DEBU] 2018-04-09 07:09:26.973960 AudioFile: Converting audio file to mono... done [DEBU] 2018-04-09 07:09:26.975108 AudioFile: Deleted temporary audio file: '/tmp/tmpyqcjZz.wav' [DEBU] 2018-04-09 07:09:26.975250 AudioFile: Sample length: 0.605 [DEBU] 2018-04-09 07:09:26.975316 AudioFile: Sample rate: 16000 [DEBU] 2018-04-09 07:09:26.975339 AudioFile: Audio format: pcm16 [DEBU] 2018-04-09 07:09:26.975362 AudioFile: Audio channels: 1 [DEBU] 2018-04-09 07:09:26.975381 AudioFile: Loading audio data... done [DEBU] 2018-04-09 07:09:26.975411 MacOSTTSWrapper: Duration of '/tmp/tmpiRbetw.wav': 0.605000 [DEBU] 2018-04-09 07:09:26.975438 MacOSTTSWrapper: Reading audio data... done [DEBU] 2018-04-09 07:09:26.975470 MacOSTTSWrapper: Removing temporary output file path '/tmp/tmpiRbetw.wav' [DEBU] 2018-04-09 07:09:26.975886 MacOSTTSWrapper: Examining fragment 0 (no cache)... done [DEBU] 2018-04-09 07:09:26.975964 MacOSTTSWrapper: Fragment 0 starts at: 0.000 [DEBU] 2018-04-09 07:09:26.976026 MacOSTTSWrapper: Fragment 0 duration: 0.605 [DEBU] 2018-04-09 07:09:26.976103 AudioFile: Adding samples... 
[DEBU] 2018-04-09 07:09:26.976125 AudioFile: Not initialized [DEBU] 2018-04-09 07:09:26.976236 AudioFile: Current sample capacity is (samples): 19360 [DEBU] 2018-04-09 07:09:26.976404 AudioFile: Adding samples... done [DEBU] 2018-04-09 07:09:26.976443 MacOSTTSWrapper: Examining fragment 1 (no cache)... [DEBU] 2018-04-09 07:09:26.976489 MacOSTTSWrapper: Language to voice code: 'eng' => 'Alex' [DEBU] 2018-04-09 07:09:26.976515 MacOSTTSWrapper: Calling helper function [DEBU] 2018-04-09 07:09:26.976542 MacOSTTSWrapper: Synthesizer helper called with output_file_path=None => creating temporary output file [DEBU] 2018-04-09 07:09:26.976867 MacOSTTSWrapper: Temporary output file path is '/tmp/tmpcoSgyn.wav' [DEBU] 2018-04-09 07:09:26.976920 MacOSTTSWrapper: TTS engine reads text from stdin [DEBU] 2018-04-09 07:09:26.976947 MacOSTTSWrapper: Creating arguments list... [DEBU] 2018-04-09 07:09:26.976974 MacOSTTSWrapper: Creating arguments list... done [DEBU] 2018-04-09 07:09:26.976998 MacOSTTSWrapper: Calling TTS engine... [DEBU] 2018-04-09 07:09:26.977013 MacOSTTSWrapper: Calling with arguments '[u'say', u'-v', 'Alex', u'-o', u'/tmp/tmpcoSgyn.wav', u'--data-format', u'LEF32@22050']' [DEBU] 2018-04-09 07:09:26.977045 MacOSTTSWrapper: Calling with text 'My name is Alex.' [DEBU] 2018-04-09 07:09:26.981328 MacOSTTSWrapper: Passing text via stdin... [DEBU] 2018-04-09 07:09:27.032148 MacOSTTSWrapper: Passing text via stdin... done [DEBU] 2018-04-09 07:09:27.032222 MacOSTTSWrapper: TTS engine wrote audio data to file [DEBU] 2018-04-09 07:09:27.032249 MacOSTTSWrapper: Calling TTS ... done [DEBU] 2018-04-09 07:09:27.032384 MacOSTTSWrapper: Reading audio data... [DEBU] 2018-04-09 07:09:27.032454 AudioFile: Loading audio data... 
[DEBU] 2018-04-09 07:09:27.032565 AudioFile: self.file_format is None or not good => converting self.file_path [DEBU] 2018-04-09 07:09:27.032883 AudioFile: Temporary PCM16 mono WAVE file: '/tmp/tmpXnvICn.wav' [DEBU] 2018-04-09 07:09:27.032923 AudioFile: Converting audio file to mono... [DEBU] 2018-04-09 07:09:27.033182 FFMPEGWrapper: Calling with arguments '['ffmpeg', '-i', u'/tmp/tmpcoSgyn.wav', '-ac', '1', '-ar', '16000', '-y', '-map_metadata', '-1', '-flags', '+bitexact', '-f', 'wav', u'/tmp/tmpXnvICn.wav']' [DEBU] 2018-04-09 07:09:27.065137 FFMPEGWrapper: Call completed [DEBU] 2018-04-09 07:09:27.065461 FFMPEGWrapper: Returning output file path '/tmp/tmpXnvICn.wav' [DEBU] 2018-04-09 07:09:27.065535 AudioFile: Converting audio file to mono... done [DEBU] 2018-04-09 07:09:27.066410 AudioFile: Deleted temporary audio file: '/tmp/tmpXnvICn.wav' [DEBU] 2018-04-09 07:09:27.066500 AudioFile: Sample length: 1.339 [DEBU] 2018-04-09 07:09:27.066557 AudioFile: Sample rate: 16000 [DEBU] 2018-04-09 07:09:27.066583 AudioFile: Audio format: pcm16 [DEBU] 2018-04-09 07:09:27.066603 AudioFile: Audio channels: 1 [DEBU] 2018-04-09 07:09:27.066620 AudioFile: Loading audio data... done [DEBU] 2018-04-09 07:09:27.066648 MacOSTTSWrapper: Duration of '/tmp/tmpcoSgyn.wav': 1.339500 [DEBU] 2018-04-09 07:09:27.066674 MacOSTTSWrapper: Reading audio data... done [DEBU] 2018-04-09 07:09:27.066703 MacOSTTSWrapper: Removing temporary output file path '/tmp/tmpcoSgyn.wav' [DEBU] 2018-04-09 07:09:27.066973 MacOSTTSWrapper: Examining fragment 1 (no cache)... done [DEBU] 2018-04-09 07:09:27.067041 MacOSTTSWrapper: Fragment 1 starts at: 0.605 [DEBU] 2018-04-09 07:09:27.067087 MacOSTTSWrapper: Fragment 1 duration: 1.339 [DEBU] 2018-04-09 07:09:27.067155 AudioFile: Adding samples... 
[DEBU] 2018-04-09 07:09:27.067176 AudioFile: Previous sample length was (samples): 9680 [DEBU] 2018-04-09 07:09:27.067197 AudioFile: Previous sample capacity was (samples): 19360 [DEBU] 2018-04-09 07:09:27.067954 AudioFile: Current sample capacity is (samples): 62224 [DEBU] 2018-04-09 07:09:27.068077 AudioFile: Adding samples... done [DEBU] 2018-04-09 07:09:27.068111 MacOSTTSWrapper: Minimizing memory... [DEBU] 2018-04-09 07:09:27.068128 AudioFile: Initialized, minimizing memory... [DEBU] 2018-04-09 07:09:27.068143 AudioFile: Previous sample length was (samples): 31112 [DEBU] 2018-04-09 07:09:27.068165 AudioFile: Previous sample capacity was (samples): 62224 [DEBU] 2018-04-09 07:09:27.068571 AudioFile: Current sample capacity is (samples): 31112 [DEBU] 2018-04-09 07:09:27.068595 AudioFile: Initialized, minimizing memory... done [DEBU] 2018-04-09 07:09:27.068610 MacOSTTSWrapper: Minimizing memory... done [DEBU] 2018-04-09 07:09:27.068624 MacOSTTSWrapper: Writing audio file 'test.wav' [DEBU] 2018-04-09 07:09:27.068644 AudioFile: Writing audio file 'test.wav'... [DEBU] 2018-04-09 07:09:27.069223 AudioFile: Writing audio file 'test.wav'... done [DEBU] 2018-04-09 07:09:27.069272 MacOSTTSWrapper: Returning 2 time anchors [DEBU] 2018-04-09 07:09:27.069295 MacOSTTSWrapper: Current time 1.944 [DEBU] 2018-04-09 07:09:27.069332 MacOSTTSWrapper: Synthesized 22 characters [DEBU] 2018-04-09 07:09:27.069348 MacOSTTSWrapper: Calling TTS engine using multiple generic function... done [DEBU] 2018-04-09 07:09:27.069370 MacOSTTSWrapper: Synthesizing multiple via subprocess... done [DEBU] 2018-04-09 07:09:27.069399 Synthesizer: Synthesizing text... done [SUCC] 2018-04-09 07:09:27.069456 CLI: Created file 'test.wav' [INFO] Created file 'test.wav' [DEBU] 2018-04-09 07:09:27.069550 CLI: Execution completed with code 0
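The numbers in the log above allow a quick consistency check. The sketch below only redoes arithmetic on values printed in the log: the sample counts match the 16000 Hz rate that ffmpeg converts to, while the wrapper reports a sample rate of 22050. What rate ends up in the header of test.wav is not shown in the log, so the "declared rate" case is an assumption, not a confirmed diagnosis, and the thread does not say what #204 changed.

```python
# Values copied from the verbose log above.
FFMPEG_RATE_HZ = 16000      # "-ar 16000" and "AudioFile: Sample rate: 16000"
WRAPPER_RATE_HZ = 22050     # "MacOSTTSWrapper: sample rate: 22050"
FRAGMENT_0_SAMPLES = 9680   # "Previous sample length was (samples): 9680"
TOTAL_SAMPLES = 31112       # "Previous sample length was (samples): 31112"


def duration_s(samples, rate_hz):
    """Playback duration of `samples` mono samples at `rate_hz`."""
    return samples / float(rate_hz)


# At 16000 Hz the sample counts reproduce the durations printed in the log.
fragment_0_duration = duration_s(FRAGMENT_0_SAMPLES, FFMPEG_RATE_HZ)
total_duration = duration_s(TOTAL_SAMPLES, FFMPEG_RATE_HZ)

# ASSUMPTION: if the same samples were written with a 22050 Hz header,
# a player would run through them faster than intended.
assumed_duration = duration_s(TOTAL_SAMPLES, WRAPPER_RATE_HZ)
speedup = total_duration / assumed_duration

print("fragment 0 at 16000 Hz: %.3f s" % fragment_0_duration)
print("total at 16000 Hz:      %.4f s" % total_duration)
print("total at 22050 Hz:      %.3f s" % assumed_duration)
print("speed-up factor:        %.3f" % speedup)
```

Under that assumption the file would play roughly 1.38 times too fast rather than exactly twice as fast, so a rate mismatch of this kind may explain the symptom only in part.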

readbeyond commented 6 years ago

Solved by #204, closing.