readbeyond / aeneas

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
http://www.readbeyond.it/aeneas/
GNU Affero General Public License v3.0

How to get the highest volume of the word? #184

Closed onsunsl closed 6 years ago

onsunsl commented 7 years ago

@readbeyond Hi Alberto, I want to get the highest volume of a word, and its position, after aligning at word level. What should I do?

Thanks a lot.

-Onunsl

readbeyond commented 7 years ago

Hi,

I am sorry but I do not understand your request at all; please try to formulate it with different words, maybe with an example, thank you.

In particular, what do you mean with "highest volume"? The word which is spoken "louder"? Or do you mean something else?

AP

onsunsl commented 7 years ago

Hi Alberto,

I'm sorry, I did not describe it clearly. I want to get the highest amplitude and time point in a single word audio data.

Thanks again

-Onsunsl

onsunsl commented 7 years ago

Another question: in the library tutorial, the text input of a task is always a file. How can I pass the text as a string instead? For example:

# create Task object
task = Task()
task.audio_file_path_absolute = u"./audio.wav"
task.text = u"hello world!"

Thanks

readbeyond commented 7 years ago

See the second example ("Create a TextFile programmatically, and assign it to Task") of:

https://www.readbeyond.it/aeneas/docs/libtutorial.html#concepts

Alternatively, you can write your contents to a temporary file (tempfile package in Python), pass its path as in the documentation examples, and then delete the temporary file after the alignment has been computed.

Note that conceptually aeneas expects a list of fragments (strings), not a string --- an input text format ("plain", "subtitles", etc.) implicitly defines a way to parse the contents of a file into a list of fragments. In fact "hello world!" alone is ambiguous: is it a single fragment or a list of two fragments ("hello", "world")? Etc.
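To make the fragment idea concrete, here is a minimal sketch, assuming the "plain" input format in which each line of the file is one fragment. The helper name `to_plain_text` is hypothetical and not part of aeneas.

```python
def to_plain_text(fragments):
    """Serialize a list of fragments as "plain" text: one fragment per line."""
    # Collapse internal whitespace, since a newline inside a fragment
    # would silently split it into two fragments.
    cleaned = [" ".join(fragment.split()) for fragment in fragments]
    return "\n".join(cleaned) + "\n"


# The same string can be read as one fragment or as two.
one_fragment = to_plain_text(["hello world!"])
two_fragments = to_plain_text(["hello", "world!"])
```

Which of the two readings is right depends on the granularity wanted from the alignment: one timing per sentence, or one per word.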

HTH,

AP

onsunsl commented 7 years ago

@readbeyond

Hi, back to my first question: how do I get the loudness of a single word, and the time at which it occurs? Thanks again.

-Onunsl

readbeyond commented 7 years ago

(I sent this already, but apparently it did not get through to GitHub.)

First of all, there is nothing like that in the aeneas command line tools; you need to use aeneas as a library in your own code.

I think you need to define more precisely what "highest amplitude" means for a word.

If you just need the maximum amplitude of the audio signal, then you can simply take the max over the array representing the (mono) signal. No need to use aeneas at all, you can do that with wave and numpy.

The fact is that this way you get the maximum amplitude of a single audio sample --- which is probably not what you want. For example, if your audio file is quite saturated, you will have many audio samples that have value max(2 bytes signed int) = 32767, and those do not necessarily indicate where the "loudest word" is.
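As an illustration of the single-sample maximum, here is a minimal sketch that assumes a 16-bit mono WAV file. It uses the standard library `array` module in place of numpy only to stay dependency-free; the function name `peak_sample` is hypothetical.

```python
import array
import sys
import wave


def peak_sample(path):
    """Return (peak absolute amplitude, time in seconds) of a 16-bit mono WAV."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono audio")
        rate = wav.getframerate()
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if len(samples) == 0:
        raise ValueError("empty audio file")
    # Index of the first sample with the largest absolute value.
    index = max(range(len(samples)), key=lambda i: abs(samples[i]))
    return abs(samples[index]), index / rate
```

On a saturated recording many samples tie at the maximum, and this returns only the first of them, which is exactly why a per-sample peak says little about the loudest word.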

You probably want to define something like:

(1) amplitude(word) = average(A_1, A_2, ... A_N)

where A_1, A_2, ..., A_N are the amplitudes of the audio samples of word, i.e. between onset(word) and offset(word), or --- probably even better:

(2) amplitude(word) = count(A_i > 3 * AVG, for i=1..N) / N

where AVG = average(absolute(A_i)), with the average taken over all the samples in the audio file.

This second metric (2) basically measures what % of audio samples in a word are above the "average (absolute) level" of the audio file. Note you need the absolute value, because an audio file usually has zero or near-zero average, since the signal oscillates roughly symmetrically between positive and negative amplitudes.
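Metrics (1) and (2) could be sketched as follows. Two assumptions are made here that the formulas above do not state: metric (1) averages absolute amplitudes (a plain average would be close to zero for the reason just given), and metric (2) compares absolute sample values against the threshold. The function names are hypothetical, `samples` is any sequence of numbers, and times are in seconds.

```python
def mean_abs(samples):
    """Average absolute amplitude of a sequence of samples."""
    return sum(abs(s) for s in samples) / len(samples)


def metric_1(samples, rate, onset, offset):
    """Metric (1), taken over absolute amplitudes (an assumption)."""
    word = samples[int(round(onset * rate)):int(round(offset * rate))]
    return mean_abs(word)


def metric_2(samples, rate, onset, offset, factor=3):
    """Metric (2): fraction of the word's samples above factor * AVG."""
    threshold = factor * mean_abs(samples)  # AVG over the whole file
    word = samples[int(round(onset * rate)):int(round(offset * rate))]
    return sum(1 for s in word if abs(s) > threshold) / len(word)
```

Both functions raise ZeroDivisionError for a zero-length word, so empty intervals need to be filtered out beforehand.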

Finally, another approach is

(3) amplitude(word) = average(MFCC_0[i]) / average_all_audio(MFCC_0[j]),

where MFCC_0[i] is the first MFCC coefficient of frame i, and you take onset(word) <= i < offset(word), while the second average is done over all the audio signal. Metric (3) says how "energetic" a word is, w.r.t. the average energy of the entire audio.
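A minimal sketch of metric (3), assuming the first MFCC coefficient of every frame has already been extracted into a flat sequence `mfcc_0`. The name `metric_3` is hypothetical, and the default `frame_shift` of 0.040 seconds is an assumption that should be checked against the MFCC settings actually in use.

```python
def metric_3(mfcc_0, onset, offset, frame_shift=0.040):
    """Metric (3): mean MFCC_0 of the word over mean MFCC_0 of the file.

    mfcc_0 holds one value per frame; frame_shift is the assumed time
    step between consecutive frames, in seconds.
    """
    # Convert the word boundaries from seconds to frame indices.
    first = int(round(onset / frame_shift))
    last = int(round(offset / frame_shift))
    word = mfcc_0[first:last]
    word_mean = sum(word) / len(word)
    file_mean = sum(mfcc_0) / len(mfcc_0)
    return word_mean / file_mean
```

A value above 1 then marks a word that is more energetic than the file average, provided the file average is positive.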

Clearly in all three cases, you finally take the max over all words.

You can use aeneas to get the onset and offset times for each word as you already know. Then, you can inspect AudioFile.audio_samples (which is a numpy array) for computing (1) and (2), or AudioFileMFCC.all_mfcc for computing (3) (or AudioFileMFCC.middle_mfcc, if you iterate over the words and set head_length and middle_length to the onset and duration of each word).

In general, aeneas is not designed to do signal processing/analysis. Other libraries (e.g. librosa or even directly numpy) might have better tools for that. You probably still have to decide the onset/offset of each word somehow, though.
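Putting the pieces together, the final "max over all words" step might look like the sketch below. It assumes the word boundaries are already available as (text, onset, offset) tuples in seconds, however they were obtained, and it scores each word by mean absolute amplitude. The function name `loudest_word` and the tuple layout are hypothetical, not an aeneas API.

```python
def loudest_word(samples, rate, words):
    """Return (text, score, peak_time) for the highest-scoring word.

    words is a list of (text, onset, offset) tuples, times in seconds.
    The score is the mean absolute amplitude of the word; peak_time is
    the time of the largest absolute sample inside that word.
    """
    best = None
    for text, onset, offset in words:
        start = int(round(onset * rate))
        end = int(round(offset * rate))
        chunk = samples[start:end]
        if len(chunk) == 0:
            continue  # skip zero-length intervals
        score = sum(abs(s) for s in chunk) / len(chunk)
        if best is None or score > best[1]:
            peak = max(range(len(chunk)), key=lambda i: abs(chunk[i]))
            best = (text, score, (start + peak) / rate)
    return best
```

The result carries both things asked for in this thread: which word is loudest under the chosen metric, and the time point of its highest amplitude.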

HTH,

Alberto Pettarin

onsunsl commented 7 years ago

Okay, I understand. It seems I can only do this with a third-party library. Thank you very much.

-Onsunsl

onsunsl commented 7 years ago

Hi Alberto, I cannot use the tempfile module: an error is thrown when I set the text file path (task.text_file_path_absolute = f.name). file_path() in textfile.py tests whether the file can be opened, and a temporary file created with TemporaryFile() cannot be opened with open().

    with TemporaryFile('w', encoding='utf-8') as f:
        f.write(text)
        f.seek(0)

        print(text, wav)

        # create Task object
        config_string = u"task_language=zh|is_text_type=plain|os_task_file_format=aud|tts=custom|tts_path=./mytts.py|task_adjust_boundary_no_zero = True"
        task = Task(config_string=config_string)
        task.audio_file_path_absolute = wav
        task.text_file_path_absolute = f.name

    File "C:\Anaconda3\lib\site-packages\aeneas\textfile.py", line 481, in file_path
      self.log_exc(u"Text file '%s' cannot be read" % (file_path), None, True, OSError)
    File "C:\Anaconda3\lib\site-packages\aeneas\logger.py", line 351, in log_exc
      raise raise_type(raise_message)
    OSError: Text file 'C:\Users\onsun\AppData\Local\Temp\tmpfjky_5j8' cannot be read

pettarin commented 7 years ago

Hi,

From the documentation ( https://docs.python.org/2/library/tempfile.html ):

tempfile.TemporaryFile(...)

Return a file-like object that can be used as a temporary storage area. The file is created using mkstemp(). It will be destroyed as soon as it is closed (including an implicit close when the object is garbage collected). Under Unix, the directory entry for the file is removed immediately after the file is created. Other platforms do not support this; your code should not rely on a temporary file created using this function having or not having a visible name in the file system.

Hence, it is not guaranteed that a file is actually present at the (virtual) path of the temporary file if you create it with TemporaryFile --- that's the problem.

You need to use tempfile.mkstemp() instead:

https://docs.python.org/2/library/tempfile.html#tempfile.mkstemp

and manually remove the file afterwards.
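A minimal sketch of the mkstemp() approach, using only the Python 3 standard library. The helper name `write_temp_text` is hypothetical; the aeneas task setup is left out and only indicated by a comment.

```python
import os
import tempfile


def write_temp_text(text):
    """Write text to a named temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    # Wrap the OS-level handle so it is closed before the path is reopened.
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path


path = write_temp_text(u"hello\nworld!\n")
try:
    # Hand `path` to the aligner here, e.g. as task.text_file_path_absolute.
    with open(path, encoding="utf-8") as handle:
        contents = handle.read()
finally:
    os.remove(path)  # mkstemp never deletes the file for you
```

Closing the file before passing its path along matters on Windows in particular, where a file that is still open may not be readable by other code.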

Alternatively, aeneas.globalfunctions has some utility functions for creating a (persistent) temporary file:

https://www.readbeyond.it/aeneas/docs/globalfunctions.html#aeneas.globalfunctions.tmp_file

and for deleting it:

https://www.readbeyond.it/aeneas/docs/globalfunctions.html#aeneas.globalfunctions.delete_file

Also note that the above two are used all over the other .py files in the aeneas source code, for example:

https://github.com/readbeyond/aeneas/blob/master/aeneas/audiofile.py#L417

and

https://github.com/readbeyond/aeneas/blob/master/aeneas/audiofile.py#L426

HTH,

AP

onsunsl commented 7 years ago

ok, thank you very much.