Closed: onsunsl closed this issue 6 years ago.
Hi,
I am sorry but I do not understand your request at all; please try to formulate it with different words, maybe with an example, thank you.
In particular, what do you mean by "highest volume"? The word that is spoken "louder"? Or do you mean something else?
AP
On 07/26/2017 05:05 AM, onsunsl wrote:
@readbeyond https://github.com/readbeyond Hi Alberto, I want to get the highest volume, and its position, for each word when aligning at word level. What should I do?
Hi Alberto,
I'm sorry, I did not describe it clearly. I want to get the highest amplitude, and the time point at which it occurs, in the audio data of a single word.
Thanks again
-Onsunsl
Another question: in the library tutorial, the task input parameters are text files. How can I pass a string instead? For example:
# create Task object
task = Task()
task.audio_file_path_absolute = u"./audio.wav"
task.text = u"hello world!"
Thanks
See the second example ("Create a TextFile programmatically, and assign it to Task") of:
https://www.readbeyond.it/aeneas/docs/libtutorial.html#concepts
Alternatively, you can write your contents to a temporary file (with the tempfile module in Python), pass its path as in the documentation examples, and then delete the temporary file after the alignment has been computed.
Note that, conceptually, aeneas expects a list of fragments (strings), not a single string: an input text format ("plain", "subtitles", etc.) implicitly defines how to parse the contents of a file into a list of fragments. In fact, "hello world!" alone is ambiguous: is it a single fragment, or a list of two fragments ("hello", "world")?
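To make the ambiguity concrete, here is a minimal sketch that serializes the two readings of "hello world!" into text-file contents, assuming the "plain" format's rule of one fragment per line. The helper name to_plain_text is hypothetical, and splitting on whitespace is just one possible way to get word-level fragments (it keeps the "!" attached to "world").

```python
def to_plain_text(fragments):
    """Serialize a list of fragments as "plain" text: one fragment per line."""
    return u"\n".join(fragments) + u"\n"


text = u"hello world!"

# Reading 1: the whole string is a single fragment
one_fragment = to_plain_text([text])

# Reading 2: each whitespace-separated word is its own fragment (word-level alignment)
word_fragments = to_plain_text(text.split())

print(repr(one_fragment))    # 'hello world!\n'
print(repr(word_fragments))  # 'hello\nworld!\n'
```

Whichever string you choose still has to reach aeneas either through a file on disk or through a TextFile object built programmatically, as in the tutorial linked above.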
HTH,
AP
@readbeyond
Hi, also about the first question: how can I get the loudness, and its time, for a single word? Thanks again.
-Onunsl
(I sent this already, but apparently it did not get through GitHub.)
First of all, there is nothing like that in the aeneas command line tools: you need to use aeneas as a library in your own code.
I think you need to define more precisely what "highest amplitude" means for a word.
If you just need the maximum amplitude of the audio signal, then you can simply take the max over the array representing the (mono) signal. There is no need to use aeneas at all: you can do that with wave and numpy.
The catch is that this way you get the maximum amplitude of a single audio sample, which is probably not what you want. For example, if your audio file is quite saturated, you will have many audio samples at the maximum value of a 2-byte signed int (32767), and those do not necessarily indicate where the "loudest word" is.
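For reference, the per-sample peak and the time at which it occurs can be computed with wave and numpy along these lines. This is a minimal sketch assuming a 16-bit PCM mono WAV file; the helper names read_mono_16bit and peak are hypothetical, and onset/offset are word boundaries in seconds (for example, taken from an aeneas alignment).

```python
import wave

import numpy as np


def read_mono_16bit(path):
    """Return (samples, sample_rate) for a 16-bit PCM mono WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected a 16-bit mono WAV file")
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    # WAV PCM data is stored as little-endian signed 16-bit integers
    return np.frombuffer(frames, dtype="<i2"), rate


def peak(samples, rate, onset=0.0, offset=None):
    """Return (peak absolute amplitude, time in seconds) within [onset, offset)."""
    start = int(round(onset * rate))
    end = len(samples) if offset is None else int(round(offset * rate))
    # Cast to int32 so that abs(-32768) does not overflow int16
    segment = samples[start:end].astype(np.int32)
    i = int(np.argmax(np.abs(segment)))
    return int(abs(segment[i])), (start + i) / rate
```

For example, peak(samples, rate, 1.20, 1.65) would return the loudest single sample of a word aligned between 1.20 s and 1.65 s (illustrative times). As noted above, this value is easily dominated by clipping, so it is a poor proxy for how loud the word sounds.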
You probably want to define something like:
(1) amplitude(word) = average(A_1, A_2, ... A_N)
where A_1, A_2, ..., A_N are the amplitudes of the audio samples of the word, i.e., those between onset(word) and offset(word); or, probably even better:
(2) amplitude(word) = count(A_i > 3 * AVG, for i=1..N) / N
where AVG = average(absolute(A_j)), with the average taken over all the samples in the audio file.
This second metric (2) basically measures what fraction of the audio samples in a word are above three times the "average (absolute) level" of the audio file. Note that you need the absolute value, because an audio file usually has a zero or near-zero average, since the signal oscillates roughly symmetrically between positive and negative amplitudes.
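A minimal numpy sketch of metrics (1) and (2), assuming the samples are a 1D numpy array with a known sample rate and that onset/offset are in seconds. The function names are hypothetical. Two choices go beyond the formulas as written and are assumptions: metric (1) averages absolute values (a signed average would be close to zero, for the reason just given), and metric (2) compares absolute sample values against 3 * AVG.

```python
import numpy as np


def word_slice(samples, rate, onset, offset):
    """Samples between onset(word) and offset(word), both given in seconds."""
    return samples[int(round(onset * rate)):int(round(offset * rate))]


def metric_1(samples, rate, onset, offset):
    """(1): average amplitude of the word, using absolute values (assumption)."""
    word = word_slice(samples, rate, onset, offset).astype(np.float64)
    return float(np.mean(np.abs(word)))


def metric_2(samples, rate, onset, offset, factor=3.0):
    """(2): fraction of the word's samples whose absolute value exceeds
    factor * AVG, where AVG is the mean absolute level of the whole file."""
    avg = np.mean(np.abs(samples.astype(np.float64)))
    word = word_slice(samples, rate, onset, offset).astype(np.float64)
    return float(np.count_nonzero(np.abs(word) > factor * avg)) / len(word)
```

Casting to float64 before taking absolute values avoids the int16 overflow on -32768 and works whether the samples arrive as integers or as floats.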
Finally, another approach is
(3) amplitude(word) = average(MFCC_0[i]) / average_all_audio(MFCC_0[j]),
where MFCC_0[i] is the first MFCC coefficient of frame i, with onset(word) <= i < offset(word), while the second average is taken over the entire audio signal. Metric (3) says how "energetic" a word is, relative to the average energy of the entire audio.
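Metric (3) can be sketched the same way, assuming you already have the MFCC_0 value of each frame as a 1D array, with frames spaced frame_shift seconds apart starting at time 0. With aeneas this would presumably be one row of AudioFileMFCC.all_mfcc (row 0 if the coefficients are stored as rows; please check the orientation in your version). The function name is hypothetical.

```python
import numpy as np


def metric_3(mfcc0, frame_shift, onset, offset):
    """(3): mean MFCC_0 over the word's frames / mean MFCC_0 over all frames.

    mfcc0: 1D array, one MFCC_0 value per frame (assumed layout).
    frame_shift: time between consecutive frames, in seconds.
    onset, offset: word boundaries in seconds; frames onset <= i < offset are used.
    """
    mfcc0 = np.asarray(mfcc0, dtype=np.float64)
    first = int(round(onset / frame_shift))
    last = int(round(offset / frame_shift))  # exclusive bound
    return float(np.mean(mfcc0[first:last]) / np.mean(mfcc0))
```

Note that the ratio only reads as "larger means more energetic" when the MFCC_0 values are positive; if they are negative, a larger energy gives a smaller ratio.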
In all three cases, you then take the max over all words.
You can use aeneas to get the onset and offset times of each word, as you already know. Then, you can inspect AudioFile.audio_samples (which is a numpy array) to compute (1) and (2), or AudioFileMFCC.all_mfcc to compute (3) (or AudioFileMFCC.middle_mfcc if you iterate over each word, setting head_length and middle_length to the onset and duration of each word).
In general, aeneas is not designed for signal processing/analysis. Other libraries (e.g. librosa, or even numpy directly) might have better tools for that. You will probably still need some way to determine the onset/offset of each word, though.
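Taking the max over all words then amounts to a loop over (word, onset, offset) triples. This sketch scores each word with metric (2); the tuple layout and the function name loudest_word are assumptions, and in practice the triples would come from the word-level alignment computed by aeneas.

```python
import numpy as np


def loudest_word(words, samples, rate, factor=3.0):
    """Return (text, onset, offset, score) for the word with the highest metric (2).

    words: list of (text, onset, offset) tuples, times in seconds (assumed layout).
    Returns None if no word has any samples.
    """
    samples = np.asarray(samples, dtype=np.float64)
    avg = np.mean(np.abs(samples))  # AVG over the whole file
    best = None
    for text, onset, offset in words:
        seg = samples[int(round(onset * rate)):int(round(offset * rate))]
        if len(seg) == 0:
            continue  # skip zero-length fragments
        score = np.count_nonzero(np.abs(seg) > factor * avg) / len(seg)
        if best is None or score > best[3]:
            best = (text, onset, offset, float(score))
    return best
```

Swapping in metric (1) or (3) only changes how score is computed; the max over words stays the same.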
HTH,
Alberto Pettarin
On 07/27/2017 04:16 AM, onsunsl wrote:
Hi Alberto, I'm sorry, I did not describe it clearly. I want to get the highest amplitude of a single word waveform.
Thanks again
-Onsunsl
Okay, I understand. It seems I can only do this with a third-party library. Thank you very much.
-Onsunsl
Hi Alberto,
I cannot use the tempfile module: an error is thrown when I set the text file path:
task.text_file_path_absolute = f.name
file_path() in textfile.py tries to open the file, but a temporary file created with TemporaryFile() cannot be opened with open().
with TemporaryFile('w', encoding='utf-8') as f:
    f.write(text)
    f.seek(0)
    # create Task object
    config_string = u"task_language=zh|is_text_type=plain|os_task_file_format=aud|tts=custom|tts_path=./mytts.py|task_adjust_boundary_no_zero = True"
    task = Task(config_string=config_string)
    task.audio_file_path_absolute = wav
    task.text_file_path_absolute = f.name
File "C:\Anaconda3\lib\site-packages\aeneas\textfile.py", line 481, in file_path
    self.log_exc(u"Text file '%s' cannot be read" % (file_path), None, True, OSError)
File "C:\Anaconda3\lib\site-packages\aeneas\logger.py", line 351, in log_exc
    raise raise_type(raise_message)
OSError: Text file 'C:\Users\onsun\AppData\Local\Temp\tmpfjky_5j8' cannot be read
Hi,
From the documentation (https://docs.python.org/2/library/tempfile.html):
tempfile.TemporaryFile(...)
Return a file-like object that can be used as a temporary storage area. The file is created using mkstemp(). It will be destroyed as soon as it is closed (including an implicit close when the object is garbage collected). Under Unix, the directory entry for the file is removed immediately after the file is created. Other platforms do not support this; your code should not rely on a temporary file created using this function having or not having a visible name in the file system.
Hence, it is not guaranteed that a file is actually present at the (virtual) path of the temporary file if you create it with TemporaryFile --- that's the problem.
You need to use tempfile.mkstemp() instead:
https://docs.python.org/2/library/tempfile.html#tempfile.mkstemp
and manually remove the file afterwards.
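A minimal sketch of the mkstemp() approach, assuming Python 3 (the traceback above comes from Anaconda3). The helper name write_temp_text is hypothetical, and the aeneas lines appear only as comments. Unlike a TemporaryFile(), the file stays on disk under the returned path after the descriptor is closed, so it can be reopened by name, and you delete it yourself when the alignment is done.

```python
import os
import tempfile


def write_temp_text(text, suffix=".txt"):
    """Write text (UTF-8) to a new temporary file and return its path.

    The file persists after this function returns; the caller must delete it.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    # os.fdopen wraps the OS-level descriptor; closing the wrapper closes fd too
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(text)
    return path


# Usage sketch (aeneas part assumed, not executed here):
# path = write_temp_text(u"hello\nworld!\n")
# try:
#     task.text_file_path_absolute = path
#     ...  # run the alignment
# finally:
#     os.remove(path)
```

Closing the file before handing its path to aeneas also avoids relying on how the operating system handles a file that is still open for writing.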
Alternatively, aeneas.globalfunctions has some utility functions for creating a (persistent) temporary file:
https://www.readbeyond.it/aeneas/docs/globalfunctions.html#aeneas.globalfunctions.tmp_file
and for deleting it:
https://www.readbeyond.it/aeneas/docs/globalfunctions.html#aeneas.globalfunctions.delete_file
Also note that the above two are used all over the other .py files in the aeneas source code, for example:
https://github.com/readbeyond/aeneas/blob/master/aeneas/audiofile.py#L417
and
https://github.com/readbeyond/aeneas/blob/master/aeneas/audiofile.py#L426
HTH,
AP
ok, thank you very much.