yangzheng-121 opened this issue 2 years ago
Thank you @yangzheng-121 for opening this issue and providing helpful detail.
I'm pasting the image here just to make it easier to reference, hope that's okay
Can you explain where the label '' comes from?
Is it the silent period between syllables? E.g., between the first and second 'c'?
If yes, then do you manually assign that label, or does it occur because Praat gives a default label of "nothing" to that segment?
Are there also '' labels for silent periods at the beginning and ending of songs? E.g., is the period before the first 'c' also given a label ''?
This last point is important to know because of how TweetyNet works! By default it assigns a "background" class to everything you don't label, which includes silent periods between syllables, silent periods before and after song bouts, and anything else you do not annotate, such as cage noise or calls. I assume what you want TweetyNet to do is only label syllables. What you probably don't want it to do is learn that there's a "different class" for silent gaps between syllables (that you label '') and other silent periods (that you might not be labeling?).
This would all be helpful to know since I have not actually used Praat.
[edit]
I see above you wrote
but in the annotation there will always be an empty part (which is the silent interval between songs, or noise)
but I just want to make sure -- you label all silent periods (between syllables and before and after songs) with ''? Are you doing this manually, or is this just a default that Praat gives you? TweetyNet assumes you will label only segments you care about, e.g. syllables, and it assigns a "background" label to all the unlabeled periods between segments. I did try to write vak so that it would still work if people actually assigned a label to silent periods; we have not really tested this, though.
[end edit]
Here are three possible solutions:
1. I'm not actually sure if this would work off the top of my head, but you could try setting the value for the labelset option in the [PREP] section to what TOML calls an "array" (looks like a Python list), like so:
[PREP]
labelset = ['a', 'b', 'c', 'd', 'e', '']
Notice the last label is the empty string.
Of course you want to include any other labels you used.
I am realizing that it's not super well documented that labelset can be an array.
If you do
>>> help(vak.converters.labelset_to_set)
you can read the docstring that explains how the TOML values get converted to the thing that vak uses internally. Note again that an "array" in a .toml file gets converted to a Python list when the file is loaded by the toml library, before this function does anything to it.
2. Rename the '' labels to another label, e.g. 's'. You could do this with crowsetta. I will say more about this below.
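A rough sketch of the renaming logic, using plain tuples standing in for crowsetta's interval objects (so this is illustrative, not the actual crowsetta API):

```python
def relabel_empty(intervals, new_label='s'):
    """Replace empty-string labels with `new_label`, leaving onset/offset times untouched.

    `intervals` is a list of (onset, offset, label) tuples -- a simplified
    stand-in for the interval objects you'd get from crowsetta.
    """
    return [
        (onset, offset, new_label if label == '' else label)
        for onset, offset, label in intervals
    ]

print(relabel_empty([(0.0, 0.5, 'a'), (0.5, 0.7, ''), (0.7, 1.0, 'b')]))
# [(0.0, 0.5, 'a'), (0.5, 0.7, 's'), (0.7, 1.0, 'b')]
```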
3. Remove the '' labels, if they are the silent gaps as I'm assuming. Again, you can do this with crowsetta.
For both 2. and 3., I would make back-up copies of your data before trying either, so you don't damage your hand-annotated files, which I'm sure took some time to create.
I can write a small example script of how you would rename or remove the '' labels with crowsetta.
If you are willing to share a very small subset of your .TextGrid files (e.g. by attaching a zip to a reply, or by sending me an email) then I can test the script works for you.
:thinking: It would actually be much easier to do 2. and 3. above with the current version of crowsetta, which I haven't published yet.
I will publish some sort of beta version in the next couple of days.
In the meantime you could download the code directly from this repo, cd into the directory of downloaded code, and then, with your conda environment activated, do pip install .
With the newest version, to remove all the silent period labels, you'd do something like:
import pathlib

import crowsetta

# note the raw string (r'...') -- without it, backslashes in a Windows path
# get treated as escape sequences
data_dir = pathlib.Path(r'C:\Users\LabUser\Desktop\DataFolder\song_analysis\tweetynet_bl097\tutee_bl097_tntrain')
textgrid_paths = sorted(data_dir.glob('*.TextGrid'))  # I assume the extension is .TextGrid

scribe = crowsetta.Transcriber('textgrid')

for textgrid_path in textgrid_paths:
    tg = scribe.from_file(textgrid_path)
    intv_tier = tg.textgrid[0]
    new_intv_tier = crowsetta._vendor.textgrid.IntervalTier()
    for interval in intv_tier:
        if interval.mark != '':  # skip any interval labeled with ''
            new_intv_tier.addInterval(interval)
    tg.textgrid[0] = new_intv_tier
    with textgrid_path.open('w') as fp:
        tg.textgrid.write(fp)

This is totally untested, but, plus or minus a couple of minor bugs, it should work.
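If the crowsetta classes make the script above hard to follow: the core of that inner loop is just a filter. Here is the same idea with plain tuples (a toy illustration, not the crowsetta API):

```python
def drop_empty(intervals):
    """Keep only intervals whose label is not the empty string.

    `intervals` is a list of (onset, offset, label) tuples standing in for
    crowsetta's interval objects. The times pass through unchanged, so the
    dropped silent gaps simply become unannotated "background" periods.
    """
    return [(on, off, lbl) for (on, off, lbl) in intervals if lbl != '']

print(drop_empty([(0.0, 0.5, 'a'), (0.5, 0.7, ''), (0.7, 1.0, 'b')]))
# [(0.0, 0.5, 'a'), (0.7, 1.0, 'b')]
```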
I think by default Praat will give all the intervals that I didn't label a '', including the periods between syllables and the periods before and after the song bouts. I didn't give it a '' label myself, so I also couldn't delete it in Praat.
I also have a question after reading your answer. Am I supposed to label only syllables? I thought that TweetyNet also needs cage noise or calls when training, so I actually also labeled noise and calls. If that's not necessary, then I will delete them from my annotation.
I tested the script you provided, and I got an error like this (I did try to debug it myself, but I find it hard to understand all the code in crowsetta):
AttributeError Traceback (most recent call last)
Input In [5], in <cell line: 4>()
4 for textgrid_path in textgrid_paths:
5 tg = scribe.from_file(textgrid_path)
----> 6 intv_tier = tg.textgrid[0]
7 new_intv_tier = crowsetta._vendor.textgrid.IntervalTier()
8 for interval in intv_tier:
AttributeError: 'Annotation' object has no attribute 'textgrid'
I think by default Praat will give all the intervals that I didn't label a '', including the periods between syllables and the periods before and after the song bouts. I didn't give it a '' label myself, so I also couldn't delete it in Praat.
Thank you, this is helpful to know.
I also have a question after reading your answer. Am I supposed to label only syllables? I thought that TweetyNet also needs cage noise or calls when training, so I actually also labeled noise and calls. If that's not necessary, then I will delete them from my annotation.
In the paper we did not label noise / calls. We have also not tested how much this matters. In theory the network should learn to classify anything you leave unlabeled as "background" and so far we have found this to be the case. If you are concerned about this because you have data with a lot of calls then it may be worth testing by keeping the files where you have annotated those sounds by hand, and seeing whether models trained with just "syllable classes + background" classify those sounds as background.
I tested the script you provided, and I got error like this:
Before running this, did you install crowsetta using the most recent version of the code downloaded directly from this repository?
I think you must be using an older version, because you got back an Annotation when you called from_file, instead of an instance of crowsetta.formats.seq.TextGrid (which only exists in the newest version that I haven't published yet). You'd need to install direct from source like I said above:
I will publish some sort of beta version in the next couple of days. In the meantime you could download the code directly from this repo, cd into the directory of downloaded code, and then with your conda environment activated, do pip install .
@yangzheng-121 did you try solution 1. above?
That might be a quick and easy fix.
Or at least it's quick to find out if it doesn't work :slightly_smiling_face:
I'm not actually sure if this would work off the top of my head, but you could try setting the value for the labelset option in the [PREP] section to what TOML calls an "array" (looks like a Python list), like so:
[PREP]
labelset = ['a', 'b', 'c', 'd', 'e', '']
Let me know if it's not clear what I'm suggesting here
Some feedback: I tried solution 1, and it seemed to work at first. I went through the first 3 steps, but at the last step of predicting annotations, I got an error like this:
(tweetynet) C:\Users\LabUser>vak predict C:\Users\LabUser\Desktop\DataFolder\song_analysis\gy6or6_predict.toml
Logging results to C:\Users\LabUser\Desktop\DataFolder\song_analysis\tweetynet_bl097\predict
loading SpectScaler from path: C:\Users\LabUser\Desktop\DataFolder\song_analysis\tweetynet_bl097\train\results_220523_235928\StandardizeSpect
loading labelmap from path: C:\Users\LabUser\Desktop\DataFolder\song_analysis\tweetynet_bl097\train\results_220523_235928\labelmap.json
loading dataset to predict from csv path: C:\Users\LabUser\Desktop\DataFolder\song_analysis\tweetynet_bl097\predict\tutee_bl097_tnpredict_prep_220524_134910.csv
will save annotations in .csv file: C:\Users\LabUser\Desktop\DataFolder\song_analysis\tweetynet_bl097\predict\gy6or6.bl097.annot.csv
dataset has timebins with duration: 0.00145
shape of input to networks used for predictions: torch.Size([1, 257, 88])
instantiating models from model-config map:/n{'TweetyNet': {'optimizer': {'lr': 0.001}, 'network': {}, 'loss': {}, 'metrics': {}}}
loading checkpoint for TweetyNet from path: C:\Users\LabUser\Desktop\DataFolder\song_analysis\tweetynet_bl097\train\results_220523_235928\TweetyNet\checkpoints\max-val-acc-checkpoint.pt
Loading checkpoint from:
C:\Users\LabUser\Desktop\DataFolder\song_analysis\tweetynet_bl097\train\results_220523_235928\TweetyNet\checkpoints\max-val-acc-checkpoint.pt
running predict method of TweetyNet
batch 243 / 244: 100%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 244/244 [02:55<00:00, 1.39it/s]
0%| | 0/244 [00:00<?, ?it/s]converting predictions to annotations
0%| | 0/244 [00:23<?, ?it/s]
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\envs\tweetynet\Scripts\vak-script.py", line 9, in <module>
sys.exit(main())
File "C:\ProgramData\Anaconda3\envs\tweetynet\lib\site-packages\vak\__main__.py", line 45, in main
cli.cli(command=args.command, config_file=args.configfile)
File "C:\ProgramData\Anaconda3\envs\tweetynet\lib\site-packages\vak\cli\cli.py", line 30, in cli
COMMAND_FUNCTION_MAP[command](toml_path=config_file)
File "C:\ProgramData\Anaconda3\envs\tweetynet\lib\site-packages\vak\cli\predict.py", line 42, in predict
core.predict(
File "C:\ProgramData\Anaconda3\envs\tweetynet\lib\site-packages\vak\core\predict.py", line 261, in predict
labels, onsets_s, offsets_s = labeled_timebins.lbl_tb2segments(
File "C:\ProgramData\Anaconda3\envs\tweetynet\lib\site-packages\vak\labeled_timebins.py", line 423, in lbl_tb2segments
raise ValueError(
ValueError: min_segment_dur or majority_vote specified, but 'unlabeled' not in labelmap.
Without 'unlabeled' segments these transforms cannot be applied.
I then turned to solutions 2 & 3, but I think I probably didn't understand them correctly. I downloaded the code from this page: https://github.com/vocalpy/crowsetta, and then I cd'd into the directory of the downloaded code. I activated my conda environment for tweetynet, and then I did pip install crowsetta. The command window gave me some feedback like "Requirement already satisfied". To be more specific:
(tweetynet) C:\Users\LabUser\Desktop\DataFolder\song_analysis\crowsetta-main>pip install crowsetta
Requirement already satisfied: crowsetta in c:\programdata\anaconda3\envs\tweetynet\lib\site-packages (3.4.1)
Requirement already satisfied: pandas>=1.3.5 in c:\programdata\anaconda3\envs\tweetynet\lib\site-packages (from crowsetta) (1.4.2)
Requirement already satisfied: numpy>=1.18.1 in c:\programdata\anaconda3\envs\tweetynet\lib\site-packages (from crowsetta) (1.22.3)
Requirement already satisfied: evfuncs>=0.3.5 in c:\programdata\anaconda3\envs\tweetynet\lib\site-packages (from crowsetta) (0.3.5)
Requirement already satisfied: scipy>=1.4.1 in c:\programdata\anaconda3\envs\tweetynet\lib\site-packages (from crowsetta) (1.8.1)
Requirement already satisfied: attrs>=19.3.0 in c:\programdata\anaconda3\envs\tweetynet\lib\site-packages (from crowsetta) (21.4.0)
Requirement already satisfied: SoundFile>=0.10.3 in c:\programdata\anaconda3\envs\tweetynet\lib\site-packages (from crowsetta) (0.10.3.post1)
Requirement already satisfied: birdsong-recognition-dataset>=0.3.2 in c:\programdata\anaconda3\envs\tweetynet\lib\site-packages (from crowsetta) (0.3.2)
Requirement already satisfied: python-dateutil>=2.8.1 in c:\programdata\anaconda3\envs\tweetynet\lib\site-packages (from pandas>=1.3.5->crowsetta) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in c:\programdata\anaconda3\envs\tweetynet\lib\site-packages (from pandas>=1.3.5->crowsetta) (2022.1)
Requirement already satisfied: cffi>=1.0 in c:\programdata\anaconda3\envs\tweetynet\lib\site-packages (from SoundFile>=0.10.3->crowsetta) (1.15.0)
Requirement already satisfied: pycparser in c:\programdata\anaconda3\envs\tweetynet\lib\site-packages (from cffi>=1.0->SoundFile>=0.10.3->crowsetta) (2.21)
Requirement already satisfied: six>=1.5 in c:\programdata\anaconda3\envs\tweetynet\lib\site-packages (from python-dateutil>=2.8.1->pandas>=1.3.5->crowsetta) (1.16.0)
ValueError: min_segment_dur or majority_vote specified, but 'unlabeled' not in labelmap. Without 'unlabeled' segments these transforms cannot be applied.
Ah right, I'm sorry, I forgot that those post-processing steps do require the unlabeled "background" segments to be present.
You should be able to run predict if you remove the two options that specify those steps from your .toml config:
majority_vote = true
min_segment_dur = 0.01
(The default for majority_vote is False and for min_segment_dur it's None, so predict won't try to apply those clean-ups: https://vak.readthedocs.io/en/latest/reference/config.html#predict-section)
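Concretely, assuming those two options live in the [PREDICT] section of your config (adjust if yours differ), the edit is just deleting them and leaving everything else as-is:

```toml
[PREDICT]
# ... leave all your other options exactly as they are ...
# majority_vote = true      <- delete this line
# min_segment_dur = 0.01    <- delete this line
```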
But my guess is that without the clean-up steps, the predictions will be noisier than you want.
So we probably want to find a way to remove those labels for the silent gaps.
and then I did pip install crowsetta
Ah, my fault for not explaining more clearly.
You need to tell pip to install from a path. If you just write the package name, it tries to install from PyPI, as you may know.
Notice the period I wrote:
pip install .
where the period . means "this location right here".
You probably missed the period.
Equivalently, if you were in the parent directory of where you downloaded the code, I think you could write a relative path:
pip install .\crowsetta\
or you could write the absolute path:
pip install C:\Users\LabUser\Desktop\repos\crowsetta
(replace what I wrote with the actual path, obviously)
Asked on twitter to see if other people using Praat have run into this issue:
https://twitter.com/nicholdav/status/1529436970983862272
I know there are .TextGrid datasets out there without these "empty string labels".
For example: https://osf.io/r6paq/
and: https://figshare.com/articles/dataset/Data_used_in_PLoS_One_article_Complexity_Predictability_and_Time_Homogeneity_of_Syntax_in_the_Songs_of_Cassin_s_Vireo_Vireo_cassini_by_Hedley_2016_/3081814
Not sure what your method is now--I guess you are finding all the segments "by eye"?--but one option would be to convert your annotations with crowsetta. You'd want to write a script to do this.
Hi, David, it's me again :) I want to ask about something just to make it clear. So it seems that Praat does indeed have empty labels for the interval tiers by default; I checked with the datasets you listed above, and they also have empty labels when opened with Praat (picture attached).
Ah right, I'm sorry, I forgot that those post-processing steps do require the unlabeled "background" segments to be present.
So, if I understand it correctly, there needs to be unlabeled background for TweetyNet to run, so labeling all the silence gaps will not work. Then, does it work if I use crowsetta to remove the labels? Because I am not sure: by 'remove', is it going to remove only the labels, or is it going to remove the silent gaps with '' labels? If it's the latter case, then the silent parts will be gone and it will not work.
Hi @yangzheng-121 thank you for following up -- I had meant to get back to you.
We would really like to make sure vak works with .TextGrid files.
So it seems that Praat does indeed have empty labels for the interval tiers by default
Agreed, this is the default, at least as far as I can tell by loading public datasets using crowsetta.
Seems like I actually need to work with Praat so I understand why this is the default for the app.
there needs to be unlabeled background for TweetyNet to run
Not quite.
You should be able to run vak predict with your current data, where silent gaps are labeled with the empty string, by removing the options for the post-processing steps.
Take the two options out of your file that I show here and see if it runs then:
https://github.com/vocalpy/vak/issues/511#issuecomment-1137157830
Please let me know how that goes.
I expect that it will work, but you will have noisy segmentation because it does not do the post-processing that cleans up.
Then, does it work if I use crowsetta to remove the labels?
If I were you, this is what I would do.
Can you please share a small sample dataset with me by email, e.g. in a Google Drive folder?
Just use the email address that Sita cc'd you on before (when writing to the other Dave :slightly_smiling_face:)
I'll write an example script for you that removes all the empty labels without affecting the other labels.
I'll post it here too so we can refer people to this issue until we figure out a longer term solution for working with .TextGrid annotations. (It's definitely not convenient to need to remove labels from your annotation all the time)
Hi David, Sita here. Thanks again for helping us out! I found out that Praat can convert TextGrids to csv, so we can use the simple-seq format. I wrote a praatscript to convert multiple ones. Right now the column order is onset, tier, label, offset, but I think this can be edited with the 'from-file' argument in simple-seq, if I'm correct? The praatscript can be found here: https://github.com/sthaar/praatscripts. I haven't tried it yet in vak/crowsetta (because I'm still running into crowsetta version issues). Will let you know if it does!
Maybe you could implement it in tweetynet permanently with parselmouth? https://parselmouth.readthedocs.io/en/stable/ I haven't used that yet either, but it looks promising!
Hi @sthaar
I found out that praat can convert textgrids to csv
Great, this is good to know.
When you convert to .csv, do you still get the segments with labels that are empty strings, as @yangzheng-121 described above?
https://github.com/vocalpy/vak/issues/511#issuecomment-1156540718
I think this is the main issue we want to fix here.
It seems to be the case that Praat requires all possible Intervals in a Tier to have a string label, and this is what results in the unlabeled Intervals getting assigned a label of "empty string". As @drfeinberg pointed out on twitter https://twitter.com/davidrfeinberg/status/1537841131127558145
Yeah. I agree I believe it needs a label. I think it's just down to the way it's programmed that each interval has to have a label --even if it's empty. Here's the source code for textgrid: https://github.com/praat/praat/blob/master/fon/TextGrid.cpp
You can definitely try training a model using data with the silent gaps labeled.
But in that case, when you run vak predict, you won't be able to apply the post-processing steps that require there to be a "background" class that includes the silent gaps between vocalizations, as well as any other un-annotated periods.
You should be able to run predict anyway, if you just remove the options from the config file that do the post-processing: https://github.com/vocalpy/vak/issues/511#issuecomment-1137157830 but as I said in that comment, my guess is that the error rate will be higher than you want.
but I think this can be edited with the 'from-file' argument in simple-seq
You are right that the newest version of crowsetta does have an argument to the from_file method that lets you specify a mapping from one set of column names to another.
But that version isn't published yet, and it's not yet built into vak, so that won't fix things for you right now.
Sorry for confusing you by having multiple versions that co-exist :grimacing: -- this is the fun of open-source development.
You will want to write a script to convert your .csv files to the simple-seq format as described here: https://vak.readthedocs.io/en/latest/howto/howto_user_annot.html#example-script-for-converting-txt-files-to-the-simple-seq-format
I am more than happy to do this for you, and will probably add a "how-to" to the crowsetta docs with this as a demo anyway.
praatscript can be found here: https://github.com/sthaar/praatscripts
Thank you for sharing this, I see the script here I think?
https://github.com/sthaar/praatscripts/blob/main/textgrids_to_csv_fortweetynet.praat
Yes I'm pretty sure you'll want to write a very short Python script to convert these to the current simple-seq format, like the one I linked to above.
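Something like this is what I have in mind (a sketch: the input column names are my assumptions about what Praat's "Down to Table" produces, and I use 'label' for the output column; double-check both against your actual files and the simple-seq docs):

```python
import csv

# assumed column names in the Praat table .csv -- adjust to match your files
IN_COLS = {"onset": "tmin", "label": "text", "offset": "tmax"}


def praat_table_to_simpleseq(in_path, out_path):
    """Rewrite a Praat table .csv as a simple-seq style .csv.

    Skips any rows whose label is the empty string, so silent gaps
    end up unannotated, the way TweetyNet expects.
    """
    with open(in_path, newline="") as f:
        rows = list(csv.DictReader(f))
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["onset_s", "offset_s", "label"])
        writer.writeheader()
        for row in rows:
            if row[IN_COLS["label"]] == "":
                continue  # drop empty-string labels
            writer.writerow({
                "onset_s": row[IN_COLS["onset"]],
                "offset_s": row[IN_COLS["offset"]],
                "label": row[IN_COLS["label"]],
            })
```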
Then you should be able to say annot_format = "simple-seq" in the config file and have it just work.
But again you want to remove any labels on silent gaps, so that you can apply the clean up steps when you predict new labels.
Maybe you could implement it in tweetynet permanently with parselmouth? https://parselmouth.readthedocs.io/en/stable/
Thank you, I am aware of parselmouth, but if we add it as a dependency we'll be bringing in all of the Praat code plus a Python wrapper around it, just to load annotations, which we can already do. And we'd still have the same issue: you end up with silent gaps that are assigned empty labels, which you need to deal with somehow.
I hope this reply doesn't come off as grumpy or pedantic, just trying to make sure I'm clear.
Let's definitely meet soon and talk some of this out, because I think we are nearing the limits of communication via GitHub issue :smile: and I do want to help you
Hi, sorry for the late reply, I was in meetings all day. No worries, I'm just very happy you take the time and effort to explain all this!
When you convert to .csv, do you still get the segments with labels that are empty strings, as @yangzheng-121 described above?
Sorry I wasn't clear about this: it's without labels for empty segments! You can choose to exclude empty intervals in Praat. In my praatscript it's line 16, Down to Table: "no", 6, "yes", "no" (the last "no").
The link is indeed https://github.com/sthaar/praatscripts/blob/main/textgrids_to_csv_fortweetynet.praat
The output of the script is now a file with a .wav.csv extension, with column names 'onset_s', 'labels', and 'offset_s'
with a gap between the two xs labels
So if I'm correct, this is already the simple-seq format? At least, vak train runs without error when I say annot_format = "simple-seq" in the config (train .toml) file and use these .wav.csv files produced by the praatscript. But I haven't tried it on an actual dataset (just a test with 2 files), and I haven't tried predict yet.
I'll try tomorrow and read more in the manual about the formats.
Sorry for confusing you by having multiple versions that co-exist :grimacing: -- this is the fun of open-source development.
No worries of course! Again, happy you're taking the effort. I'm aware of the crowsetta version transition, so I'll just wait for the new version :) Let me know if we can help out with reading/testing!
And I'll email about a zoom meeting indeed! Many thanks!
Hi, David, I was trying to use vak with TextGrid files. I changed these 2 lines in the gy6or6_train.toml file: audio_format = "wav" and annot_format = "textgrid"
I ran it with the command: vak prep C:\Users\LabUser\Desktop\DataFolder\song_analysis\gy6or6_train.toml
This is the error I got:
As far as I understand, this means I have an empty label which cannot be recognized, but in the annotation there will always be an empty part (which is the silent interval between songs, or noise). I can upload one example of my annotation. It's a picture, not the original file, because the file type is not supported. Doc1.docx