Hi @vivinastase sorry you're having this issue and thank you for letting us know.
Your description of the source of the error sounds right to me.
Could you please provide a little information just to help us squash the bug?
Hi Dave
Here is the error traceback:
Traceback (most recent call last):
File "/home/vivi/anaconda3/envs/tweetynet/bin/vak", line 8, in
With regard to the dataset, I was using my own. Here is what the problem-causing annotation file looked like when loaded with loadmat:
{'header': b'MATLAB 5.0 MAT-file Platform: posix, Created on: Thu Mar 10 10:24:41 2022', 'version': '1.0', 'globals': [], 'Fs': array([[44100]]), 'fname': array(['R3406_40911.56229478_1_3_15_37_9.wav'], dtype='<U36'), 'onsets': array([[2336.50793651]]), 'offsets': array([[2449.70521542]]), 'num_sylls': array([[1]]), 'labels': array(['a'], dtype='<U1')}
And here is the same file loaded with evfuncs.load_notmat:
{'header': b'MATLAB 5.0 MAT-file Platform: posix, Created on: Thu Mar 10 10:24:41 2022', 'version': '1.0', 'globals': [], 'Fs': 44100, 'fname': 'R3406_40911.56229478_1_3_15_37_9.wav', 'onsets': 2336.5079365079364, 'offsets': 2449.705215419501, 'num_sylls': 1, 'labels': 'a'}
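The difference between the two outputs above comes from squeezing: in the raw file, onsets is a (1, 1) array, and squeezing a one-element array leaves nothing but a scalar. The sketch below approximates that behaviour with NumPy alone. `squeeze_like_loadmat` is a hypothetical helper, not scipy's code, and scipy's real `squeeze_me=True` logic handles more cases (structs, cell arrays, empty arrays).

```python
import numpy as np


def squeeze_like_loadmat(arr):
    """Hypothetical approximation of loadmat's squeeze_me=True behaviour.

    Unit dimensions are removed; if nothing is left (a 0-d array),
    the value is unwrapped into a plain Python scalar.
    """
    squeezed = np.squeeze(arr)
    if squeezed.ndim == 0:
        return squeezed.item()
    return squeezed


# One segment: the raw onsets array has shape (1, 1), as in the loadmat output.
one_segment = np.array([[2336.50793651]])
# Several segments: shape (3, 1), chosen here purely for illustration.
three_segments = np.array([[2336.5], [2500.0], [2700.0]])

print(type(squeeze_like_loadmat(one_segment)))     # <class 'float'>
print(squeeze_like_loadmat(three_segments).shape)  # (3,)
```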
After I applied a patch that converts the scalar into an array, this part of the processing worked.
I see, thank you @vivinastase, that helps track down the source.
I think the right fix might be to add a final if that catches cases where there's only one annotated segment. I need to double-check, but I'm pretty sure there's other code that expects to get 1-d arrays when loading the .not.mat format, so removing squeeze_me=True might cause other issues.
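A minimal sketch of that final if follows. It assumes the check runs on the dict returned by evfuncs.load_notmat before the values reach crowsetta. The name `ensure_1d_segments` is hypothetical. Only onsets and offsets are touched, on the assumption that labels is already a string with one character per segment and so needs no wrapping.

```python
import numpy as np


def ensure_1d_segments(notmat_dict):
    """Hypothetical post-processing step for a load_notmat-style dict.

    With squeeze_me=True, a file with exactly one segment yields scalar
    onsets/offsets; wrap them back into 1-d arrays so downstream code
    always receives arrays.
    """
    fixed = dict(notmat_dict)  # don't mutate the caller's dict
    for key in ("onsets", "offsets"):
        value = np.asarray(fixed[key])
        if value.ndim == 0:  # the single-segment case
            value = value.reshape(1)
        fixed[key] = value
    return fixed


annot = {"onsets": 2336.5079365079364, "offsets": 2449.705215419501, "labels": "a"}
fixed = ensure_1d_segments(annot)
print(fixed["onsets"].shape)  # (1,)
```

`np.atleast_1d(value)` would do the same in one call. The explicit `ndim` check is spelled out here to mirror the "final if" idea.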
Would it be easier for you to work with another format?
You can use a simple .csv file of annotations as described here:
https://vak.readthedocs.io/en/latest/howto/howto_user_annot.html
Please let us know if that how-to guide is not clear; we could revise it.
We could also point to it somewhere else in the vak docs. Maybe the tutorial gave you the impression you needed to use the .not.mat format?
You can also use other formats that crowsetta can parse, like Praat TextGrid files (not sure how you're annotating your data).
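To illustrate the .csv route, here is a minimal sketch that converts one .not.mat dict (as returned by evfuncs.load_notmat) into csv rows, one row per segment. The column names are placeholders, not the schema vak requires; check the how-to guide linked above for the actual columns. The sketch also assumes .not.mat onsets and offsets are in milliseconds and that labels holds one character per segment.

```python
import csv
import io

# Placeholder column names: NOT necessarily the columns vak/crowsetta expect.
FIELDNAMES = ["audio_file", "label", "onset_s", "offset_s"]


def notmat_to_csv_rows(notmat_dict):
    """Yield one csv row per segment from a load_notmat-style dict.

    Assumes onsets/offsets are in milliseconds; scalars (a single-segment
    file loaded with squeeze_me=True) are wrapped into lists first.
    """
    onsets = notmat_dict["onsets"]
    offsets = notmat_dict["offsets"]
    if isinstance(onsets, (int, float)):  # single-segment file
        onsets, offsets = [onsets], [offsets]
    for label, onset, offset in zip(notmat_dict["labels"], onsets, offsets):
        yield {
            "audio_file": notmat_dict["fname"],
            "label": label,
            "onset_s": onset / 1000,  # ms -> s (assumed units)
            "offset_s": offset / 1000,
        }


annot = {
    "fname": "R3406_40911.56229478_1_3_15_37_9.wav",
    "onsets": 2336.5079365079364,
    "offsets": 2449.705215419501,
    "labels": "a",
}
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
writer.writeheader()
writer.writerows(notmat_to_csv_rows(annot))
print(buffer.getvalue())
```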
No, removing squeeze_me would cause other problems because of the data shape expected by the column_or_row_or_1d function. I added an if-based patch, as you suggested.
In response to your question about data formatting -- no, I didn't have the impression that I had to use this format. It was just easier because the example data used this format, and it is not a problem to reproduce it. I had tried to use the csv format, but got some errors that were harder to track down (sorry, I removed that version of the dataset, so I cannot post a traceback). Maybe having some example data with csv annotations would make it easier for people who want to use this format? It's easier if you just see the file, rather than read explanations, and possibly misinterpret them :)
Yes, agreed, thank you
@vivinastase I am going to close this -- I just opened an issue on the vak repo so I can track and fix it there.
I included a link to the discussion here. Thank you for your valuable feedback. I don't mean to give you the impression we will not fix this; I just want to make sure I handle it as an issue with vak and crowsetta, not tweetynet itself.
When processing files with only one annotated segment, the system raises an error in crowsetta/validation.py, line 65, in column_or_row_or_1d. This seems to happen because the function notmat2annot in notmat.py loads the annotation file with evfuncs.load_notmat, which in turn calls loadmat with the option squeeze_me=True. That option turns a one-element array into a scalar.
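The failure can be reproduced without crowsetta. The function below is a hypothetical stand-in that mimics the kind of shape check column_or_row_or_1d performs, inferred from its name and the error described here. It is not crowsetta's actual code. It accepts 1-d, single-row, or single-column input and rejects the 0-d value that a squeezed scalar becomes.

```python
import numpy as np


def check_column_or_row_or_1d(y):
    """Hypothetical shape check in the spirit of column_or_row_or_1d.

    Accepts a 1-d array, a single row (1, n) or a single column (n, 1),
    and returns it flattened to 1-d. Anything else, including the 0-d
    array made from a scalar, raises ValueError.
    """
    y = np.asarray(y)
    if y.ndim == 1:
        return y
    if y.ndim == 2 and (y.shape[0] == 1 or y.shape[1] == 1):
        return np.ravel(y)
    raise ValueError(f"bad input shape {y.shape}")


print(check_column_or_row_or_1d([2336.5, 2500.0]).shape)  # (2,)
try:
    check_column_or_row_or_1d(2336.5079365079364)  # scalar from squeeze_me=True
except ValueError as err:
    print(err)  # bad input shape ()
```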