vocalpy / crowsetta

A tool to work with any format for annotating vocalizations
https://crowsetta.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

module 'crowsetta' has no attribute 'formats' #179

Closed yangzheng-121 closed 2 years ago

yangzheng-121 commented 2 years ago

Hi, David, I was following the tutorial on the website: https://crowsetta.readthedocs.io/en/latest/tutorial.html#tutorial, and trying to learn how to use crowsetta to convert my own annotation files (TextGrid) to .csv files. However, I ran into some problems.

Here is the error I got:

import crowsetta
crowsetta.formats()

AttributeError                            Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_10668\2199636591.py in <module>
----> 1 crowsetta.formats()

AttributeError: module 'crowsetta' has no attribute 'formats'

I already downloaded crowsetta, and I can find all the scripts, including 'formats', under the 'crowsetta' directory. I would very much appreciate it if you could help me solve this error. By the way, when I tried to run "crowsetta.data.fetch(format='notmat', destination_path='./data/')", I also got the error "module 'crowsetta' has no attribute 'data'".

NickleDave commented 2 years ago

Hi @yangzheng-121, happy to help, and thank you for taking the time to read the documentation.

Can you tell me a little more about what you're trying to do?
If you would like to use .TextGrid files with vak, you may not need to convert them to .csv files; you should be able to write

annot_format = "textgrid"

in the [PREP] section of your .toml config file and have it work correctly.
I implemented this but have not extensively tested vak on a dataset of .TextGrid files; if you try it and get an error, please feel free to report that by raising an issue on the vak repository!

Long story short: I'm actually in the middle of a total overhaul of this library -- as you can see in #146 -- which includes re-writing the docs (in this branch https://github.com/vocalpy/crowsetta/tree/rewrite-docs-fixes-%23152) so they are up-to-date.

There used to be a crowsetta.formats.show() function that would list all of the installed formats--I think that's what the page you linked is referring to (obviously I really need to update the docs 😬).

That will be replaced by crowsetta.formats.as_list() in the new version. If you directly downloaded the latest source code from GitHub, then you would have this new version. If you instead pip install crowsetta==3.4.1 into another environment, you should be able to call crowsetta.formats.show(). (I can see that it's still there in the 3.4.1 version: https://github.com/vocalpy/crowsetta/blob/maintenance/3.4.x/src/crowsetta/formats.py)
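Because the format-listing function differs between the 3.4.x releases (`crowsetta.formats.show()`) and the rewritten version (`crowsetta.formats.as_list()`), code that has to run against either one can check for the newer function instead of comparing version numbers. This is a minimal sketch, assuming `show()` prints the formats and returns nothing; `list_formats` is a hypothetical helper, not part of crowsetta:

```python
import types
from typing import List, Optional


def list_formats(package: types.ModuleType) -> Optional[List[str]]:
    """Hypothetical helper: list annotation formats with old or new crowsetta.

    Returns the list from ``formats.as_list()`` (newer API), or calls
    ``formats.show()`` (3.4.x API, assumed to print) and returns None.
    """
    formats = getattr(package, "formats", None)
    if formats is None:
        # Same situation as the AttributeError in this issue: the imported
        # package does not expose a ``formats`` attribute at all.
        name = getattr(package, "__name__", repr(package))
        raise RuntimeError(
            f"{name!r} has no 'formats' attribute; check that it is installed "
            "correctly in the environment you are running"
        )
    if hasattr(formats, "as_list"):
        return list(formats.as_list())
    formats.show()
    return None
```

After `import crowsetta`, calling `list_formats(crowsetta)` would then work with either version, and a broken installation would fail with a message that points at the environment rather than at the docs.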

I know this is a long reply--just trying to make sure I don't confuse you with the code that's under construction.

Happy to help if you tell me a little bit more about what you're trying to do, or if you have a specific issue you ran into when converting to .csv files.

NickleDave commented 2 years ago

This might be a shorter answer to your question:
if you want to convert your .TextGrid files to a .csv in the 'csv' format as referenced in the vak docs, you can do something like the following with crowsetta version 3.4.1.

import pathlib

import crowsetta

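# expanduser() replaces '~' with your home directory; pathlib does not do this on its own
tg_root = pathlib.Path('~/path/to/dir/where/you/have/textgrid/files').expanduser()  # or 'C:/Users/You/path/...' on Windows
tg_paths = sorted(tg_root.glob('*.TextGrid'))  # assumes all your filenames end with .TextGrid extension

# parse each TextGrid file into an annotation
scribe_tg = crowsetta.Transcriber(format='textgrid')
annots = []
for tg_path in tg_paths:
    annots.append(scribe_tg.from_file(tg_path))

# write all the annotations to a single .csv file
csv_path = str(pathlib.Path('~/path/to/where/you/save/annotations.csv').expanduser())
scribe_csv = crowsetta.Transcriber(format='csv')
scribe_csv.to_csv(annots, csv_filename=csv_path)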

Does that look like what you have in mind?
I haven't tested it, but I think it should work.
Feel free to reply with a traceback if you try and get an error.

Don't mean to overwhelm you with options. There are always too many ways to do things in programming 🙂
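The `glob('*.TextGrid')` call in the script above matches the extension case-sensitively on Linux and macOS (Windows matching is case-insensitive), so files saved as `.textgrid` would be silently skipped there and `annots` could end up empty. If your filenames are mixed, a small standard-library sketch like this could gather them instead; `find_textgrid_files` is a hypothetical name, not a crowsetta function:

```python
import pathlib
from typing import List, Union


def find_textgrid_files(root: Union[str, pathlib.Path]) -> List[pathlib.Path]:
    """Return TextGrid files directly inside ``root``, matching the suffix in any case."""
    root = pathlib.Path(root).expanduser()  # turn '~' into the home directory
    return sorted(
        path
        for path in root.iterdir()
        # skip directories; compare the extension case-insensitively
        if path.is_file() and path.suffix.lower() == ".textgrid"
    )
```

It could stand in for the `tg_paths = ...` line, for example `tg_paths = find_textgrid_files('~/path/to/dir/where/you/have/textgrid/files')`.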

yangzheng-121 commented 2 years ago

Hi, David! Thank you very much for your quick and patient answer! Sorry for my late reply; I am still learning programming, so it takes some time to figure things out.

Here is some feedback: I did try the TextGrid files with vak. I also ran into some problems, but I am not sure whether they come from my annotations or from a bug in vak. I'll figure that out later.

In the meantime, I also tried the script you offered. Here is the error I got:


AttributeError                            Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_12184\657776347.py in <module>
      2 tg_paths = sorted(tg_root.glob('*.TextGrid'))  # assumes all your filenames end with .TextGrid extension
      3 
----> 4 scribe_tg = crowsetta.Transcriber(format='textgrid')
      5 annots = []
      6 for tg_path in tg_paths:

AttributeError: module 'crowsetta' has no attribute 'Transcriber'

When I checked the site-packages/crowsetta folder, I think I have all the modules I need: annotation, birdsongrec, csv, formats, generic, meta, notmat, phn, segment, sequence, simple, stack, textgrid, transcriber, validation, yarden. But when I used print(dir(crowsetta)) to see what I've imported, I got this: ['__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__']

NickleDave commented 2 years ago

> Sorry for my late reply; I am still learning programming, so it takes some time to figure things out.

No worries, I'm happy to help and I'm not going anywhere 🙂

> Here is some feedback: I did try the TextGrid files with vak. I also ran into some problems, but I am not sure whether they come from my annotations or from a bug in vak. I'll figure that out later.

That sounds like it might be a bug in vak. If you could raise an issue with a detailed description of the error you got, I would really appreciate it 🙏

> When I checked the site-packages/crowsetta folder, I think I have all the modules I need: annotation, birdsongrec, csv, formats, generic, meta, notmat, phn, segment, sequence, simple, stack, textgrid, transcriber, validation, yarden. But when I used print(dir(crowsetta)) to see what I've imported, I got this: ['__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__']

This sounds like an installation issue--I see this sometimes in Python when things don't get installed right.
How did you install crowsetta? Please reply with the exact commands you typed in if possible. Above you said

> I already downloaded crowsetta

which makes me think you clicked "Download ZIP" on the main page of this repository?
Did you then do pip install . in the root of the repository in your environment?
(if what I just asked doesn't make sense to you, no worries, but let me know how you installed and I can try to walk you through fixing it or installing a different way)
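For anyone debugging the same symptom: a `dir()` listing with only module dunder names (no `__builtins__` or `__cached__`) looks like what Python produces for a namespace package. Python creates one when it finds a directory named `crowsetta` on `sys.path` but no importable `crowsetta/__init__.py` anywhere on the path. Whether that was the cause in this thread is an assumption. The standard library can report where an import would resolve without actually importing it. `describe_import` below is a hypothetical helper built on `importlib.util.find_spec`:

```python
import importlib
import importlib.util
from typing import Any, Dict


def describe_import(name: str) -> Dict[str, Any]:
    """Hypothetical helper: report where ``import name`` would resolve from."""
    importlib.invalidate_caches()  # notice directories created after startup
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"found": False, "origin": None, "is_namespace": False, "locations": []}
    locations = list(spec.submodule_search_locations or [])
    return {
        "found": True,
        # file the module is loaded from (or e.g. "built-in"); None for namespace packages
        "origin": spec.origin,
        # a namespace package has search locations but no __init__.py origin
        "is_namespace": spec.origin is None and bool(locations),
        "locations": locations,
    }
```

If `describe_import('crowsetta')`, run in the same notebook, showed `is_namespace: True` or an `origin` outside the environment's `site-packages`, that would suggest the wrong folder or the wrong environment is being picked up.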

yangzheng-121 commented 2 years ago

> which makes me think you clicked "Download ZIP" on the main page of this repository? Did you then do pip install . in the root of the repository in your environment? (if what I just asked doesn't make sense to you, no worries, but let me know how you installed and I can try to walk you through fixing it or installing a different way)

I installed it several days ago, so I'm not sure I can provide all the details. I first created a new environment in Anaconda for tweetynet, and then I think I installed tweetynet, vak, and crowsetta using the 'pip install' command.

NickleDave commented 2 years ago

I see, thank you.

I should have just asked: can you please share that environment by doing the following:

  1. create two files by executing the commands below (replace tweety-env with whatever you actually called your environment; it doesn't matter what directory you execute these commands in, as long as you have the right environment activated)
    C:\Users\LabUser\Desktop\DataFolder> conda activate tweety-env
    (tweety-env) C:\Users\LabUser\Desktop\DataFolder> conda env export > environment.yml
    (tweety-env) C:\Users\LabUser\Desktop\DataFolder> conda list --explicit > spec-file.txt
  2. then attach them as a .zip in a reply to this comment (because annoyingly GitHub won't let you just attach the files directly)

NickleDave commented 2 years ago

Could you also try creating a new environment where you install everything as below, and then test and see if you have the same issue with crowsetta in that new environment?
(don't remove the old one yet in case we need to compare them please)

(You can use a different name besides vak-env like vak-env-new-test or something if you already have a vak-env environment)

conda create --name vak-env
conda activate vak-env
conda install pytorch torchvision cudatoolkit=11.3 -c pytorch
conda install vak tweetynet -c conda-forge

I'm assuming you're on a machine with a GPU.
If not, replace the pytorch install line above with:
conda install pytorch torchvision cpuonly -c pytorch
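To compare the old and new environments, it can help to confirm from inside Python (or from the Jupyter kernel you actually use) which interpreter is running and which package versions it sees. This is a standard-library-only sketch. `installed_versions` is a hypothetical helper, and the distribution names passed to it (for example `torch` for PyTorch) are assumptions about how each package is registered:

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(dist_names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    report: Dict[str, Optional[str]] = {}
    for name in dist_names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    print("interpreter:", sys.executable)  # shows which environment's Python is running
    for name, ver in installed_versions(["crowsetta", "vak", "tweetynet", "torch"]).items():
        print(f"{name:<10} {ver or 'not installed'}")
```

Running it once in each environment and comparing the output would show whether the two differ in interpreter path or in the versions they resolve.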

yangzheng-121 commented 2 years ago

Hi, David, I found that when I tried to run vak prep again in the environment I created for tweetynet, it gave me the error: Failed to load PyTorch C extensions. What I did was run the command "conda install tweetynet -c conda-forge -c pytorch" to add this additional channel. Then, when I ran the code to convert to .csv again, it worked. So it seems it was just my silly mistake of forgetting to add the pytorch channel.

But with the generic-seq file, I still could not get rid of the empty label problem; I will post an update on the issue I raised in vak. I actually don't quite understand why it still gives me the empty label error even after I changed the format. I hope it's not because I am using the code in the wrong way. Here is the code, if you don't mind taking a look at it.

import pathlib
import crowsetta

tg_root = pathlib.Path(r'C:\Users\LabUser\Desktop\DataFolder\song_analysis\tweetynet_bl097\tutee_bl097_tntrain')  # or 'C:/Users/You/path/...' on Windows
tg_paths = sorted(tg_root.glob('*.TextGrid'))  # assumes all your filenames end with .TextGrid extension

scribe_tg = crowsetta.Transcriber(format='textgrid')
annots = []
for tg_path in tg_paths:
    annots.append(scribe_tg.from_file(tg_path))

scribe_csv = crowsetta.Transcriber(format='generic-seq')
scribe_csv.to_csv(annots, csv_filename=r'C:\Users\LabUser\Desktop\DataFolder\song_analysis\tweetynet_bl097\tutee_bl097_tntrain.generic-seq')

[PREP]
data_dir = "C:/Users/LabUser/Desktop/DataFolder/song_analysis/tweetynet_bl097/tutee_bl097_tntrain"
output_dir = "C:/Users/LabUser/Desktop/DataFolder/song_analysis/tweetynet_bl097/train"
audio_format = "wav"
annot_format = "generic-seq"
annot_file = "C:/Users/LabUser/Desktop/DataFolder/song_analysis/tweetynet_bl097/tutee_bl097_tntrain.generic-seq"
labelset = "iabcdefghjk"
train_dur = 50
val_dur = 15
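One possible source of empty labels (an assumption; the diagnosis continues in vak issue #511): Praat interval tiers cover the whole recording, so unlabeled gaps such as silences are stored as intervals whose text is an empty string. Before writing the .csv, a sketch like this could flag or drop segments whose label is blank or not in the `labelset`. It works on plain `(onset, offset, label)` tuples rather than crowsetta objects, because attribute names differ between crowsetta versions. Treating each character of `labelset = "iabcdefghjk"` as one allowed label is also an assumption about how vak reads that string, and `split_by_label` is a hypothetical name:

```python
from typing import List, Sequence, Tuple

Segment = Tuple[float, float, str]  # (onset in seconds, offset in seconds, label)


def split_by_label(
    segments: Sequence[Segment], labelset: str
) -> Tuple[List[Segment], List[Segment]]:
    """Split segments into (kept, dropped) depending on whether the label is usable.

    A segment is dropped if its label is empty or whitespace, or if it is not
    one of the characters in ``labelset``.
    """
    allowed = set(labelset)
    kept: List[Segment] = []
    dropped: List[Segment] = []
    for onset, offset, label in segments:
        target = kept if label.strip() and label in allowed else dropped
        target.append((onset, offset, label))
    return kept, dropped
```

Printing `dropped` for each file would show which intervals, and which TextGrid files, contain labels that vak might reject.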

NickleDave commented 2 years ago

Going to close this for now @yangzheng-121 since we are dealing with what I think is the core issue on https://github.com/vocalpy/vak/issues/511