vak prep crashes because of annotation file encoding

vocalpy / vak

A neural network framework for researchers studying acoustic communication

https://vak.readthedocs.io

BSD 3-Clause "New" or "Revised" License

vak prep crashes because of annotation file encoding #382

Closed marichard123 closed 2 years ago

marichard123 commented 3 years ago

When running the prep stage, at the point after which I believe the spectrograms are created, I get the following error message (I have attached the full error traceback at the end of the message):

  File "c:\users\richard\anaconda3\envs\vak-env\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 164: character maps to <undefined>

Initially I thought the problem was that Python was not set to the correct encoding, so inside the cp1252.py file I tried manually setting the encoding to ANSI and to UTF-8, the encodings the text files were saved in, with no success. I then noticed that the second half of the file contains a decoding table, a tuple in which the character mappings are all listed by hand. Among them are entries that map to "undefined", and byte 0x90 is indeed one of them. In other words, it seems to me that this is not a case of 0x90 being undefined because of an encoding mismatch; rather, 0x90 is hard-coded to map to "undefined", and the problem lies within whatever file the program is reading. I'm not sure how to identify the file that's causing the problem or how to pinpoint what is putting 0x90 in it. Have you run into a similar error during development, or would you have any insight into the nature of the problem?
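To make the observation about the decoding table concrete: Python's cp1252 codec leaves a few byte values unmapped, and 0x90 is one of them, so any input containing that byte fails to decode as cp1252. Below is a minimal sketch using only the standard library; the helper name `undefined_bytes` is made up for illustration.

```python
def undefined_bytes(encoding="cp1252"):
    """Return the single-byte values that `encoding` cannot decode."""
    missing = []
    for value in range(256):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            missing.append(value)
    return missing


if __name__ == "__main__":
    # Byte values with no character assigned in cp1252
    print([hex(value) for value in undefined_bytes("cp1252")])
    # latin-1 assigns a character to every byte, so this list is empty
    print(undefined_bytes("latin-1"))
```

Bytes like 0x90 turn up often in binary data, so an error like this one may also mean that the file being read is not a text file at all.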

Full error traceback:

(vak-env) PS C:\Users\Richard\Documents\Fall_2021\Bat_Stuff\TweetynetPipeline> vak prep gy6or6_train.toml
determined that purpose of config file is: train
will add 'csv_path' option to 'TRAIN' section
purpose for dataset: train
will split dataset
making array files containing spectrograms from audio files in: C:\Users\Richard\Documents\Fall_2021\Bat_Stuff\TweetynetPipeline
creating array files with spectrograms
[########################################] | 100% Completed | 10.5s
creating dataset from spectrogram files in: C:\Users\Richard\Documents\Fall_2021\Bat_Stuff\TweetynetPipeline\spectrograms_generated_211108_024036
validating set of spectrogram files
[########################################] | 100% Completed |  9.8s
creating pandas.DataFrame representing dataset from spectrogram files
[########################################] | 100% Completed | 10.4s
Traceback (most recent call last):
  File "c:\users\richard\anaconda3\envs\vak-env\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\richard\anaconda3\envs\vak-env\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\Richard\Anaconda3\envs\vak-env\Scripts\vak.exe\__main__.py", line 7, in <module>
  File "c:\users\richard\anaconda3\envs\vak-env\lib\site-packages\vak\__main__.py", line 45, in main
    cli.cli(command=args.command, config_file=args.configfile)
  File "c:\users\richard\anaconda3\envs\vak-env\lib\site-packages\vak\cli\cli.py", line 30, in cli
    COMMAND_FUNCTION_MAP[command](toml_path=config_file)
  File "c:\users\richard\anaconda3\envs\vak-env\lib\site-packages\vak\cli\prep.py", line 146, in prep
    logger=logger,
  File "c:\users\richard\anaconda3\envs\vak-env\lib\site-packages\vak\core\prep.py", line 226, in prep
    logger=logger,
  File "c:\users\richard\anaconda3\envs\vak-env\lib\site-packages\vak\split\split.py", line 138, in dataframe
    labels = labels_from_df(vak_df)
  File "c:\users\richard\anaconda3\envs\vak-env\lib\site-packages\vak\labels.py", line 79, in from_df
    annots = annotation.from_df(vak_df)
  File "c:\users\richard\anaconda3\envs\vak-env\lib\site-packages\vak\annotation.py", line 107, in from_df
    scribe.from_file(annot_path) for annot_path in vak_df["annot_path"].values
  File "c:\users\richard\anaconda3\envs\vak-env\lib\site-packages\vak\annotation.py", line 107, in <listcomp>
    scribe.from_file(annot_path) for annot_path in vak_df["annot_path"].values
  File "c:\users\richard\anaconda3\envs\vak-env\lib\site-packages\crowsetta\csv.py", line 220, in csv2annot
    set_header = set(reader.fieldnames)
  File "c:\users\richard\anaconda3\envs\vak-env\lib\csv.py", line 98, in fieldnames
    self._fieldnames = next(self.reader)
  File "c:\users\richard\anaconda3\envs\vak-env\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 164: character maps to <undefined>

NickleDave commented 3 years ago

Hi @marichard123 thank you for raising a clear detailed issue -- sorry you're having this problem.

We are very excited to work with a computer scientist that would actually think to look at the encoding used 😁 but I think your hunch is right, that a good place to start is hunting down the offending file.

Before doing that: are you able to provide a little bit more information about your annotation file(s)? Is there a single .csv file with all the annotations for every audio file, or is it one annotation file per audio file?

I'm wondering if a quick fix is to simply save the original file(s) in a different encoding. If you can tell me more about how you generated them, that might help us figure it out.

I can see that the crash occurred when crowsetta tried to open one of them.

  File "c:\users\richard\anaconda3\envs\vak-env\lib\site-packages\crowsetta\csv.py", line 220, in csv2annot
    set_header = set(reader.fieldnames)
  File "c:\users\richard\anaconda3\envs\vak-env\lib\csv.py", line 98, in fieldnames
    self._fieldnames = next(self.reader)
  File "c:\users\richard\anaconda3\envs\vak-env\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]

If you are okay with sharing annotation files either here or by email, I am also happy to run them through crowsetta to see if I can diagnose what's going on.

It might be the case that we need to do a little work on the crowsetta code to make csv loading more general -- I'd really appreciate if you can help me figure that out before we have a ton of users testing things and getting angry about encoding errors 😅

It's probably better to use something other than our own hand-rolled csv read/write code anyway (as discussed in this issue)

As far as finding the file: A quick way to troubleshoot might be making a dataset with a single audio file and see if it still crashes.

It looks like the crash happened right at the start of checking annotations, so it might not be worth going to all this trouble.

But if I was going to try and track down the file, I would set a breakpoint with pdb as far down the stack as I can get. In this case, that looks like it's inside c:\users\richard\anaconda3\envs\vak-env\lib\site-packages\crowsetta\csv.py. Specifically this line: https://github.com/NickleDave/crowsetta/blob/3db98293b7babd1526dc0f28d7919181ec5b591a/src/crowsetta/csv.py#L207

    with open(csv_filename, 'r', newline='') as csv_file:
        reader = csv.DictReader(csv_file)

        # DictReader automatically uses first row (AKA 'header') as fieldnames
        # when no argument supplied for fieldnames parameter
        # so we use that default to check validity of csv fieldnames
        set_header = set(reader.fieldnames)
        if set_header != set(CSV_FIELDNAMES):

I would edit the file to look something like this:

    with open(csv_filename, 'r', newline='') as csv_file:
        reader = csv.DictReader(csv_file)

        # DictReader automatically uses first row (AKA 'header') as fieldnames
        # when no argument supplied for fieldnames parameter
        # so we use that default to check validity of csv fieldnames
        try:
            set_header = set(reader.fieldnames)
        except:
            import pdb;pdb.set_trace()
        if set_header != set(CSV_FIELDNAMES):

and then when you get an error, just show the filename from the pdb prompt, e.g.,

(Pdb) p 'the bane of our existence: ' + csv_filename
'the bane of our existence: ./data/some-annotation.csv'

Of course be careful with editing the files in site packages since you can't set them back to the originals with git checkout or anything. You might also want to double check that you fixed the encoding back inside the cp1252.py file so that's not causing some unexpected errors--you definitely shouldn't have to touch that file! If you're getting really weird errors it might be worth just re-creating the environment from scratch.
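An alternative that avoids touching site-packages is to try decoding each candidate file from a separate script and report the ones that fail. This is only a sketch: `find_undecodable` is a hypothetical helper, not part of vak or crowsetta, and the list of paths to check is something you would have to supply yourself.

```python
from pathlib import Path


def find_undecodable(paths, encoding="cp1252"):
    """Return (path, error) pairs for files that fail to decode as text."""
    failures = []
    for path in paths:
        try:
            # Decode the whole file, as a text-mode reader eventually would
            Path(path).read_text(encoding=encoding)
        except UnicodeDecodeError as err:
            failures.append((str(path), err))
    return failures


if __name__ == "__main__":
    # Hypothetical location: point this at the files you want to check
    candidates = sorted(Path(".").glob("*.csv"))
    for name, err in find_undecodable(candidates):
        print(f"{name}: {err}")
```

The default of `cp1252` mirrors the codec named in the traceback; pass a different `encoding` to test other assumptions.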

Please do let me know what you can about the file types and we can take it from there.

marichard123 commented 3 years ago

Hi David! Thank you for the quick response. I haven't had time so far to poke around in the files some more, but for now I can upload my annotation file along with the corresponding audio files (attached: CSV File and Audio Files.zip). The annotations are structured as a single long CSV file covering many different short audio files. I will try working with some of your suggestions of how to fix the issue and get back to you ASAP!

NickleDave commented 3 years ago

Thank you @marichard123 for sharing these files!!! I have it on my to-do list to see if I can replicate the bug with just crowsetta. Will do by the end of this weekend at the latest

NickleDave commented 3 years ago

Hi @marichard123 I am able to open the annotation file with just crowsetta alone.

I am starting to wonder if you are right, that it's literally just because of how the character encoding is set up in the env you're using.

Can you see if you still get the bug if you do the following in your environment?

(vak-env) PS C:\Users\Richard\Documents\Fall_2021\Bat_Stuff\TweetynetPipeline> ipython
In [1]: import crowsetta

In [2]: scribe = crowsetta.Transcriber(format='csv')

In [3]: annots = scribe.from_file('PipelineCSVOutput.csv')

I think it should happen when you execute that third line, if we are right about the encoding.

When I run it (on an Ubuntu-type OS) that line runs without error and I am able to do:

In [7]: annots
Out[7]: 
[Annotation(annot_path=PosixPath('C:\\Users\\Richard\\Documents\\Fall_2021\\Bat_Stuff\\TweetynetPipeline\\Logger16_16_200123_0958_VocExtractData1_mat_annotation.mat'), audio_path=PosixPath('C:\\Users\\Richard\\Documents\\Fall_2021\\Bat_Stuff\\TweetynetPipeline\\Logger16_16_200123_0958_VocExtractData1.wav'), seq=<Sequence with 2 segments>),
 Annotation(annot_path=PosixPath('C:\\Users\\Richard\\Documents\\Fall_2021\\Bat_Stuff\\TweetynetPipeline\\Logger16_17_200123_0958_VocExtractData1_mat_annotation.mat'), audio_path=PosixPath('C:\\Users\\Richard\\Documents\\Fall_2021\\Bat_Stuff\\TweetynetPipeline\\Logger16_17_200123_0958_VocExtractData1.wav'), seq=<Sequence with 2 segments>),
 ...

(which is what vak is trying to get under the hood when it crashes for you)

If you get that crash using just crowsetta then could you please also share your environment? E.g. by creating an environment.yml file with conda and pasting the raw file into a comment, as well as attaching the file itself (in a zip, because github) as a reply?

I can try on a Windows machine and see if I can replicate.

NickleDave commented 3 years ago

Wondering if it's something like this: https://github.com/quantumblacklabs/kedro/issues/291
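If the platform's default encoding is the culprit, it can be checked directly, and the file can be opened with an explicit encoding instead. The sketch below uses only the standard library; `default_text_encoding` and `read_header` are made-up helper names, and `utf-8` is an assumption about how the annotation file was saved.

```python
import csv
import locale


def default_text_encoding():
    """Encoding the locale prefers, which open() has traditionally used by default."""
    return locale.getpreferredencoding(False)


def read_header(csv_path, encoding="utf-8"):
    """Return the header row of a csv file, decoded with an explicit encoding."""
    with open(csv_path, "r", newline="", encoding=encoding) as csv_file:
        return csv.DictReader(csv_file).fieldnames


if __name__ == "__main__":
    # Typically cp1252 on Western-European Windows and UTF-8 on most Linux setups
    print(default_text_encoding())
```

If `read_header` succeeds with an explicit encoding while a plain `open()` fails, that would point at the environment's default rather than at the file.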

marichard123 commented 3 years ago

Good morning @NickleDave! Unfortunately I was incapable even of running the first few lines,

In [1]: import crowsetta

In [2]: scribe = crowsetta.Transcriber(format='csv')

In the first case, I kept running into errors of "so and so module not found", even though oftentimes I had the module installed. I tried massaging the code a bit to direct it towards the correct pathways, but eventually I found that simply copy-pasting all the "missing" modules to the current directory at least seemed to fix the issue for now. When running the

In [2]: scribe = crowsetta.Transcriber(format='csv')

line, I receive the error "ValueError: specified vocal annotation format, csv, not installed, and noconfiguration was specified. Either install format, or specify configuration by passing as the 'config' argument to Transcriber"

Can you give some insight into what exactly installing a vocal annotation format would entail? I went into the Python source file and added a print statement

print(formats._INSTALLED)

in an attempt to see what the program considered valid vocal annotation formats, but I cannot see the print statement on the console. ipython also seems not to react at all to any changes made to the source file. As a secondary issue, do you know how I could overcome this problem? I'm assuming it has to do with the changes not being loaded in and registered, although my attempts at implementing auto-reloading have not had an effect.

Additionally, thank you so much for taking the time to help me with these issues! In spite of the various unexpected difficulties encountered while trying to run the program on my end, I am very grateful that you are actively helping me through them :)

NickleDave commented 3 years ago

Hi @marichard123 -- glad to help, @yardencsGitHub and I are happy that people are using the software, and we're excited about what you're working on

I'm a little bit confused about why you wouldn't be able to import crowsetta though -- it's a dependency of vak so it should be installed in your conda env.

Before you spend a bunch of time hacking crowsetta, let's figure out why it's not working as expected.

Below is a checklist with things we can do to troubleshoot. Can you please try each item, and for each item reply with a separate comment?
Please include in the replies the exact commands you enter, and the entire output in the console including full stack traces, verbatim.

[ ] just double-checking: you have the conda environment activated? Please double check this and make sure you do get the error when it's activated.
- it's not clear to me where you copy-pasted modules from crowsetta. You'll want to make sure you're not in that directory, so that you're not accidentally importing the local copies. This would prevent you from replicating the error
[ ] once you verify that you get the error with the environment activated, please reply to me with the exact commands you entered + the full traceback you get when you tried import crowsetta above
[ ] please reply with your conda environment so I can test on Windows whether I can reproduce the bug
- by creating an environment.yml file with conda and pasting the raw file into a comment, as well as attaching the file itself (in a zip, because github) as a reply
- like this:
  
  (vak-env) PS C:\Users\Richard\Documents\Fall_2021\Bat_Stuff\TweetynetPipeline> conda env export > environment.yml
[ ] please try creating a new environment with a different name -- e.g. vak-env-test-bug -- and verify that even in this new environment you get the exact same error
- again those steps are
  
  C:\You> conda create -n vak-env python==3.8
  C:\You> conda activate vak-env
  (vak-env) C:\You> pip install torch===1.7.1 torchvision===0.8.2 -f https://download.pytorch.org/whl/torch_stable.html
  (vak-env) C:\You> pip install vak==0.4.0.dev1
  (vak-env) C:\You> pip install tweetynet

Technically we are now on 0.4.0.dev4 but please don't install that one yet, just use 0.4.0.dev1 so we can get to the root of the bug. If you just can't get enough troubleshooting, you could try to create a new env with the latest dev version installed (call it, say, "vak040dev4") and then see if you still get the errors. But I really doubt that's the source of the issue--we were running 0.4.0.dev1 just fine.

If this doesn't help us work out what's going on, maybe we can have a quick Zoom meeting. But let's see what you find out.

NickleDave commented 3 years ago

Hi again @marichard123 -- just following up to say it occurred to me that I should be able to use the files you shared to test whether I can replicate the error on Windows

I will do that in the next couple of days

That won't help us figure out quite what's going on with your set-up though. Not trying to rush you but please do go ahead and reply as I asked above whenever you have time.

marichard123 commented 2 years ago

Hello! Here are each of the steps that I have taken- I confirmed that I was already working inside the conda environment, the command "conda init powershell" giving

no change     C:\Users\Richard\Anaconda3\Scripts\conda.exe
no change     C:\Users\Richard\Anaconda3\Scripts\conda-env.exe
no change     C:\Users\Richard\Anaconda3\Scripts\conda-script.py
no change     C:\Users\Richard\Anaconda3\Scripts\conda-env-script.py
no change     C:\Users\Richard\Anaconda3\condabin\conda.bat
no change     C:\Users\Richard\Anaconda3\Library\bin\conda.bat
no change     C:\Users\Richard\Anaconda3\condabin\_conda_activate.bat
no change     C:\Users\Richard\Anaconda3\condabin\rename_tmp.bat
no change     C:\Users\Richard\Anaconda3\condabin\conda_auto_activate.bat
no change     C:\Users\Richard\Anaconda3\condabin\conda_hook.bat
no change     C:\Users\Richard\Anaconda3\Scripts\activate.bat
no change     C:\Users\Richard\Anaconda3\condabin\activate.bat
no change     C:\Users\Richard\Anaconda3\condabin\deactivate.bat
no change     C:\Users\Richard\Anaconda3\Scripts\activate
no change     C:\Users\Richard\Anaconda3\Scripts\deactivate
no change     C:\Users\Richard\Anaconda3\etc\profile.d\conda.sh
no change     C:\Users\Richard\Anaconda3\etc\fish\conf.d\conda.fish
no change     C:\Users\Richard\Anaconda3\shell\condabin\Conda.psm1
no change     C:\Users\Richard\Anaconda3\shell\condabin\conda-hook.ps1
no change     C:\Users\Richard\Anaconda3\Lib\site-packages\xontrib\conda.xsh
no change     C:\Users\Richard\Anaconda3\etc\profile.d\conda.csh
no change     C:\Users\Richard\Documents\WindowsPowerShell\profile.ps1
No action taken.

marichard123 commented 2 years ago

The exact commands I entered + the error traceback:

(base) PS C:\Users\Richard> conda activate vak-env
(vak-env) PS C:\Users\Richard> cd Documents\Fall_2021\Bat_Stuff\TweetynetPipeline
(vak-env) PS C:\Users\Richard\Documents\Fall_2021\Bat_Stuff\TweetynetPipeline> ipython
Python 3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.22.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import crowsetta
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-af422012dd85> in <module>
----> 1 import crowsetta

ModuleNotFoundError: No module named 'crowsetta'

In [2]:

marichard123 commented 2 years ago

The full conda environment, with the zip file attached at the bottom:

name: vak-env
channels:
pytorch
defaults
dependencies:
blas=1.0=mkl
bokeh=2.3.2=py36haa95532_0
brotlipy=0.7.0=py36h2bbff1b_1003
ca-certificates=2021.10.26=haa95532_2
cachecontrol=0.12.6=pyhd3eb1b0_0
cachy=0.3.0=pyhd3eb1b0_0
certifi=2021.5.30=py36haa95532_0
cleo=0.7.6=py_0
click=8.0.3=pyhd3eb1b0_0
clikit=0.4.3=py_0
cloudpickle=2.0.0=pyhd3eb1b0_0
contextvars=2.4=py_0
cryptography=35.0.0=py36h71e12ea_0
cudatoolkit=11.3.1=h59b6b97_2
cycler=0.10.0=py36haa95532_0
cytoolz=0.11.0=py36he774522_0
dask=2021.3.0=pyhd3eb1b0_0
dask-core=2021.3.0=pyhd3eb1b0_0
dataclasses=0.8=pyh4f3eec9_6
distributed=2021.3.0=py36haa95532_0
entrypoints=0.3=py36_0
freetype=2.11.0=ha860e81_0
fsspec=2021.8.1=pyhd3eb1b0_0
heapdict=1.0.1=pyhd3eb1b0_0
html5lib=1.1=pyhd3eb1b0_0
icc_rt=2019.0.0=h0cc432a_1
icu=58.2=ha925a31_3
immutables=0.16=py36h2bbff1b_0
importlib_metadata=1.1.3=0
intel-openmp=2021.3.0=haa95532_3372
jinja2=3.0.1=pyhd3eb1b0_0
joblib=1.0.1=pyhd3eb1b0_0
jpeg=9d=h2bbff1b_0
jsonschema=3.2.0=pyhd3eb1b0_2
keyring=18.0.1=py36_0
kiwisolver=1.3.1=py36hd77b12b_0
libpng=1.6.37=h2a8f88b_0
libtiff=4.2.0=hd0e1b90_0
libuv=1.40.0=he774522_0
locket=0.2.1=py36haa95532_1
lockfile=0.12.2=py36haa95532_0
lz4-c=1.9.3=h2bbff1b_1
markupsafe=2.0.1=py36h2bbff1b_0
matplotlib=3.3.4=py36haa95532_0
matplotlib-base=3.3.4=py36h49ac443_0
mkl=2019.4=245
mkl-service=2.3.0=py36h196d8e1_0
mkl_fft=1.3.0=py36h46781fe_0
mkl_random=1.0.4=py36h343c172_0
msgpack-python=1.0.2=py36h59b6b97_1
olefile=0.46=py36_0
openssl=1.1.1l=h2bbff1b_0
packaging=21.0=pyhd3eb1b0_0
pandas=1.1.5=py36hd77b12b_0
partd=1.2.0=pyhd3eb1b0_0
pastel=0.2.1=py_0
pexpect=4.8.0=pyhd3eb1b0_3
pillow=8.2.0=py36h4fa10fc_0
pip=21.0.1=py36haa95532_0
pkginfo=1.7.1=py36haa95532_0
poetry=1.0.10=py36_0
psutil=5.8.0=py36h2bbff1b_1
ptyprocess=0.7.0=pyhd3eb1b0_2
pycparser=2.20=py_2
pylev=1.3.0=py_0
pyopenssl=21.0.0=pyhd3eb1b0_1
pyparsing=2.4.7=pyhd3eb1b0_0
pyqt=5.9.2=py36h6538335_2
pyrsistent=0.14.11=py36h2bbff1b_0
pysocks=1.7.1=py36haa95532_0
python=3.6.13=h3758d61_0
python-dateutil=2.8.2=pyhd3eb1b0_0
pytorch-mutex=1.0=cuda
pytz=2021.3=pyhd3eb1b0_0
pywin32-ctypes=0.2.0=py36_1000
pyyaml=5.4.1=py36h2bbff1b_1
qt=5.9.7=vc14h73c81de_0
requests=2.26.0=pyhd3eb1b0_0
requests-toolbelt=0.8.0=py_1
setuptools=58.0.4=py36haa95532_0
shellingham=1.3.1=pyhd3eb1b0_0
sip=4.19.8=py36h6538335_0
six=1.16.0=pyhd3eb1b0_0
sortedcontainers=2.4.0=pyhd3eb1b0_0
sqlite=3.36.0=h2bbff1b_0
tblib=1.7.0=pyhd3eb1b0_0
tk=8.6.11=h2bbff1b_0
toml=0.10.2=pyhd3eb1b0_0
tomlkit=0.5.11=py36_1
toolz=0.11.1=pyhd3eb1b0_0
tornado=6.1=py36h2bbff1b_0
tqdm=4.62.3=pyhd3eb1b0_1
typing-extensions=3.10.0.2=hd3eb1b0_0
typing_extensions=3.10.0.2=pyh06a4308_0
vc=14.2=h21ff451_1
vs2015_runtime=14.27.29016=h5e58377_2
webencodings=0.5.1=py36_1
wheel=0.37.0=pyhd3eb1b0_1
win_inet_pton=1.1.0=py36haa95532_0
wincertstore=0.2=py36h7fe50ca_0
xz=5.2.5=h62dcd97_0
yaml=0.2.5=he774522_0
zict=2.0.0=pyhd3eb1b0_0
zipp=3.6.0=pyhd3eb1b0_0
zlib=1.2.11=h62dcd97_4
zstd=1.4.9=h19a0ad4_0
pip:
- absl-py==0.15.0
- attrs==20.3.0
- cachetools==4.2.4
- cffi==1.15.0
- charset-normalizer==2.0.7
- colorama==0.4.4
- crowsetta==3.1.1.post1
- evfuncs==0.3.2.post1
- google-auth==2.3.2
- google-auth-oauthlib==0.4.6
- grpcio==1.41.1
- idna==3.3
- importlib-metadata==4.8.1
- koumura==0.2.1.post1
- markdown==3.3.4
- numpy==1.19.5
- oauthlib==3.1.1
- protobuf==3.19.1
- pyasn1==0.4.8
- pyasn1-modules==0.2.8
- requests-oauthlib==1.3.0
- rsa==4.7.2
- scipy==1.5.4
- soundfile==0.10.3.post1
- tensorboard==2.7.0
- tensorboard-data-server==0.6.1
- tensorboard-plugin-wit==1.8.0
- torch==1.7.1
- torchvision==0.8.2
- tweetynet==0.6.0
- urllib3==1.26.7
- vak==0.4.0b5
- werkzeug==2.0.2
prefix: C:\Users\Richard\Anaconda3\envs\vak-env

environment.zip

marichard123 commented 2 years ago

Trying a new virtual environment with a different name (vak-env-test-bug), I get the exact same error:

In [1]: import crowsetta
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-af422012dd85> in <module>
----> 1 import crowsetta

ModuleNotFoundError: No module named 'crowsetta'

In [2]:

NickleDave commented 2 years ago

thank you for doing all that @marichard123, it helps me see what's going on

I think the first issue is that I assumed ipython would be installed in your environment, but it's not. Sorry! (It's a dev dependency for vak so I did have it installed 😬)

Somewhat confusingly, conda will happily start ipython from the base environment without telling you.
If you do

(vak-env) PS C:\Users\Richard> Get-Command ipython

then I think you will get some path that is not inside C:\Users\Richard\Anaconda3\envs\vak-env, probably it's the one in base instead.
That's why it can't "see" crowsetta.

I can also tell because it's a different Python (3.8) from the one you have installed in your env (3.6).
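The same check can be done from inside the interpreter itself, which removes any doubt about which Python is actually running. This is a minimal sketch using only the standard library; `describe_interpreter` is a made-up helper name.

```python
import importlib.util
import sys


def describe_interpreter(module_name="crowsetta"):
    """Report which Python is running and whether it can see `module_name`."""
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        "module_found": spec is not None,
        "module_location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If `executable` does not live under the activated environment's folder, the session is running a Python from somewhere else, which would explain the import failing.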

To fix, please do: (vak-env) PS C:\Users\Richard> conda install ipython

Then try this again:

(vak-env) PS C:\Users\Richard\Documents\Fall_2021\Bat_Stuff\TweetynetPipeline> ipython
In [1]: import crowsetta

In [2]: scribe = crowsetta.Transcriber(format='csv')

In [3]: annots = scribe.from_file('PipelineCSVOutput.csv')

and please let me know what you get in that case.

marichard123 commented 2 years ago

We have found the issue! The encoding error was a result of vak attempting to open a .notmat file as a .csv file. Originally, the "annotation file" column of our CSV annotation file (the sixth column) contained the paths of .mat files. We changed our CSV annotation file so that this column instead contains the path of the CSV annotation file itself. For example, every row now has the following path string in the sixth column: "C:\Users\Richard\Documents\Fall_2021\Bat_Stuff\TweetynetPipeline\PipelineCSVOutput.csv" This simple change completely fixes the encoding problem described above.
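A quick way to spot this kind of mismatch is to tally the file extensions in the annotation-file column before running vak prep. The sketch below is hedged: `annot_path_suffixes` is a hypothetical helper, and the column name `annot_path` is an assumption taken from the `vak_df["annot_path"]` line in the traceback, so pass whatever header your CSV actually uses.

```python
import csv
from collections import Counter
from pathlib import PureWindowsPath


def annot_path_suffixes(csv_path, column="annot_path", encoding="utf-8"):
    """Count the file extensions found in one column of an annotation csv."""
    with open(csv_path, "r", newline="", encoding=encoding) as csv_file:
        reader = csv.DictReader(csv_file)
        # PureWindowsPath parses backslash-separated paths on any OS
        return Counter(
            PureWindowsPath(row[column]).suffix.lower() for row in reader
        )


if __name__ == "__main__":
    # File name taken from this thread; adjust to your own annotation csv
    print(annot_path_suffixes("PipelineCSVOutput.csv"))
```

Any extension other than `.csv` in the result, when the annotation format is set to csv, would be consistent with the error reported in this issue.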

NickleDave commented 2 years ago

Going to close this original issue as fixed -- others referenced above addressed the root issue