kitzeslab / opensoundscape

Open source, scalable software for the analysis of bioacoustic recordings
http://opensoundscape.org
MIT License

WandB errors experienced by users #952

Open louisfh opened 5 months ago

louisfh commented 5 months ago

Discussed in https://github.com/kitzeslab/opensoundscape/discussions/903

A number of users have had issues with wandb. I have not been able to reproduce the bug on a Linux machine or on my Mac. The users who had the problem may all be using Windows machines (2 out of 3 definitely were). The offending line from one user's traceback is:

FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\lscala\AppData\Local\Temp\tmplk94vkyk\Samples / training samples.table.json'

I think this might be because when we log to wandb, we include a forward slash in the name of the log, which wandb apparently includes in the name of the JSON file. I suspect that on Windows machines this forward slash in a file path is not handled well (it might be interpreted as a path separator). We would have to change lines like this:

https://github.com/kitzeslab/opensoundscape/blob/cf77a561f8b7f372b930c928b07349af958490a8/opensoundscape/ml/cnn.py#L222

We could get rid of the forward slash and see if that fixes things. However, @sammlapp said wandb uses the slash to group things into sections in the web GUI.

louisfh commented 5 months ago

/ is apparently a reserved character and can't be used in file names on Windows machines. https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file?
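To see how a name like the one in the traceback is split into path components without needing a Windows machine, Python's pure path classes can parse it on any OS. This is only a sketch of how Python parses the string; it does not reproduce the wandb error, and whether the separator itself or the spaces around it cause the failure is not established by it.

```python
from pathlib import PurePosixPath, PureWindowsPath

# File name taken from the traceback above: the wandb log key
# followed by the ".table.json" suffix.
file_name = "Samples / training samples.table.json"

# Pure paths only parse strings; they never touch the file system,
# so the Windows flavour can be inspected from Linux or macOS.
windows_parts = PureWindowsPath(file_name).parts
posix_parts = PurePosixPath(file_name).parts

print(windows_parts)
print(posix_parts)
```

In both flavours the "/" acts as a separator, so the key turns into a directory named "Samples " (with a trailing space) containing a file whose name starts with a space.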

louisfh commented 5 months ago

@sammlapp Maybe we can use nested dictionaries instead to nest stuff in the wandb gui? https://docs.wandb.ai/ref/python/log

louisfh commented 4 months ago

I used a Windows machine, installed opensoundscape according to the current Windows install docs, and was not able to reproduce the issue. So apparently the / issue described above is not the problem.

Here's the pip list of the working environment on Windows Subsystem for Linux:

Package                   Version
------------------------- ---------------
anyio                     4.2.0
appdirs                   1.4.4
argon2-cffi               23.1.0
argon2-cffi-bindings      21.2.0
arrow                     1.3.0
aru-metadata-parser       0.1.0
asttokens                 2.4.1
async-lru                 2.0.4
attrs                     23.2.0
audioread                 3.0.1
Babel                     2.14.0
beautifulsoup4            4.12.3
bleach                    6.1.0
certifi                   2024.2.2
cffi                      1.16.0
charset-normalizer        3.3.2
click                     8.1.7
comm                      0.2.1
contextlib2               21.6.0
contourpy                 1.2.0
cycler                    0.12.1
debugpy                   1.8.1
decorator                 5.1.1
defusedxml                0.7.1
Deprecated                1.2.14
docker-pycreds            0.4.0
docopt                    0.6.2
exceptiongroup            1.2.0
executing                 2.0.1
fastjsonschema            2.19.1
filelock                  3.13.1
fonttools                 4.48.1
fqdn                      1.5.1
fsspec                    2024.2.0
gitdb                     4.0.11
GitPython                 3.1.41
grad-cam                  1.5.0
h11                       0.14.0
httpcore                  1.0.3
httpx                     0.26.0
idna                      3.6
imageio                   2.34.0
ipykernel                 6.29.2
ipython                   8.21.0
ipywidgets                8.1.2
isoduration               20.11.0
jedi                      0.19.1
Jinja2                    3.1.3
joblib                    1.3.2
json5                     0.9.14
jsonpointer               2.4
jsonschema                4.21.1
jsonschema-specifications 2023.12.1
jupyter_client            8.6.0
jupyter_core              5.7.1
jupyter-events            0.9.0
jupyter-lsp               2.2.2
jupyter_server            2.12.5
jupyter_server_terminals  0.5.2
jupyterlab                4.1.1
jupyterlab_pygments       0.3.0
jupyterlab_server         2.25.2
jupyterlab_widgets        3.0.10
kiwisolver                1.4.5
lazy_loader               0.3
librosa                   0.10.1
llvmlite                  0.42.0
MarkupSafe                2.1.5
matplotlib                3.8.2
matplotlib-inline         0.1.6
mistune                   3.0.2
mpmath                    1.3.0
msgpack                   1.0.7
nbclient                  0.9.0
nbconvert                 7.16.0
nbformat                  5.9.2
nest-asyncio              1.6.0
networkx                  3.2.1
notebook_shim             0.2.3
numba                     0.59.0
numpy                     1.26.4
nvidia-cublas-cu12        12.1.3.1
nvidia-cuda-cupti-cu12    12.1.105
nvidia-cuda-nvrtc-cu12    12.1.105
nvidia-cuda-runtime-cu12  12.1.105
nvidia-cudnn-cu12         8.9.2.26
nvidia-cufft-cu12         11.0.2.54
nvidia-curand-cu12        10.3.2.106
nvidia-cusolver-cu12      11.4.5.107
nvidia-cusparse-cu12      12.1.0.106
nvidia-nccl-cu12          2.19.3
nvidia-nvjitlink-cu12     12.3.101
nvidia-nvtx-cu12          12.1.105
opencv-python             4.9.0.80
opensoundscape            0.10.1
overrides                 7.7.0
packaging                 23.2
pandas                    2.2.0
pandocfilters             1.5.1
parso                     0.8.3
pathtools                 0.1.2
pexpect                   4.9.0
pillow                    10.2.0
pip                       23.3.1
platformdirs              4.2.0
pooch                     1.8.0
prometheus_client         0.20.0
prompt-toolkit            3.0.43
protobuf                  4.25.2
psutil                    5.9.8
ptyprocess                0.7.0
pure-eval                 0.2.2
pycparser                 2.21
Pygments                  2.17.2
pyparsing                 3.1.1
python-dateutil           2.8.2
python-json-logger        2.0.7
pytz                      2024.1
PyWavelets                1.5.0
PyYAML                    6.0.1
pyzmq                     25.1.2
referencing               0.33.0
requests                  2.31.0
rfc3339-validator         0.1.4
rfc3986-validator         0.1.1
rpds-py                   0.18.0
schema                    0.7.5
scikit-image              0.22.0
scikit-learn              1.4.0
scipy                     1.12.0
Send2Trash                1.8.2
sentry-sdk                1.40.4
setproctitle              1.3.3
setuptools                68.2.2
six                       1.16.0
smmap                     5.0.1
sniffio                   1.3.0
soundfile                 0.12.1
soupsieve                 2.5
soxr                      0.3.7
stack-data                0.6.3
sympy                     1.12
terminado                 0.18.0
threadpoolctl             3.3.0
tifffile                  2024.2.12
tinycss2                  1.2.1
tomli                     2.0.1
torch                     2.2.0
torchvision               0.17.0
tornado                   6.4
tqdm                      4.66.2
traitlets                 5.14.1
triton                    2.2.0
ttach                     0.0.3
types-python-dateutil     2.8.19.20240106
typing_extensions         4.9.0
tzdata                    2024.1
uri-template              1.3.0
urllib3                   2.2.0
wandb                     0.13.11
wcwidth                   0.2.13
webcolors                 1.13
webencodings              0.5.1
websocket-client          1.7.0
wheel                     0.41.2
widgetsnbextension        4.0.10
wrapt                     1.16.0

paulpeyret-biophonia commented 1 month ago

Hi @louisfh! I have the same issue here, and maybe some ideas for fixing it:

Configuration and error description

My configuration is the following:

OS: Windows 11
IDE: VSCode
OPSO version: 0.10.1
wandb version: 0.13.11

The error occurs while running the model.train() function. The path reported as missing is: 'C:\Users\MyUsername\AppData\Local\Temp\tmph8mf5v6o\Samples / training samples.table.json'

The error occurs in cnn.py at line 842, when calling wandb_session.log().

Steps to reproduce

I managed to reproduce the bug in a notebook by calling the following lines after creating a model and a dataset:

from opensoundscape import AudioFileDataset
from opensoundscape.logging import wandb_table

# train_df, model, and wandb_session come from the earlier setup
afd = AudioFileDataset(train_df, model.preprocessor, bypass_augmentations=False)
table = wandb_table(afd, n=8)
# Logging under a key containing " / " triggers the error on Windows
wandb_session.log({"Samples / training samples": table})

Causes and fix

It seems that the space before the "/" in the dictionary keys is causing the error on Windows. I deleted all spaces before "/" and that solved the issue.

"Samples / training samples" becomes "Samples/training_samples" (line 844)
"Samples / training samples no aug" becomes "Samples/ training samples no aug" (line 850)
"Samples / validation samples" becomes "Samples/ validation samples" (line 856)

I can see this has been patched in the branch patch_wandb_windows. Just sharing what I found; hopefully this helps make it cross-platform. 😊
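A minimal sketch of that kind of key cleanup is below. The helper names normalize_wandb_key and components_with_edge_spaces are hypothetical and not part of opensoundscape or wandb. The idea that a path component ending in a space is what breaks on Windows is an assumption: it is consistent with the Windows naming rules, which say not to end a file or directory name with a space or a period, but it has not been confirmed against wandb's internals.

```python
from pathlib import PureWindowsPath


def normalize_wandb_key(key: str) -> str:
    """Hypothetical helper: strip whitespace around each "/"-separated part."""
    return "/".join(part.strip() for part in key.split("/"))


def components_with_edge_spaces(name: str) -> list[str]:
    """Return path components that start or end with whitespace."""
    return [p for p in PureWindowsPath(name).parts if p != p.strip()]


original = "Samples / training samples"
fixed = normalize_wandb_key(original)

# Assumption: the table is written to a file named "<key>.table.json",
# as the path in the error message suggests.
bad_before = components_with_edge_spaces(original + ".table.json")
bad_after = components_with_edge_spaces(fixed + ".table.json")

print(fixed)
print(bad_before)
print(bad_after)
```

With the spaces removed, the key still contains "/", so any grouping that depends on the slash is kept, while no path component starts or ends with a space.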

Cheers

sammlapp commented 1 month ago

Thanks for looking into this and sharing your findings! This is helpful, as we weren't sure whether the "/" character was involved in the bug.

sammlapp commented 1 month ago

@paulpeyret-biophonia Since I don't have a Windows machine to test on, I'd be curious whether the branch patch_wandb_windows works for you and successfully logs tables of samples to wandb.

sammlapp commented 1 month ago

The patch_wandb_windows branch isn't working. Weirdly, the syntax

{"Samples":{"training_samples": table}}

breaks: instead of creating a section Samples with a table inside it, it creates a blank table in the Tables section.

We can just stop using nested tables and instead log tables to the default Tables section.
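One way to move from nested dictionaries to flat, slash-free keys is sketched below. flatten_log_dict is a hypothetical helper, not an existing opensoundscape or wandb function, and the placeholder object stands in for a real wandb table. Where wandb displays tables logged under such keys has not been verified here.

```python
def flatten_log_dict(nested: dict, sep: str = "_") -> dict:
    """Hypothetical helper: collapse nested dicts into one flat dict.

    Keys are joined with `sep`, so no "/" or nested dict reaches the logger.
    """
    flat = {}
    for key, value in nested.items():
        if isinstance(value, dict):
            # Recurse, then prefix each child key with the parent key.
            for sub_key, sub_value in flatten_log_dict(value, sep).items():
                flat[f"{key}{sep}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


table = object()  # placeholder for a real table object
flat = flatten_log_dict({"Samples": {"training_samples": table}})
print(list(flat))
```

The resulting flat dictionary could then be passed to the same log call used elsewhere in this thread.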