WandB errors experienced by users

Open louisfh opened 5 months ago

louisfh commented 5 months ago


A number of users have had issues with wandb. I have not been able to recreate the bug on a Linux machine or my Mac. The users who had the problem may all be using Windows machines (2 out of 3 definitely were). The offending line from one user's traceback is: FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\lscala\AppData\Local\Temp\tmplk94vkyk\Samples / training samples.table.json'

I think this might be because when we log to wandb, we include a forward slash in the name of the log, which wandb apparently includes in the name of the JSON file. I suspect that on Windows machines this forward slash in a file path is not handled well (it might be interpreted as an escape character). We would have to change lines like this:

We could get rid of the forward slash and see if that fixes things. However, @sammlapp said wandb uses it to group logged items into sections in the web GUI.

louisfh commented 5 months ago

"/" is apparently a reserved character and can't be used in filenames on Windows machines.
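To make the path handling concrete, here is a minimal sketch using only the standard library. It shows how Windows path rules split the file name from the traceback. The ".table.json" suffix is copied from the traceback; how wandb builds the file name internally is an assumption here, not something confirmed in this thread.

```python
from pathlib import PureWindowsPath

# Log key used in the traceback, plus the suffix seen in the temp file name.
# How wandb derives the file name from the key is assumed, not verified.
log_key = "Samples / training samples"
file_name = log_key + ".table.json"

# PureWindowsPath applies Windows path rules on any OS without touching the disk.
parts = PureWindowsPath(file_name).parts
print(parts)
```

Under Windows rules the "/" acts as a path separator rather than an escape character, so the key is split into a directory component and a file component. The directory component here keeps its trailing space, which Windows naming conventions advise against.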

louisfh commented 5 months ago

@sammlapp Maybe we can use nested dictionaries instead to group things in the wandb GUI?

louisfh commented 4 months ago

I used a Windows machine, installed opensoundscape according to the current Windows install docs, and was not able to recreate the issue. So apparently the "/" character discussed above is not the problem.

Here's the pip list of the working environment on Windows Subsystem for Linux (package list omitted for brevity)

paulpeyret-biophonia commented 1 month ago

Hi @louisfh! I have the same issue here, and maybe some ideas to fix it:

Configuration and error description

My configuration is the following:
OS: Windows 11
IDE: VSCode
OPSO version: 0.10.1
wandb version: 0.13.11

The error occurs while running the model.train() function, and refers to the path 'C:\Users\MyUsername\AppData\Local\Temp\tmph8mf5v6o\Samples / training samples.table.json'

The error happens in file at line 842, when calling wandb_session.log()

Steps to reproduce

I managed to reproduce the bug in a notebook by calling the following lines after creating a model and a dataset:

from opensoundscape import AudioFileDataset
from opensoundscape.logging import wandb_table
# assumes model, train_df, and an active wandb_session already exist
afd = AudioFileDataset(train_df, model.preprocessor, bypass_augmentations=False)
# build the table of samples (call assumed; the arguments may differ)
table = wandb_table(afd)
wandb_session.log({"Samples / training samples": table})

Causes and fix

It seems that the space before the "/" in the dictionary keys is causing the error on Windows. I deleted all spaces before "/" and that solved the issue.

"Samples / training samples" becomes "Samples/training_samples" (line 844)
"Samples / training samples no aug" becomes "Samples/ training samples no aug" (line 850)
"Samples / validation samples" becomes "Samples/ validation samples" (line 856)
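The same fix could be applied programmatically instead of editing each key by hand. This is only a sketch: normalize_log_key is a hypothetical helper, not part of opensoundscape or wandb. It strips whitespace around each "/"-separated part and leaves spaces inside a part untouched.

```python
def normalize_log_key(key: str) -> str:
    """Strip whitespace around each "/"-separated part of a log key.

    Hypothetical helper for illustration; not an opensoundscape or wandb API.
    """
    return "/".join(part.strip() for part in key.split("/"))


keys = [
    "Samples / training samples",
    "Samples / training samples no aug",
    "Samples / validation samples",
]
for key in keys:
    print(repr(key), "->", repr(normalize_log_key(key)))
```

Unlike the manual edits above, this also removes the space after the "/", so no path component starts or ends with a space.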

I can see this has been patched in branch patch_wandb_windows. Just sharing what I found; hopefully this helps make it cross-platform. 😊


sammlapp commented 1 month ago

Thanks for looking into this and sharing your findings! This is helpful, as we weren't sure whether the "/" character was involved in the bug or not.

sammlapp commented 1 month ago

@paulpeyret-biophonia Since I don't have a Windows machine to test on, I'd be curious whether the branch patch_wandb_windows works for you and successfully logs tables of samples to wandb.

sammlapp commented 1 month ago

The patch_wandb_windows branch isn't working. Weirdly, the syntax

{"Samples":{"training_samples": table}}

is breaking: instead of creating a section Samples with a table inside it, it creates a blank table in the Tables section.

We can just stop using nested tables and instead log tables to the default Tables section.