ocean-data-factory-sweden / kso

Notebooks to upload/download marine footage, connect to a citizen science project, train machine learning models and publish marine biological observations.
GNU General Public License v3.0

Notebook 5 cannot read in the files from the ml-template-project correctly on a Windows computer #276

Open jannesgg opened 1 year ago

jannesgg commented 1 year ago

When you run Notebook 5 in Google Colab from a Windows computer, the image file names contain Sa╠êcken instead of säcken. This happens because Windows decodes the file names with CP437 instead of UTF-8 during unzipping, whereas Linux decodes them as UTF-8 automatically. The code below shows the difference.

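b'a\xcc\x88'.decode('CP437')  # 'a╠ê' (mojibake, as seen in the file names)

b'a\xcc\x88'.decode('utf-8')  # 'ä' ('a' followed by the combining diaeresis U+0308)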

As a result, no data at all is available for training the model. WandB prints a warning about this during training and evaluation, but the run still proceeds.
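
One possible workaround, sketched here under assumptions rather than taken from the notebooks: since the mojibake comes from UTF-8 bytes being read as CP437, re-encoding a damaged name as CP437 and decoding it as UTF-8 should recover the original. The helper name `fix_zip_name` is hypothetical and is not part of kso. Names that are already correct are returned unchanged.

```python
def fix_zip_name(name: str) -> str:
    """Undo a CP437 mis-decoding of a UTF-8 file name (hypothetical helper)."""
    try:
        # Recover the raw bytes, then decode them the way they were written.
        return name.encode("cp437").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The name was not mojibake (or cannot be repaired); leave it as-is.
        return name


broken = "Sa╠êcken"
repaired = fix_zip_name(broken)
print(repaired)
```

The repaired name keeps the decomposed form ('a' plus U+0308). If the annotation files use the precomposed 'ä', the names may additionally need `unicodedata.normalize("NFC", ...)` before they match.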