neptune-ai / open-solution-mapping-challenge

Open solution to the Mapping Challenge :earth_americas:
https://www.crowdai.org/challenges/mapping-challenge
MIT License

Error when running Evaluate : axis 1 is out of bounds for array of dimension 0 #230

Open Julmatap opened 4 years ago

Julmatap commented 4 years ago

Hello everyone,

Firstly I would like to say thank you for this amazing project and the release of your code. I am very new to programming and ML and I wanted to try your project.

I downloaded the data, installed the environment with the latest package versions, followed the steps for reproducing the results, looked through the existing issues, and tried to solve the errors I got by myself as much as possible, but right now I'm stuck.

I first prepared the masks and metadata, then downloaded your best model weights, created the transformers folder in data_experiments/mapping_challenge_baseline and copied "unet" and "scoring_model" there. I then changed the values in neptune.yaml as suggested and tried to evaluate by running: "python main.py evaluate --pipeline_name unet"

Here is what I got : image image

I hope I gave you enough information and that you can help me resolve this issue.

Again, thank you guys!

PS: I saw that issue #228 also has this problem, but I don't get the "valid data is None" message, as you can see in the attached screenshots.

jakubczakon commented 4 years ago

Hi @Julmatap, thanks for the kind words.

Could you please paste your directory structure (data paths) and the content of the neptune.yaml file (data paths) ?

Julmatap commented 4 years ago

Sure ! (Thanks for your quick reply !)

Based on yours, here's mine :

Data_OSM_Buildings:
|--   README.md
|-- ...
|-- data_raw
    |-- train 
         |-- images 
         |-- annotation.json
    |-- val 
         |-- images 
         |-- annotation.json
    |-- test (and not test_images) 
         |-- img1.jpg
         |-- img2.jpg
         |-- ...
|-- data_meta
    |-- masks_overlayed_eroded_{}_dilated_{} 
         |-- train 
             |-- distances 
             |-- masks 
             |-- sizes 
         |-- val 
             |-- distances 
             |-- masks 
             |-- sizes 
    |-- metadata.csv
|-- data_experiments
    |-- mapping_challenge_baseline 
         |-- transformers
             |--unet
             |--scoring_model
         |-- outputs 
         |-- tmp

And the parameters section of my neptune.yaml looks like this:

project: shared/showroom
name: mapping_challenge_baseline
tags: [solution_5]

parameters:
  # Data Paths
  data_dir: data_raw
  meta_dir: data_meta
  masks_overlayed_prefix: masks_overlayed
  experiment_dir: data_experiments/mapping_challenge_baseline

Hope I understood what you asked me.

Julmatap commented 4 years ago

I'll try using exactly the same directory structure as in your example and adjust neptune.yaml accordingly to see if that resolves the problem. I couldn't find where in the code this kind of structure would cause a problem, but since I'm a beginner that may be normal lmao.

jakubczakon commented 4 years ago

Mhm, so do you have data_raw, data_meta and the others inside the data directory or at the same level as README.md?

Julmatap commented 4 years ago

Dear Jakub, I had data_raw and data_meta folders at the same level as README.md . After you answered I made the directory structure exactly like what you suggested which now looks like this :

Data_OSM
|--   README.md
|-- ...
|-- data
    |-- raw
         |-- train 
            |-- images 
            |-- annotation.json
         |-- val 
            |-- images 
            |-- annotation.json
         |-- test_images 
            |-- img1.jpg
            |-- img2.jpg
            |-- ...
    |-- meta
         |-- masks_overlayed_eroded_0_dilated_0 
            |-- train 
                |-- distances 
                |-- masks 
                |-- sizes 
            |-- val 
                |-- distances 
                |-- masks 
                |-- sizes 
    |-- experiments
        |-- mapping_challenge_baseline
            |-- checkpoints 
            |-- transformers 
                |--unet
                |--scoring_model
            |-- outputs 

I changed my neptune.yaml to look like this:

# Data Paths
data_dir: data/raw
meta_dir: data/meta
masks_overlayed_prefix: masks_overlayed
experiment_dir: data/experiments/mapping_challenge_baseline

However, when I run evaluate, it still throws the exact same error.
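For anyone debugging a similar setup, one way to rule out path problems is to check that each directory named in neptune.yaml exists relative to the folder main.py is run from. The following is only a minimal sketch under that assumption: the helper name `missing_paths` is hypothetical and not part of the repository, and the three directory entries are copied from the config above (masks_overlayed_prefix is left out because it is a name prefix, not a directory).

```python
import os

# Directory parameters copied from the neptune.yaml shown above.
PATHS = {
    "data_dir": "data/raw",
    "meta_dir": "data/meta",
    "experiment_dir": "data/experiments/mapping_challenge_baseline",
}


def missing_paths(root, paths=PATHS):
    """Return the config keys whose directory does not exist under root."""
    return sorted(
        key
        for key, rel in paths.items()
        if not os.path.isdir(os.path.join(root, *rel.split("/")))
    )


if __name__ == "__main__":
    # Run from the repository root (the folder that contains main.py).
    missing = missing_paths(os.getcwd())
    print("missing:", missing if missing else "none")
```

An empty result only shows that the folders exist; it says nothing about whether their contents are what the pipeline expects.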

jakubczakon commented 4 years ago

Could you try and run the training pipeline first?

python main.py train --pipeline_name unet

Julmatap commented 4 years ago

I did it as soon as you told me to, but it doesn't seem like it's running, here's a screenshot : image

Julmatap commented 4 years ago

Could it be linked to my environment? I had to change some versions and other things for it to work (I'm on Windows 10 x64). I couldn't install torch 0.3.1 and some other packages via your environment.yml.

Here is my complete env for information :

$ conda list
# packages in environment at C:\Users\matierej\Anaconda3\envs\mapping:
#
# Name                    Version                   Build  Channel
attrdict                  2.0.1                    pypi_0    pypi
attrs                     19.3.0                     py_0
backcall                  0.2.0                      py_0
bleach                    3.1.5                      py_0
bravado                   10.6.2                   pypi_0    pypi
bravado-core              5.17.0                   pypi_0    pypi
ca-certificates           2020.6.24                     0
certifi                   2020.6.20                py38_0
chardet                   3.0.4                    pypi_0    pypi
click                     7.1.2                    pypi_0    pypi
colorama                  0.4.3                      py_0
cycler                    0.10.0                   pypi_0    pypi
cython                    0.29.21                  pypi_0    pypi
decorator                 4.4.2                      py_0
defusedxml                0.6.0                      py_0
entrypoints               0.3                      py38_0
future                    0.18.2                   pypi_0    pypi
gitdb                     4.0.5                    pypi_0    pypi
gitpython                 3.1.7                    pypi_0    pypi
idna                      2.10                     pypi_0    pypi
imageio                   2.9.0                    pypi_0    pypi
imgaug                    0.4.0                    pypi_0    pypi
importlib-metadata        1.7.0                    py38_0
importlib_metadata        1.7.0                         0
ipykernel                 5.3.4            py38h5ca1d4c_0
ipython                   7.17.0                   pypi_0    pypi
ipython-genutils          0.2.0                    pypi_0    pypi
ipython_genutils          0.2.0                    py38_0
jedi                      0.17.2                   pypi_0    pypi
jinja2                    2.11.2                     py_0
joblib                    0.16.0                   pypi_0    pypi
jsonpointer               2.0                      pypi_0    pypi
jsonref                   0.2                      pypi_0    pypi
jsonschema                3.2.0                    py38_0
jupyter_client            6.1.6                      py_0
jupyter_core              4.6.3                    py38_0
kiwisolver                1.2.0                    pypi_0    pypi
libsodium                 1.0.18               h62dcd97_0
lightgbm                  2.3.1                    pypi_0    pypi
m2w64-gcc-libgfortran     5.3.0                         6
m2w64-gcc-libs            5.3.0                         7
m2w64-gcc-libs-core       5.3.0                         7
m2w64-gmp                 6.1.0                         2
m2w64-libwinpthread-git   5.0.0.4634.697f757               2
markupsafe                1.1.1            py38he774522_0
matplotlib                3.3.0                    pypi_0    pypi
mistune                   0.8.4           py38he774522_1000
monotonic                 1.5                      pypi_0    pypi
msgpack                   1.0.0                    pypi_0    pypi
msgpack-python            0.5.6                    pypi_0    pypi
msys2-conda-epoch         20160418                      1
munch                     2.5.0                    pypi_0    pypi
nbconvert                 5.6.1                    py38_0
nbformat                  5.0.7                      py_0
neptune-client            0.4.119                  pypi_0    pypi
neptune-contrib           0+unknown                pypi_0    pypi
networkx                  2.4                      pypi_0    pypi
notebook                  6.0.3                    py38_0
numpy                     1.19.1                   pypi_0    pypi
oauthlib                  3.1.0                    pypi_0    pypi
opencv-python             4.3.0.36                 pypi_0    pypi
openssl                   1.1.1g               he774522_0
packaging                 20.4                       py_0
pandas                    1.1.0                    pypi_0    pypi
pandoc                    2.10                          0
pandocfilters             1.4.2                    py38_1
parso                     0.7.1                    pypi_0    pypi
pickleshare               0.7.5                 py38_1000
pillow                    7.2.0                    pypi_0    pypi
pip                       20.1.1                   py38_1
pretrainedmodels          0.7.4                    pypi_0    pypi
prometheus_client         0.8.0                      py_0
prompt-toolkit            3.0.5                      py_0
psutil                    5.7.2                    pypi_0    pypi
py3nvml                   0.2.6                    pypi_0    pypi
pycocotools-windows       2.0.0.2                  pypi_0    pypi
pydensecrf                1.0rc2                   pypi_0    pypi
pydot-ng                  2.0.0                    pypi_0    pypi
pygments                  2.6.1                      py_0
pyjwt                     1.7.1                    pypi_0    pypi
pyparsing                 2.4.7                      py_0
pyrsistent                0.16.0           py38he774522_0
python                    3.8.5                he1778fa_0
python-dateutil           2.8.1                      py_0
pytz                      2020.1                   pypi_0    pypi
pywavelets                1.1.1                    pypi_0    pypi
pywin32                   227              py38he774522_1
pywinpty                  0.5.7                    py38_0
pyyaml                    5.3.1                    pypi_0    pypi
pyzmq                     19.0.1           py38ha925a31_1
requests                  2.24.0                   pypi_0    pypi
requests-oauthlib         1.3.0                    pypi_0    pypi
rfc3987                   1.3.8                    pypi_0    pypi
scikit-image              0.17.2                   pypi_0    pypi
scikit-learn              0.23.2                   pypi_0    pypi
scipy                     1.5.2                    pypi_0    pypi
send2trash                1.5.0                    py38_0
setuptools                49.2.0                   py38_0
shapely                   1.7.0                    pypi_0    pypi
simplejson                3.17.2                   pypi_0    pypi
six                       1.15.0                     py_0
smmap                     3.0.4                    pypi_0    pypi
sqlite                    3.32.3               h2a8f88b_0
strict-rfc3339            0.7                      pypi_0    pypi
swagger-spec-validator    2.7.3                    pypi_0    pypi
terminado                 0.8.3                    py38_0
testpath                  0.4.4                      py_0
threadpoolctl             2.1.0                    pypi_0    pypi
tifffile                  2020.7.24                pypi_0    pypi
torch                     1.4.0+cpu                pypi_0    pypi
torchvision               0.5.0+cpu                pypi_0    pypi
tornado                   6.0.4            py38he774522_1
tqdm                      4.48.2                   pypi_0    pypi
traitlets                 4.3.3                    py38_0
typing-extensions         3.7.4.2                  pypi_0    pypi
urllib3                   1.25.10                  pypi_0    pypi
vc                        14.1                 h0510ff6_4
vs2015_runtime            14.16.27012          hf0eaf9b_3
wcwidth                   0.2.5                      py_0
webcolors                 1.11.1                   pypi_0    pypi
webencodings              0.5.1                    py38_1
websocket-client          0.57.0                   pypi_0    pypi
wheel                     0.34.2                   py38_0
wincertstore              0.2                      py38_0
winpty                    0.4.3                         4
xgboost                   1.1.1                    pypi_0    pypi
xmltodict                 0.12.0                   pypi_0    pypi
zeromq                    4.3.2                ha925a31_2
zipp                      3.1.0                      py_0
zlib                      1.2.11               h62dcd97_4

jakubczakon commented 4 years ago

Ok, I see.

I think that is the problem. Everything was written for torch==0.3.1 and newer releases (torch==1.4 for sure) have changed things. Could you try and install it by hand:

pip install torch==0.3.1

Julmatap commented 4 years ago

I tried again installing torch==0.3.1 and here's what I got image

jakubczakon commented 4 years ago

I think you need to downgrade Python for this. As environment.yml suggests, it should be python=3.6.8. The easiest way to do that is to run:

conda env create -f environment.yml

but as I understand that isn't working. In that case I'd just create a clean conda environment with python 3.6:

conda create -n py_36_env python=3.6

activate it

conda activate py_36_env

and then install the dependencies from environment.yml
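Once the environment is rebuilt, it can help to confirm which Python and torch versions are actually active before launching the pipeline. The snippet below is a rough sketch, not something shipped with the repository: `parse_version` and `torch_matches` are hypothetical helper names, and the expected values (Python 3.6, torch 0.3.1) are simply the ones mentioned in this thread.

```python
import re
import sys


def parse_version(text):
    """Turn a string such as '1.4.0+cpu' into a tuple like (1, 4, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("unrecognised version: %r" % text)
    return tuple(int(part) for part in match.groups(default="0"))


def torch_matches(installed, expected="0.3.1"):
    """True when installed has the same major.minor.patch as expected."""
    return parse_version(installed) == parse_version(expected)


if __name__ == "__main__":
    print("python: %d.%d (environment.yml expects 3.6)" % sys.version_info[:2])
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        status = "matches" if torch_matches(torch.__version__) else "differs from"
        print("torch: %s (%s 0.3.1)" % (torch.__version__, status))
```

A mismatch is not necessarily fatal, but it is a quick way to see whether the interpreter in use is the one from the freshly created conda environment.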

Julmatap commented 4 years ago

Dear @jakubczakon thank you for your time helping me !

It was indeed a problem with Torch version.

However I still couldn't install Torch==0.3.1 even after downgrading python and so on... I tried manually installing it, but I still had the same error as per my previous screenshot, I also tried finding a .whl but there was none for v 0.3.1 win64. So I tried using the closest version I could find and ended up using Torch version 0.4.1, and it seems to be working just fine.

The evaluation has now been running for more than 5 hours, and I can tell it's running from Neptune and from my memory usage. (But it's not using the GPU even though my GPU is CUDA compatible. Is that normal?)

image

image

Once it is finished and I am sure it is working, I will post my conda list, if it can help someone else facing this issue.

Again, thank you very much for your time 👍

jakubczakon commented 4 years ago

That is awesome thank you @Julmatap!

You can see if it is running by going to the Charts or Logs section of the UI. You should see some activity there.

jakubczakon commented 4 years ago

Also, it seems that there is a message in the terminal that explains why the GPU is not logged -> sending GPU metrics was blocked by the system. This is very much unexpected.

Julmatap commented 4 years ago

Unfortunately there is nothing there: it says "No charts here" and "waiting for data". I think there's a problem, because someone else launched an experiment 3 minutes ago and already has some data showing.

jakubczakon commented 4 years ago

You are running the evaluation now correct? If so can you try running training and see if something is happening?

Also you can go to your terminal and run:

nvidia-smi

to see if your GPU is actually doing something.
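Alongside nvidia-smi, it may be worth asking PyTorch itself whether it can see a CUDA device, since a CPU-only build never uses the GPU regardless of the hardware. The conda list earlier in this thread showed torch 1.4.0+cpu, which is a CPU-only build; whether the later 0.4.1 install was a CUDA build is not stated. A small sketch, with `describe_device` being a hypothetical helper name:

```python
def describe_device():
    """Report whether PyTorch can see a CUDA device in this environment."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if torch.cuda.is_available():
        return "cuda available: %s" % torch.cuda.get_device_name(0)
    return "cpu only (torch %s)" % torch.__version__


if __name__ == "__main__":
    print(describe_device())
```

If this reports "cpu only" on a machine with a working NVIDIA driver, the installed torch build is the first thing to check.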

Julmatap commented 4 years ago

Hi Jakub,

Yes, I was running the evaluation. I tried running the training today, and there are still no charts. I ran nvidia-smi before and again today, and it says that no processes are running.

I guess I'll just start again on a Linux VM. Has anyone already succeeded in running it on Windows? I can't find any information about the operating system blocking the request.

jakubczakon commented 4 years ago

I see @Julmatap,

Unfortunately, I don't know if anyone succeeded on Windows -> everyone I know who used this repo was using Linux.

Julmatap commented 4 years ago

I confirm it is working on Linux, I just did an evaluation and it worked, just had to wait a long time at "steps >>> step unet transforming...".

Again, thanks for your time @jakubczakon ! 👍

jakubczakon commented 4 years ago

That is awesome!

I only wish I could be of more help but I am proud of you getting it done.

Julmatap commented 4 years ago

Dear Jakub,

One last question :

My metrics look like this: image

However, when I run the notebook "results on exploration" my predictions are blank: image

Edit: I opened predictions.json, manually entered the first image_id I saw there into the notebook, and the prediction displays fine. However, it seems that not all of the val folder has been predicted, so when I use random choices it usually doesn't work.
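One way to avoid hitting images without predictions is to sample only from image ids that actually appear in predictions.json. The sketch below assumes the file is a list of records that each carry an "image_id" key, which is consistent with the description above but is not confirmed in this thread; the function names and the tiny in-memory data are hypothetical stand-ins for the real files.

```python
import random


def predicted_image_ids(predictions):
    """Collect the distinct image_id values from a list of prediction records."""
    return {record["image_id"] for record in predictions}


def split_by_coverage(annotation_image_ids, predictions):
    """Return (ids with predictions, ids without predictions), both sorted."""
    predicted = predicted_image_ids(predictions)
    covered = sorted(i for i in annotation_image_ids if i in predicted)
    uncovered = sorted(i for i in annotation_image_ids if i not in predicted)
    return covered, uncovered


if __name__ == "__main__":
    # Stand-in data; with real files these would come from json.load().
    val_ids = [1, 2, 3, 4]
    predictions = [
        {"image_id": 2, "score": 0.9},
        {"image_id": 2, "score": 0.4},
        {"image_id": 4, "score": 0.7},
    ]
    covered, uncovered = split_by_coverage(val_ids, predictions)
    print("covered:", covered)      # covered: [2, 4]
    print("uncovered:", uncovered)  # uncovered: [1, 3]
    print("random covered id:", random.choice(covered))
```

A long "uncovered" list would suggest the evaluation only processed part of the validation set, which matches the symptom described above.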