Julmatap opened this issue 4 years ago
Hi @Julmatap, thanks for the nice words.
Could you please paste your directory structure and the contents of your neptune.yaml file (data paths)?
Sure! (Thanks for your quick reply!)
Based on yours, here's mine:
Data_OSM_Buildings
|-- README.md
|-- ...
|-- data_raw
    |-- train
        |-- images
        |-- annotation.json
    |-- val
        |-- images
        |-- annotation.json
    |-- test (and not test_images)
        |-- img1.jpg
        |-- img2.jpg
        |-- ...
|-- data_meta
    |-- masks_overlayed_eroded_{}_dilated_{}
        |-- train
            |-- distances
            |-- masks
            |-- sizes
        |-- val
            |-- distances
            |-- masks
            |-- sizes
    |-- metadata.csv
|-- data_experiments
    |-- mapping_challenge_baseline
        |-- transformers
            |-- unet
            |-- scoring_model
        |-- outputs
|-- tmp
And the parameters content of my neptune.yaml looks like this:

project: shared/showroom
name: mapping_challenge_baseline
tags: [solution_5]

parameters:
  data_dir: data_raw
  meta_dir: data_meta
  masks_overlayed_prefix: masks_overlayed
  experiment_dir: data_experiments/mapping_challenge_baseline
Hope I understood what you asked me.
I'll try to use the exact same directory structure as you did in your example and correct the neptune.yaml accordingly, to see if it resolves the problem. I didn't find where in the code this kind of structure would pose a problem, but since I'm a beginner that may be normal.
Mhm, so do you have data_raw, data_meta and the others inside the data directory, or at the same level as README.md?
Dear Jakub, I had the data_raw and data_meta folders at the same level as README.md. After you answered, I made the directory structure exactly like what you suggested, which now looks like this:
Data_OSM
|-- README.md
|-- ...
|-- data
    |-- raw
        |-- train
            |-- images
            |-- annotation.json
        |-- val
            |-- images
            |-- annotation.json
        |-- test_images
            |-- img1.jpg
            |-- img2.jpg
            |-- ...
    |-- meta
        |-- masks_overlayed_eroded_0_dilated_0
            |-- train
                |-- distances
                |-- masks
                |-- sizes
            |-- val
                |-- distances
                |-- masks
                |-- sizes
    |-- experiments
        |-- mapping_challenge_baseline
            |-- checkpoints
            |-- transformers
                |-- unet
                |-- scoring_model
            |-- outputs
I changed my neptune.yaml to look like this:
data_dir: data/raw
meta_dir: data/meta
masks_overlayed_prefix: masks_overlayed
experiment_dir: data/experiments/mapping_challenge_baseline
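For reference, here is a quick way to confirm that these paths resolve from the directory where main.py is run (a minimal sketch using standard PyYAML and os.path; it is not part of the repo, and the parameter names are simply the ones quoted above):

# Sanity check (not part of the repo): verify that the directories in neptune.yaml exist.
import os
import yaml

with open("neptune.yaml") as f:
    config = yaml.safe_load(f)

params = config["parameters"]
for key in ("data_dir", "meta_dir", "experiment_dir"):
    path = params[key]
    print(key, "->", path, "(found)" if os.path.isdir(path) else "(MISSING)")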
However, when I run evaluate, it still throws the exact same error.
Could you try and run the training pipeline first?
python main.py train --pipeline_name unet
I did it as soon as you told me to, but it doesn't seem to be running; here's a screenshot:
Could it be linked to my environment? I had to change versions and some other things for it to work (I'm on Windows 10 x64). I couldn't install torch 0.3.1 and some other packages via your environment.yml.
Here is my complete environment for reference:
$ conda list
# packages in environment at C:\Users\matierej\Anaconda3\envs\mapping:
#
# Name Version Build Channel
attrdict 2.0.1 pypi_0 pypi
attrs 19.3.0 py_0
backcall 0.2.0 py_0
bleach 3.1.5 py_0
bravado 10.6.2 pypi_0 pypi
bravado-core 5.17.0 pypi_0 pypi
ca-certificates 2020.6.24 0
certifi 2020.6.20 py38_0
chardet 3.0.4 pypi_0 pypi
click 7.1.2 pypi_0 pypi
colorama 0.4.3 py_0
cycler 0.10.0 pypi_0 pypi
cython 0.29.21 pypi_0 pypi
decorator 4.4.2 py_0
defusedxml 0.6.0 py_0
entrypoints 0.3 py38_0
future 0.18.2 pypi_0 pypi
gitdb 4.0.5 pypi_0 pypi
gitpython 3.1.7 pypi_0 pypi
idna 2.10 pypi_0 pypi
imageio 2.9.0 pypi_0 pypi
imgaug 0.4.0 pypi_0 pypi
importlib-metadata 1.7.0 py38_0
importlib_metadata 1.7.0 0
ipykernel 5.3.4 py38h5ca1d4c_0
ipython 7.17.0 pypi_0 pypi
ipython-genutils 0.2.0 pypi_0 pypi
ipython_genutils 0.2.0 py38_0
jedi 0.17.2 pypi_0 pypi
jinja2 2.11.2 py_0
joblib 0.16.0 pypi_0 pypi
jsonpointer 2.0 pypi_0 pypi
jsonref 0.2 pypi_0 pypi
jsonschema 3.2.0 py38_0
jupyter_client 6.1.6 py_0
jupyter_core 4.6.3 py38_0
kiwisolver 1.2.0 pypi_0 pypi
libsodium 1.0.18 h62dcd97_0
lightgbm 2.3.1 pypi_0 pypi
m2w64-gcc-libgfortran 5.3.0 6
m2w64-gcc-libs 5.3.0 7
m2w64-gcc-libs-core 5.3.0 7
m2w64-gmp 6.1.0 2
m2w64-libwinpthread-git 5.0.0.4634.697f757 2
markupsafe 1.1.1 py38he774522_0
matplotlib 3.3.0 pypi_0 pypi
mistune 0.8.4 py38he774522_1000
monotonic 1.5 pypi_0 pypi
msgpack 1.0.0 pypi_0 pypi
msgpack-python 0.5.6 pypi_0 pypi
msys2-conda-epoch 20160418 1
munch 2.5.0 pypi_0 pypi
nbconvert 5.6.1 py38_0
nbformat 5.0.7 py_0
neptune-client 0.4.119 pypi_0 pypi
neptune-contrib 0+unknown pypi_0 pypi
networkx 2.4 pypi_0 pypi
notebook 6.0.3 py38_0
numpy 1.19.1 pypi_0 pypi
oauthlib 3.1.0 pypi_0 pypi
opencv-python 4.3.0.36 pypi_0 pypi
openssl 1.1.1g he774522_0
packaging 20.4 py_0
pandas 1.1.0 pypi_0 pypi
pandoc 2.10 0
pandocfilters 1.4.2 py38_1
parso 0.7.1 pypi_0 pypi
pickleshare 0.7.5 py38_1000
pillow 7.2.0 pypi_0 pypi
pip 20.1.1 py38_1
pretrainedmodels 0.7.4 pypi_0 pypi
prometheus_client 0.8.0 py_0
prompt-toolkit 3.0.5 py_0
psutil 5.7.2 pypi_0 pypi
py3nvml 0.2.6 pypi_0 pypi
pycocotools-windows 2.0.0.2 pypi_0 pypi
pydensecrf 1.0rc2 pypi_0 pypi
pydot-ng 2.0.0 pypi_0 pypi
pygments 2.6.1 py_0
pyjwt 1.7.1 pypi_0 pypi
pyparsing 2.4.7 py_0
pyrsistent 0.16.0 py38he774522_0
python 3.8.5 he1778fa_0
python-dateutil 2.8.1 py_0
pytz 2020.1 pypi_0 pypi
pywavelets 1.1.1 pypi_0 pypi
pywin32 227 py38he774522_1
pywinpty 0.5.7 py38_0
pyyaml 5.3.1 pypi_0 pypi
pyzmq 19.0.1 py38ha925a31_1
requests 2.24.0 pypi_0 pypi
requests-oauthlib 1.3.0 pypi_0 pypi
rfc3987 1.3.8 pypi_0 pypi
scikit-image 0.17.2 pypi_0 pypi
scikit-learn 0.23.2 pypi_0 pypi
scipy 1.5.2 pypi_0 pypi
send2trash 1.5.0 py38_0
setuptools 49.2.0 py38_0
shapely 1.7.0 pypi_0 pypi
simplejson 3.17.2 pypi_0 pypi
six 1.15.0 py_0
smmap 3.0.4 pypi_0 pypi
sqlite 3.32.3 h2a8f88b_0
strict-rfc3339 0.7 pypi_0 pypi
swagger-spec-validator 2.7.3 pypi_0 pypi
terminado 0.8.3 py38_0
testpath 0.4.4 py_0
threadpoolctl 2.1.0 pypi_0 pypi
tifffile 2020.7.24 pypi_0 pypi
torch 1.4.0+cpu pypi_0 pypi
torchvision 0.5.0+cpu pypi_0 pypi
tornado 6.0.4 py38he774522_1
tqdm 4.48.2 pypi_0 pypi
traitlets 4.3.3 py38_0
typing-extensions 3.7.4.2 pypi_0 pypi
urllib3 1.25.10 pypi_0 pypi
vc 14.1 h0510ff6_4
vs2015_runtime 14.16.27012 hf0eaf9b_3
wcwidth 0.2.5 py_0
webcolors 1.11.1 pypi_0 pypi
webencodings 0.5.1 py38_1
websocket-client 0.57.0 pypi_0 pypi
wheel 0.34.2 py38_0
wincertstore 0.2 py38_0
winpty 0.4.3 4
xgboost 1.1.1 pypi_0 pypi
xmltodict 0.12.0 pypi_0 pypi
zeromq 4.3.2 ha925a31_2
zipp 3.1.0 py_0
zlib 1.2.11 h62dcd97_4
Ok, I see.
I think that is the problem.
Everything was written for torch==0.3.1, and newer releases (torch==1.4 for sure) have changed things.
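Just to illustrate the kind of breaking change involved (a generic example, not code from this repo): in torch 0.3 tensors had to be wrapped in Variable and scalar losses were typically read as loss.data[0]; from 0.4 onwards Variable is merged into Tensor and loss.item() is used instead, so the old pattern fails on newer releases.

import torch

# torch >= 0.4 style; the comments show the torch 0.3.1 equivalents.
x = torch.randn(3, requires_grad=True)   # 0.3: x = Variable(torch.randn(3), requires_grad=True)
loss = (x ** 2).sum()
loss.backward()
print(loss.item())                       # 0.3: print(loss.data[0]), which breaks on newer releases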
Could you try and install it by hand:
pip install torch==0.3.1
I tried installing torch==0.3.1 again and here's what I got:
I think you need to downgrade Python for this. As environment.yml suggests, it should be python=3.6.8.
The easiest way to do it is:
conda env create -f environment.yml
but as I understand that isn't working. In that case I'd just create a clean conda environment with python 3.6:
conda create -n py_36_env python=3.6
activate it
conda activate py_36_env
and then install the dependencies from environment.yml
Dear @jakubczakon, thank you for your time helping me!
It was indeed a problem with the torch version.
However, I still couldn't install torch==0.3.1 even after downgrading Python and so on... I tried installing it manually, but I still got the same error as in my previous screenshot. I also tried finding a .whl, but there was none for v0.3.1 on win64. So I used the closest version I could find, torch 0.4.1, and it seems to be working just fine.
Right now I have been evaluating for more than 5 hours, and I can tell it's running from Neptune and my memory usage. (But it's not using the GPU despite my GPU being CUDA compatible; is that normal?)
Once it is finished and I am sure it is working, I will post my conda list, if it can help someone else facing this issue.
Again, thank you very much for your time 👍
That is awesome, thank you @Julmatap!
You can see whether it is running by going to the Charts or Logs section of the UI. You should see some activity there.
Also, it seems that there is a message in the terminal that explains why the GPU is not logged -> sending GPU metrics was blocked by the system. This is very much unexpected.
Unfortunately there is nothing, just "No charts here" and "waiting for data", and I think there's a problem: someone else launched an experiment 3 minutes ago and already has some data showing.
You are running the evaluation now, correct? If so, can you try running training and see if something is happening?
Also, you can go to your terminal and run:
nvidia-smi
to see if your GPU is actually doing something.
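Another quick check from Python (standard PyTorch calls, just a sketch): if this prints False, the installed build simply cannot see the GPU, e.g. because it is a CPU-only wheel like the torch 1.4.0+cpu shown in the conda list above.

import torch

print(torch.__version__)          # a "+cpu" suffix indicates a CPU-only build
print(torch.cuda.is_available())  # False means PyTorch cannot use the GPU at all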
Hi Jakub,
Yes, I was running the evaluation. I tried running the training today, and there are still no charts. I ran nvidia-smi before and again today, and it says that no processes are running.
I guess I'll just start again on a Linux VM. Has anyone already succeeded in running it on Windows? The operating system blocking the request is something I can't find any information on.
I see @Julmatap,
Unfortunately, I don't know if anyone succeeded on Windows -> everyone I know who used this repo was using Linux.
I confirm it is working on Linux. I just ran an evaluation and it worked; I just had to wait a long time at "steps >>> step unet transforming...".
Again, thanks for your time @jakubczakon ! 👍
That is awesome!
I only wish I could be of more help, but I am proud of you for getting it done.
Dear Jakub,
One last question:
My metrics look like this:
However, when I run the "results on exploration" notebook, my predictions are blank:
Edit: I opened predictions.json and manually used one of the image_ids from it in the notebook, and that prediction runs fine. However, it seems that not all of the val folder has been predicted, so when I pick images at random it usually doesn't work.
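A quick way to see which validation images actually got predictions (a rough sketch; it assumes the COCO-style layout used in this challenge, i.e. annotation.json with an "images" list and predictions.json as a list of entries with an "image_id" field, and the paths are only examples based on the directory tree above):

import json

with open("data/raw/val/annotation.json") as f:
    val_ids = {img["id"] for img in json.load(f)["images"]}

with open("predictions.json") as f:
    predicted_ids = {entry["image_id"] for entry in json.load(f)}

print("val images:", len(val_ids))
print("predicted:", len(val_ids & predicted_ids))
print("missing:", len(val_ids - predicted_ids))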
Hello everyone,
Firstly, I would like to say thank you for this amazing project and for releasing your code. I am very new to programming and ML, and I wanted to try your project.
I downloaded the data, installed the environment on the latest versions, followed the reproduce-results instructions, looked at the issues, and tried to solve the errors I got by myself as much as possible, but right now I'm stuck.
I first prepared masks & metadata, then downloaded your best model weights, created the transformers folder in data_experiments/mapping_challenge_baseline and copied "unet" and "scoring_model" there. I then changed the values in neptune.yaml as suggested and tried to evaluate by running: python main.py evaluate --pipeline_name unet
Here is what I got:
Hope I gave you enough information, and that you can help me resolve this issue.
Again, thank you guys !
PS: I saw that issue #228 also has this problem, but I don't have the "valid data is None" error, as you can see in the attached screenshots.