visinf / acis

Actor-Critic Instance Segmentation (CVPR 2019)
Apache License 2.0

Error Pre-training CVPPP #4

Open Khoa-NT opened 4 years ago

Khoa-NT commented 4 years ago

Hi @arnike ,

You forgot to change the CFG setting in /runs/cvppp_preproc_save.sh: CFG=001_preproc

After moving the aug_data, I ran ./runs/cvppp_pretrain.sh and got some errors: log_git.txt. The first problem is that the dataloader was given the wrong data path. I don't understand the rest.

I still can't find a way to fix it. Can you check it?

Khoa-NT commented 4 years ago

Hi @arnike, do you have any suggestions? Thank you.

arnike commented 4 years ago

Hi @shaolinkhoa, ./runs/cvppp_pretrain.sh runs a sequence of commands. The first one fails, and the remaining commands cannot find the snapshot that the first should have created. So please check that you specify the data path correctly. The error says that you're loading /home/khoa/acis_master/code/data/cvppp/A1_AUG/train/plant102_rgb.png, and it's not there. Best, Nikita

Khoa-NT commented 4 years ago

Hi @arnike, thank you for your reply, and Merry Christmas :christmas_tree:! Can you check the code in cvppp.lua? When I run the code, I get:

iPath: plant065_rgb.png
mPath: plant065_label.png

There are no files named plant065_rgb.png or plant065_label.png in my self.dir (/acis_master/code/data/cvppp/A1_AUG/train). My AUG image files are named like this: 038398_rgb.png

I think there is a problem with self.imageInfo.imagePath and self.imageInfo.maskPath, because plant065_rgb.png is in the A1_RAW dataset.

If there is no problem with self.imageInfo.imagePath and self.imageInfo.maskPath, do we also need the raw image plant065_rgb.png in /acis_master/code/data/cvppp/A1_AUG/train?

Can you check the names of the images in your AUG dataset, or tell me how the naming format differs?

Update: I copied all the raw images into self.dir (/acis_master/code/data/cvppp/A1_AUG/train), but I still get the error: log_git_2.txt

Khoa-NT commented 4 years ago

Hi @arnike, Happy New Year. Would you mind checking again?

Khoa-NT commented 4 years ago

@arnike Sorry to bother you, but could you check it again? And could I have your email so that we can discuss it?

anhtuanhsgs commented 4 years ago

I have the same issue. Does anyone have a solution for this?

arnike commented 4 years ago

Hi guys, apologies for the delay, I'm a bit overwhelmed here... Have you tried removing the cache files in gen/*.t7? When you run the code, the dataloader first looks for the cache files: if they exist, it reads the image list from them and loads the files on that list; otherwise, it scans the directory and creates new cache files. Since you've generated new files, the cache needs to be updated. Unfortunately, this currently works only if you delete the cache data manually. Best, Nikita
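The manual cache reset described above can be scripted. This is a minimal sketch, assuming the cache files are exactly <code_dir>/gen/*.t7 as stated in the comment; clear_cvppp_cache is a hypothetical helper name, not something shipped with the acis repo.

```shell
#!/usr/bin/env bash
# Sketch: force the dataloader to rebuild its cached image lists.
# Assumption: the cache files are <code_dir>/gen/*.t7, as described above.
# clear_cvppp_cache is a hypothetical helper, not part of the acis repo.
clear_cvppp_cache() {
  local code_dir="${1:-.}"
  local removed=0
  local restore f
  # Remember the current nullglob setting so it can be restored afterwards.
  restore=$(shopt -p nullglob || true)
  # With nullglob, an unmatched pattern expands to nothing instead of itself,
  # so the loop simply does not run when there is no cache.
  shopt -s nullglob
  for f in "$code_dir"/gen/*.t7; do
    rm -f -- "$f"
    removed=$((removed + 1))
  done
  eval "$restore"
  echo "removed $removed cache file(s) from $code_dir/gen"
}
```

For example, running clear_cvppp_cache ~/acis/code before ./runs/cvppp_pretrain.sh should make the next run rescan the data directory and write fresh cache files.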

Khoa-NT commented 4 years ago

Hi @arnike, I deleted all files in gen, but I still have the problem when loading the dataset: log_git.txt

arnike commented 4 years ago

Hi @shaolinkhoa, it seems that when you generate gen/cvppp_*.t7, the provided path is wrong, so the script doesn't find any images.

  1. When you generate these file lists (the first run after you delete them), you should see " | found [NUMBER] image-label pairs" in the log output, where NUMBER is likely 0 in your case. To see which directory gets searched, you can print the dir variable here.
  2. If cvppp-gen.lua finds your new images but you still get an error loading them, please make sure that the full path paths.concat(self.dir, iPath) is specified correctly here.

Same for #5. Best, Nikita
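To check outside Lua whether a directory contains usable pairs, a rough sketch can count *_rgb.png files that have a matching *_label.png next to them. The pairing rule (same prefix, _rgb vs _label suffix) is an assumption based on the plant065_rgb.png / plant065_label.png names earlier in this thread; the real logic lives in cvppp-gen.lua, and count_pairs is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: count image/label pairs in a directory, similar in spirit to the
# " | found [NUMBER] image-label pairs" message from cvppp-gen.lua.
# Assumptions (not taken from the repo): images are named *_rgb.png, masks
# are *_label.png with the same prefix, and names are lower-case.
count_pairs() {
  local dir="$1" img mask n=0
  # Print the searched directory, like printing the dir variable in Lua.
  echo "searching: $dir" >&2
  while IFS= read -r -d '' img; do
    mask="${img%_rgb.png}_label.png"
    if [ -f "$mask" ]; then
      n=$((n + 1))
    fi
  done < <(find -L "$dir" -maxdepth 5 -name '*_rgb.png' -print0)
  echo " | found $n image-label pairs"
}
```

For instance, count_pairs /acis_master/code/data/cvppp/A1_AUG/train reporting 0 points either to a wrong path or to a naming scheme that differs from the assumed one.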

Khoa-NT commented 4 years ago

Hi @arnike, thank you for replying.

It seems the error comes from the find -L /khoa/acis/code/data/cvppp/A1_AUG/train/* -maxdepth 5 -iname "*_rgb.png" command:

-bash: /usr/bin/find: Argument list too long

There are a lot of files in A1_AUG/train/ and A1_AUG/val/, so find can't return anything.

How did you run find in A1_AUG/train/ and A1_AUG/val/?

arnike commented 4 years ago

Does this command list your files: find -L /khoa/acis/code/data/cvppp/A1_AUG/train/ -maxdepth 5 -iname "*_rgb.png"? If so, could you please try removing the * in the line here?
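Why removing the * helps: with train/*, the shell expands the glob into one argument per file before find starts, and with tens of thousands of augmented images the combined length exceeds the kernel's ARG_MAX limit, so exec fails with "Argument list too long". Passing only the directory gives find a single starting point and lets it walk the tree itself. A small sketch of the working form (list_rgb is a hypothetical name):

```shell
#!/usr/bin/env bash
# Sketch: list *_rgb.png files without hitting "Argument list too long".
list_rgb() {
  local dir="$1"
  # BAD:  find -L "$dir"/* ...  -> the shell expands * into one argument per
  #       file; with enough files, exec() fails with E2BIG.
  # GOOD: pass the directory itself; find walks it and never needs the
  #       file names on its own command line.
  find -L "$dir" -maxdepth 5 -iname '*_rgb.png'
}
# The byte limit for arguments plus environment can be checked with:
#   getconf ARG_MAX
```

Note that dropping the * also shifts -maxdepth by one level (the directory itself becomes depth 0), which makes no difference for a flat folder like A1_AUG/train.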

Khoa-NT commented 4 years ago

Hi @arnike, I'm sorry, but I got another error: "The server at localhost:6039 does not appear to be up" (log_6039.txt).

Is it related to Crayon? Do I have to install it?

Khoa-NT commented 4 years ago

Hi @arnike, can you check it for me?

arnike commented 4 years ago

Hi @shaolinkhoa, yes, you need a Crayon server running. Please see this README to set it up. Best, Nikita
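Before launching ./runs/cvppp_pretrain.sh, one way to confirm that something is listening on the Crayon port (6039, from the error above) and the TensorBoard port (6038, used later in this thread) is a bash /dev/tcp probe. This is a sketch: it only checks that a TCP connection succeeds, not that the service behind it is actually Crayon. port_up is a hypothetical helper, and /dev/tcp is a bash feature, not POSIX sh.

```shell
#!/usr/bin/env bash
# Sketch: check whether a TCP port accepts connections.
# port_up is a hypothetical helper; /dev/tcp is bash-specific.
port_up() {
  local host="$1" port="$2"
  # Open fd 3 inside a subshell so it is closed automatically afterwards;
  # a refused connection makes the subshell exit with a non-zero status.
  ( exec 3<>"/dev/tcp/$host/$port" ) 2>/dev/null
}

# Ports assumed from this thread: 6039 (Crayon), 6038 (TensorBoard).
for p in 6039 6038; do
  if port_up localhost "$p"; then
    echo "port $p: up"
  else
    echo "port $p: not reachable"
  fi
done
```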

Khoa-NT commented 4 years ago

Hi @arnike, I can't access the given link. Is it OK to follow the guide from here?

What I did: 1) I pulled the Docker image from here.

2) I started Docker with this command: docker run -d -p 8888:8888 -p 8889:8889 --name crayon alband/crayon

3) I went to <acis>/code and then ran nohup tensorboard --logdir tensorboard --port 6038 > tensorboard/tb.log 2>&1 &

I tried both tensorboard==1.13 and tensorboard==2.1, but I still get this error in tensorboard/tb.log: AttributeError: module 'tensorflow.python.estimator.api.estimator' has no attribute 'SessionRunHook'

Which version of TensorBoard are you using? What should I do with the Docker container from step 2?

arnike commented 4 years ago

Which version of TensorBoard are you using?

TensorBoard 1.10 worked for me.

What should I do with the Docker container from step 2?

Docker should stay running.

Khoa-NT commented 4 years ago

Hi @arnike,

To run nohup python crayon/server/server.py --port 6039 --logdir tensorboard --tb-port 6038 > tensorboard/crayon.log 2>&1 & with Python 3, we have to replace this with from urllib.request import urlopen and replace urllib2.urlopen with urlopen, because urllib2 exists only in Python 2.
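The Python 3 change described above can be applied with sed. This is a hedged sketch: it assumes server.py imports the module with a plain "import urllib2" line and only calls urllib2.urlopen(...), so inspect the file first (for example with grep -n urllib2 server.py), since any other urllib2 usage would need its own fix. port_crayon_server is a hypothetical helper name, and sed -i as written is GNU sed syntax.

```shell
#!/usr/bin/env bash
# Sketch: port the urllib2 usage in Crayon's server.py to Python 3.
# Assumptions: the file has the exact line "import urllib2" and uses only
# urllib2.urlopen. GNU sed syntax (BSD/macOS sed needs -i '').
port_crayon_server() {
  local f="$1"
  cp "$f" "$f.bak"   # keep a backup of the original file
  sed -i \
    -e 's/^import urllib2$/from urllib.request import urlopen/' \
    -e 's/urllib2\.urlopen/urlopen/g' \
    "$f"
}
```

After running port_crayon_server crayon/server/server.py, grep -n urllib crayon/server/server.py should show only the new import and the bare urlopen calls.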

After that, I ran ./runs/cvppp_pretrain.sh again, but I got this error (log_str.txt):

/home/khoa/torch/install/bin/lua: /home/khoa/torch/install/share/lua/5.2/crayon.lua:91: Something went wrong. Server sent: Experiment name should be a non-empty string or unicode instead of '<class 'str'>'.
stack traceback:
    [C]: in function 'error'
    /home/khoa/torch/install/share/lua/5.2/crayon.lua:91: in function 'remove_experiment'
    main.lua:240: in function 'createExperiment'
    main.lua:253: in main chunk
    [C]: in function 'dofile'
    ...khoa/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
    [C]: in ?

In the log file, opt.seq_length = seq_length = 1, but I can't find the value of opt.config_id or config_id.

Can you check it for me? Thank you.

Khoa-NT commented 4 years ago

Hi @arnike, do you have any solutions?

arnike commented 4 years ago

Hi @shaolinkhoa, it seems that removing the existing record in Crayon doesn't work properly. The code first tried to create a new record here; that failed, so it tried to remove the existing record here, which failed as well. Does it work in your setup when you execute it manually (see "Managing experiments" here)? If so, before the run, please make sure there are no experiments with the same name in Crayon (e.g., check with cc:get_experiment_names()).

Khoa-NT commented 4 years ago

Hi @arnike, I would like to ask how to run the commands under "Managing experiments". Where do I run those commands? I still don't understand how Crayon works.

Khoa-NT commented 4 years ago

Hi @arnike, would you mind giving me more clues?

arnike commented 4 years ago

Hi @shaolinkhoa,

Where do I run those commands?

You run the commands in the Torch shell: just execute th in the directory where Crayon saves the log files.