neptune-ai / open-solution-mapping-challenge

Open solution to the Mapping Challenge :earth_americas:
https://www.crowdai.org/challenges/mapping-challenge
MIT License

Confused about generating target masks #229

Open willhunger opened 3 years ago

willhunger commented 3 years ago

Hi, I ran into trouble when running python main.py prepare-masks.

$ python main.py prepare-masks
Output directory: data/meta/masks_overlayed_eroded_0_dilated_0
loading annotations into memory...
Done (t=20.34s)
creating index...
index created!
total 41220
creating index...

This command has been running for 1 day and 9 hours and has generated 35 GB of data (about 103,203 files) in ./meta.

$ ls -l | grep '^-' | wc -l
103203

$ du -h --max-depth=1
0       ./experiments
35G     ./meta
7.1G    ./raw
42G     .

And it's still running.

$ ps -eo pid,lstart,etime,cmd | grep python
31564 Wed Jul 29 10:52:49 2020  1-09:48:50 python main.py prepare-masks
31694 Wed Jul 29 10:52:51 2020  1-09:48:48 /home/jbd/.conda/envs/mapping/bin/python -c from multiprocessing.semaphore_tracker import main;main(4)

$ top -c
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
31564 jbd       20   0  0.482t 0.028t 126812 S 322.8 11.4   6668:05 python main.py prepare-masks

Did I hit a bug in the multiprocessing, or something else? Is there any way to handle this? Should I use annotation-small.json instead of annotation.json?

Thanks.

jakubczakon commented 3 years ago

Hi @willhunger ,

I remember it taking a while, since it creates masks for building sizes and edges as well, but 1 day and 9 hours does seem long. It's possible that everything is fine and you'll get your masks folder ready soon.

You could try running it on annotations-small.json to see how long that takes and then extrapolate. As far as I understand, the runtime should scale roughly linearly with the number of annotated images.

willhunger commented 3 years ago

Hey @jakubczakon, I used watch -n 90 -d "ls -l | egrep '^-' | wc -l" and noticed that the number of generated masks is still increasing. By the way, the raw training set is sizeable:

/data/raw/train/images$ ls -l | grep '^-' | wc -l
280741

/data/meta/masks_overlayed_eroded_0_dilated_0/train/masks$ ls -l | egrep '^-' | wc -l
105805

Only about 1/3 of the masks have been generated so far (105,805 of 280,741), so at this rate the full run would take roughly 90 hours, even though I have 72 logical CPUs:

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                72
On-line CPU(s) list:   0-71
Thread(s) per core:    2
Core(s) per socket:    18
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz
Stepping:              1
CPU MHz:               1246.042
CPU max MHz:           3600.0000
CPU min MHz:           1200.0000
BogoMIPS:              4601.56

It looks like something is going wrong with the multiprocessing. Do you have any idea how to fix this?
I'll try annotations-small.json later if I can't sort it out.

Thanks a lot.

jakubczakon commented 3 years ago

Sorry, I have no idea what the problem with the multi-threading is here.

Perhaps you could play with the numbers in the Execution section of the neptune.yaml config file:

# Execution
  overwrite: 0
  num_workers: 4
  num_threads: 1000

and see whether that fixes it.
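For what it's worth, the semaphore_tracker process in your ps output suggests a standard multiprocessing worker pool, which is typically what a num_workers setting controls. A minimal sketch of that pattern (save_mask here is a hypothetical stand-in, not the repo's actual function):

from multiprocessing import Pool

def save_mask(image_id):
    # hypothetical stand-in for the real per-image mask generation
    print('processing', image_id)

if __name__ == '__main__':
    num_workers = 4          # mirrors the setting in neptune.yaml
    image_ids = range(10)    # placeholder list of image ids
    with Pool(processes=num_workers) as pool:
        # one mask-generation task per image, spread across the workers
        pool.map(save_mask, image_ids)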

willhunger commented 3 years ago

Thanks for your response, @jakubczakon. But now I've hit another error. I used annotations-small.json to generate the target masks, and this time it finished within several minutes. But when preparing the metadata with python main.py prepare-metadata --train_data --valid_data, I got this:

2020-08-01 16-10-19 mapping-challenge >>> creating metadata
  0%|                                                                                                                             | 0/280741 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 68, in <module>
    main()
  File "/home/jbd/.conda/envs/mapping/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/jbd/.conda/envs/mapping/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/jbd/.conda/envs/mapping/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/jbd/.conda/envs/mapping/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/jbd/.conda/envs/mapping/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "main.py", line 22, in prepare_metadata
    pipeline_manager.prepare_metadata(train_data, valid_data)
  File "/home/ubuntu/cugxyy/jbd/open-solution-mapping-challenge/src/pipeline_manager.py", line 35, in prepare_metadata
    prepare_metadata(train_data, valid_data, self.logger, self.params)
  File "/home/ubuntu/cugxyy/jbd/open-solution-mapping-challenge/src/pipeline_manager.py", line 95, in prepare_metadata
    process_validation_data=valid_data)
  File "/home/ubuntu/cugxyy/jbd/open-solution-mapping-challenge/src/utils.py", line 192, in generate_metadata
    train_metadata = _generate_metadata(dataset="train")
  File "/home/ubuntu/cugxyy/jbd/open-solution-mapping-challenge/src/utils.py", line 167, in _generate_metadata
    image_id = file_name2img_id[image_file_name]
KeyError: '000000000000.jpg'

The 0/280741 looks odd: 280741 is the total number of images in /data/raw, but annotations-small.json only labels a subset of them. I checked issue #119 but didn't find a working solution there.
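A quick check (assuming the standard COCO layout, where the annotation file lists images under an "images" key; adjust the path to wherever your annotation file lives) shows whether that file name is covered at all:

import json

# collect the file names that annotations-small.json actually labels
with open('data/raw/annotations-small.json') as f:
    annotations = json.load(f)
covered = {img['file_name'] for img in annotations['images']}
print('000000000000.jpg' in covered)  # False would explain the KeyError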

jakubczakon commented 3 years ago

@willhunger Oh I see. I think the problem is that /data/raw would need to contain only the images referenced by annotations-small.json, but it currently contains all of the images.

What you could do is create a /data-small/raw (or similar) that contains only the images whose ids appear in annotations-small.json, for example with the sketch below.
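Something along these lines should work (a rough sketch; the exact paths are assumptions based on your layout above):

import json
import shutil
from pathlib import Path

src_dir = Path('data/raw/train/images')        # your current full image set
dst_dir = Path('data-small/raw/train/images')  # the trimmed copy
dst_dir.mkdir(parents=True, exist_ok=True)

# copy only the images that annotations-small.json actually labels
with open('data/raw/annotations-small.json') as f:
    annotations = json.load(f)
for img in annotations['images']:
    src = src_dir / img['file_name']
    if src.exists():
        shutil.copy(src, dst_dir / img['file_name'])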