neptune-ai / open-solution-mapping-challenge

Open solution to the Mapping Challenge :earth_americas:
https://www.crowdai.org/challenges/mapping-challenge
MIT License

No such file or directory: '/data/train/annotation.json' using Neptune cloud #182

carbonox-infernox closed this issue 6 years ago

carbonox-infernox commented 6 years ago

When running the command

    neptune send --pip-requirements-file requirements.txt --worker gcp-large --environment pytorch-0.3.1-gpu-py3 main.py -- prepare_masks

I get the error

    FileNotFoundError: [Errno 2] No such file or directory: '/data/train/annotation.json'

but only when running Neptune in the cloud. When running purely locally, I don't have this issue. Within Neptune my file structure looks like MAPMC/uploads/data, and my neptune.yaml file contains the following:

    parameters:
      # Data Paths
      data_dir: /data
      meta_dir: /data
      masks_overlayed_prefix: masks_overlayed
      experiment_dir: /data/work

I have tried the argument --input data, but this did not help. From what I've found, Neptune uses uploads by default, so I don't see why this issue is happening. What can I do to fix this?

HubertJaworski commented 6 years ago

Hi,

Files uploaded to the /uploads directory are not mounted in your experiment by default.

To make them accessible from an experiment, you need to pass one or more --input arguments (or put them in your neptune.yaml). (docs)

carbonox-infernox commented 6 years ago

I've tried every combination of:

    --input MAPMC/uploads/data
    --input uploads/data
    --input data
    --input uploads

etc., as well as every combination inside neptune.yaml:

    data_dir: /data
    data_dir: uploads/data
    data_dir: MAPMC/uploads/data
    data_dir: uploads

and nothing works. Which one of these do you think would be correct? It's possible that I stumbled around and missed the right combination.

HubertJaworski commented 6 years ago

Try: data_dir: /input/data

Keep in mind that all resources mounted with --input end up in the /input directory on the worker.

carbonox-infernox commented 6 years ago

I will try that, but I'm not sure I'm following the phrase "all resources mounted with --input end up in /input directory on worker".

Are you saying I should use an --input argument?

Also, what directory would you recommend for experiment_dir when running in the cloud?

HubertJaworski commented 6 years ago

If you want to use a directory from your uploads, the only way to do it (except for public datasets provided by us) is to use --input. Those directories end up in the /input directory when running in the cloud.

carbonox-infernox commented 6 years ago

Oh, ok. So what form should my input argument take? Before, I tried the combination of data_dir: data in the .yaml and --input data in the neptune send command, but that did not work. Should I now use data_dir: input/data in the .yaml together with --input input/data? Or --input data? Or just --input?

Sorry to keep bothering you, but there are a ton of combinations that could be used.

HubertJaworski commented 6 years ago

I'll need to confirm (I'm writing this from a mobile), but if your uploads contain a 'data' dir, you should pass --input data and set data_dir to /input/data.
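
The mapping suggested in this reply can be sketched in Python. This is only an illustration, assuming (as stated in this thread, and not verified against Neptune's documentation) that a top-level uploads directory passed via --input appears under /input on the worker. `mounted_path` is a hypothetical helper, not part of Neptune's API.

```python
from pathlib import PurePosixPath

# Assumption taken from this thread: resources passed with --input
# are mounted under /input on the cloud worker.
INPUT_ROOT = PurePosixPath("/input")


def mounted_path(input_arg: str) -> str:
    """Return the worker path where `--input <input_arg>` is assumed to be mounted."""
    # Strip surrounding slashes so the join stays below INPUT_ROOT.
    return str(INPUT_ROOT / input_arg.strip("/"))


# Value to use for data_dir in neptune.yaml when running with `--input data`.
print(mounted_path("data"))  # /input/data
```

With this reading, the --input argument names the directory inside uploads, while data_dir names the place where that directory shows up on the worker.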

carbonox-infernox commented 6 years ago

Ok thanks, trying now.

carbonox-infernox commented 6 years ago

I think I'm past that issue now. Thanks! Now I'm getting read-only errors when it tries to make a directory for masks_overlayed_eroded_0_dilated_2. I'm wondering if this has anything to do with me setting experiment_dir: work. Is there a better place for me to put that, or a better way to specify it, like output/work for example?

HubertJaworski commented 6 years ago

I'm not a data scientist and I'm not familiar with this experiment's internals, but try /output/work. /output has a special purpose: Neptune automatically stores whatever you put there in persistent storage after the experiment ends.

carbonox-infernox commented 6 years ago

I ran it again with: experiment_dir: /output/work and I'm getting: OSError: [Errno 30] Read-only file system: '/input/data/masks_overlayed_eroded_0_dilated_2'

It seems to me that it's trying to create masks_overlayed_eroded_0_dilated_2 within the data folder for some reason
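
A read-only error like this one can be caught before the long-running step starts. The sketch below is a generic pre-flight check, not code from this repository: `first_unwritable` is a hypothetical helper that tries to create each configured output directory and write a temporary file into it.

```python
import os
import tempfile


def first_unwritable(paths):
    """Return the first directory in `paths` that cannot be created or written to, else None."""
    for path in paths:
        try:
            os.makedirs(path, exist_ok=True)
            # Creating a temporary file proves the directory is really writable.
            with tempfile.TemporaryFile(dir=path):
                pass
        except OSError:
            # Covers read-only file systems (Errno 30) and permission errors.
            return path
    return None
```

Calling it with the values of meta_dir and experiment_dir from neptune.yaml before prepare_masks would report a directory under a read-only mount immediately instead of partway through the run.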

jakubczakon commented 6 years ago

Hi @carbonox-infernox I will get back to you first thing tomorrow morning with a full description but here is a gist:

carbonox-infernox commented 6 years ago

Oh, I'm an idiot. I just realized that I was setting experiment_dir to output rather than meta_dir. (also apparently output is plural)

Thanks for the info! I will try that and see what happens. Good to know I can set paths to other experiments. That preemptively solved the next issue I would have run into.

carbonox-infernox commented 6 years ago

My experiment has been running for 30 minutes, so I assume it's working now (prepare masks step). I used meta_dir: /outputs but I can't find the masks folder anywhere. It's not in MAPMC-29 (current experiment number) or its sub-directories. It's also not in the inputs, or a separate folder in MAPMC.

Angel0003 commented 4 years ago

When I run the command 'python main.py prepare_masks' to train on my own dataset, it throws an error: No such file or directory: 'data/raw/train/annotation.json'. How do I generate that file?

jakubczakon commented 4 years ago

Hi @Angel0003, thank you for taking a look at our code!

To rerun the experiments or run on your own data, you should follow the instructions in REPRODUCE_RESULTS.md: https://github.com/neptune-ml/open-solution-mapping-challenge/blob/master/REPRODUCE_RESULTS.md

Specifically, you should use the following project structure:

project
|-- README.md
|-- ...
|-- data
    |-- raw
        |-- train
            |-- images
            |-- annotation.json
        |-- val
            |-- images
            |-- annotation.json
        |-- test_images
            |-- img1.jpg
            |-- img2.jpg
            |-- ...
    |-- meta
        |-- masks_overlayed_eroded_{}_dilated_{} # it is generated automatically
            |-- train
                |-- distances
                |-- masks
                |-- sizes
            |-- val
                |-- distances
                |-- masks
                |-- sizes
    |-- experiments
        |-- mapping_challenge_baseline # this is where your experiment files will be dumped
            |-- checkpoints # neural network checkpoints
            |-- transformers # serialized transformers after fitting
            |-- outputs # outputs of transformers if you specified save_output=True anywhere
            |-- prediction.json # prediction on valid
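
The raw inputs in the tree above can be checked with a few lines of Python before running prepare_masks. This is a minimal sketch and not part of the repository: `REQUIRED` and `missing_inputs` are hypothetical names, and the list only covers the train and val entries shown under data/raw.

```python
from pathlib import Path

# Raw inputs listed in the tree above, relative to data/raw.
REQUIRED = [
    "train/images",
    "train/annotation.json",
    "val/images",
    "val/annotation.json",
]


def missing_inputs(raw_dir):
    """Return the entries of REQUIRED that do not exist under `raw_dir`."""
    root = Path(raw_dir)
    return [rel for rel in REQUIRED if not (root / rel).exists()]
```

For example, `missing_inputs("data/raw")` returning `["train/annotation.json"]` would correspond to the FileNotFoundError reported in this issue.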

That means that for both the train and validation sets you need a folder with images and an annotation.json file associated with them. The annotation.json files are in the standard COCO format, as explained on the competition site: https://www.aicrowd.com/challenges/mapping-challenge#datasets.
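
For a custom dataset, a COCO-style annotation file can be assembled from polygon outlines roughly as follows. This is a sketch under assumptions, not the repository's tooling: `build_coco` and `polygon_area` are hypothetical helpers, the category id and name are placeholders that should be matched to the competition data, and only the commonly used COCO instance fields are filled in.

```python
import json


def polygon_area(xs, ys):
    """Area of a simple polygon via the shoelace formula."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        j = (i + 1) % n
        total += xs[i] * ys[j] - xs[j] * ys[i]
    return abs(total) / 2.0


def build_coco(images, polygons, category_id=1, category_name="building"):
    """Build a COCO-style dict.

    images:   list of (image_id, file_name, width, height)
    polygons: list of (image_id, [x1, y1, x2, y2, ...])
    category_id and category_name are placeholder assumptions.
    """
    coco = {
        "images": [
            {"id": i, "file_name": f, "width": w, "height": h}
            for i, f, w, h in images
        ],
        "categories": [
            {"id": category_id, "name": category_name, "supercategory": category_name}
        ],
        "annotations": [],
    }
    for ann_id, (image_id, flat) in enumerate(polygons, start=1):
        xs, ys = flat[0::2], flat[1::2]
        x0, y0 = min(xs), min(ys)
        coco["annotations"].append({
            "id": ann_id,
            "image_id": image_id,
            "category_id": category_id,
            "segmentation": [flat],  # one polygon per object
            "area": polygon_area(xs, ys),
            "bbox": [x0, y0, max(xs) - x0, max(ys) - y0],  # [x, y, width, height]
            "iscrowd": 0,
        })
    return coco
```

The result can then be written with `json.dump(coco, open("data/raw/train/annotation.json", "w"))`, and the same for the val folder.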

I hope this helps. If you have any additional questions do let me know.