neptune-ai / open-solution-mapping-challenge

Open solution to the Mapping Challenge :earth_americas:
https://www.crowdai.org/challenges/mapping-challenge
MIT License
380 stars · 96 forks

KeyError: "['file_path_mask_eroded_3'] not in index" #119

Open XYAskWhy opened 6 years ago

XYAskWhy commented 6 years ago

When running locally in pure Python with `python main.py -- train_evaluate_predict --pipeline_name unet --chunk_size 5000`, the following error occurs. Any help?

    neptune: Executing in Offline Mode.
    neptune: Executing in Offline Mode.
    2018-05-29 16-16-52 mapping-challenge >>> training
    neptune: Executing in Offline Mode.
    2018-05-29 16-16-55 steps >>> step xy_train adapting inputs
    2018-05-29 16-16-55 steps >>> step xy_train fitting and transforming...
    Traceback (most recent call last):
      File "main.py", line 282, in <module>
        action()
      File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 722, in __call__
        return self.main(*args, **kwargs)
      File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 697, in main
        rv = self.invoke(ctx)
      File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
        return callback(*args, **kwargs)
      File "main.py", line 79, in train
        _train(pipeline_name, dev_mode)
      File "main.py", line 106, in _train
        pipeline.fit_transform(data)
      File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/base.py", line 103, in fit_transform
        step_inputs[input_step.name] = input_step.fit_transform(data)
      File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/base.py", line 103, in fit_transform
        step_inputs[input_step.name] = input_step.fit_transform(data)
      File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/base.py", line 103, in fit_transform
        step_inputs[input_step.name] = input_step.fit_transform(data)
      [Previous line repeated 5 more times]
      File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/base.py", line 109, in fit_transform
        step_output_data = self._cached_fit_transform(step_inputs)
      File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/base.py", line 120, in _cached_fit_transform
        step_output_data = self.transformer.fit_transform(step_inputs)
      File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/base.py", line 253, in fit_transform
        return self.transform(*args, **kwargs)
      File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/preprocessing/misc.py", line 17, in transform
        y = meta[self.y_columns].values
      File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/pandas/core/frame.py", line 2133, in __getitem__
        return self._getitem_array(key)
      File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/pandas/core/frame.py", line 2177, in _getitem_array
        indexer = self.loc._convert_to_indexer(key, axis=1)
      File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/pandas/core/indexing.py", line 1269, in _convert_to_indexer
        .format(mask=objarr[mask]))
    KeyError: "['file_path_mask_eroded_3'] not in index"

jakubczakon commented 6 years ago

@XYAskWhy Hi, did you generate the metadata? You can do that by running `python main.py -- prepare_metadata`, and you also need to prepare the masks with `python main.py -- prepare_masks`.

If you have already done that, then open your metadata csv and check which columns are available. Remember that you can choose how to generate your target masks, so your csv may contain different columns. You can choose which column should be used as the target masks in pipeline_config.py:

Y_COLUMNS = ['file_path_mask_eroded_0_dilated_0']
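When the configured column is missing from the metadata, the pandas lookup fails exactly as in the traceback above. A minimal sketch of the failure mode (the DataFrame contents are made up; the column name is the one from this thread):

```python
import pandas as pd

# Hypothetical metadata frame that lacks the mask column, mimicking a
# stage1_metadata.csv generated before prepare_masks was run.
meta = pd.DataFrame({
    'ImageId': ['a', 'b'],
    'file_path_image': ['images/a.png', 'images/b.png'],
})
print(list(meta.columns))  # quick way to check which columns you actually have

y_columns = ['file_path_mask_eroded_0_dilated_0']
try:
    y = meta[y_columns].values  # same kind of lookup as in the traceback
except KeyError as err:
    print('KeyError:', err)
```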
dslwz2008 commented 6 years ago

I also encountered this problem. I'll give it a try. Thanks!

XYAskWhy commented 6 years ago

Thanks @jakubczakon. I had already run prepare_metadata and prepare_masks; the problem is that we must prepare the masks first.

dslwz2008 commented 6 years ago

@XYAskWhy After I executed python main.py -- prepare_masks, this error still exists. What columns are in your stage1_metadata.csv ? There are only the following columns in my file: ImageId, file_path_image, is_train, is_valid, is_test, n_buildings. Is there anything wrong? What else do I need to do?

XYAskWhy commented 6 years ago

@dslwz2008 If you prepared the metadata first, you need to redo it after you prepare the masks. The newly generated csv file will then include an extra column like 'file_path_mask_eroded_0_dilated_0'.

jakubczakon commented 6 years ago

@XYAskWhy @dslwz2008 I will fix the readme today, but yes, as @XYAskWhy said: when the metadata is created, it looks for the folders with target masks and creates the columns based on that information. It may seem over the top at first glance, but creating target masks for this problem is very far from trivial. The following ideas are all viable options:

I hope this helps!

dslwz2008 commented 6 years ago

I re-executed the commands `python main.py -- prepare_masks` and `neptune experiment run main.py -- prepare_metadata --train_data --valid_data --test_data` in that order. However, there is still no file_path_mask_eroded_0_dilated_0 column in my stage1_metadata.csv. I am using the master branch. What else do I need to do? @XYAskWhy @jakubczakon

jakubczakon commented 6 years ago

@dslwz2008 What are your paths in neptune.yaml?


  data_dir:                   /path/to/data
  meta_dir:                   /path/to/data
  masks_overlayed_dir:        /path/to/masks_overlayed
  masks_overlayed_eroded_dir: /path/to/masks_overlayed_eroded
  experiment_dir:             /path/to/work/dir

Can you confirm that your masks were actually generated? The masks_overlayed folder should be around 100 GB.
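If you want to check the folder size from Python rather than with `du`, a quick sketch (the path and `dir_size_gb` name are my own, not from the repo):

```python
import os

def dir_size_gb(path):
    """Rough on-disk size of a directory tree, in GB (decimal)."""
    total = 0
    for root, _, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.isfile(file_path):
                total += os.path.getsize(file_path)
    return total / 1e9

# hypothetical path -- substitute your masks_overlayed_dir
print(dir_size_gb('/path/to/masks_overlayed'))
```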

dslwz2008 commented 6 years ago

This is my neptune.yaml:

    data_dir:                   /home/shenshen/Programs/mc_data
    meta_dir:                   /home/shenshen/Programs/mc_data
    masks_overlayed_dir:        /home/shenshen/Programs/mc_dat_eroded_2_dilated_3
    masks_overlayed_eroded_dir: /home/shenshen/Programs/mc_dat_eroded_2_dilated_3
    experiment_dir:             /home/shenshen/Programs/open-solution-mapping-challenge

I am not sure if the masks_overlayed_dir setting is correct. @jakubczakon

jakubczakon commented 6 years ago

Ok I see. You just need to have something like:

masks_overlayed_dir: /home/shenshen/Programs/masks_overlayed/

and it will create this particular setting with eroded_2_dilated_3 automatically. Below is the piece of code that deals with this part:


            images_path_to_write = images_path
            masks_overlayed_dir_ = masks_overlayed_dir[:-1]
            masks_dir_prefix = os.path.split(masks_overlayed_dir_)[1]
            masks_overlayed_sufix_to_write = []
            for masks_dir in os.listdir(meta_dir):
                masks_dir_name = os.path.split(masks_dir)[1]
                if masks_dir_name.startswith(masks_dir_prefix):
                    masks_overlayed_sufix_to_write.append(masks_dir_name[len(masks_dir_prefix):])

So you need to define your path so that it ends with `/`. I will open an issue to clean that up right away, but I am not sure I will have time to change it today, as I want to do some last-minute postprocessing of the newest models and start generating the final submission.
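To see why the trailing `/` matters, here is a simplified, hypothetical re-implementation of the prefix matching in the snippet above (`mask_suffixes` is my name for it, not the repo's; the directory listing is passed in so the sketch is self-contained):

```python
import os

def mask_suffixes(masks_overlayed_dir, meta_dir_listing):
    """Recover the eroded/dilated suffixes of mask folders whose names
    start with the basename of masks_overlayed_dir (simplified sketch
    of the quoted code; it assumes the path ends with '/')."""
    prefix = os.path.split(masks_overlayed_dir[:-1])[1]
    return [d[len(prefix):] for d in meta_dir_listing if d.startswith(prefix)]

# With the trailing slash the prefix is 'masks_overlayed' and the
# suffix used to build the metadata column is recovered correctly:
print(mask_suffixes('/data/masks_overlayed/',
                    ['masks_overlayed_eroded_2_dilated_3']))
# -> ['_eroded_2_dilated_3']

# Without it, [:-1] chops the last character of the directory name,
# so the recovered suffix gains a stray 'd' and the expected column
# name is never produced:
print(mask_suffixes('/data/masks_overlayed',
                    ['masks_overlayed_eroded_2_dilated_3']))
# -> ['d_eroded_2_dilated_3']
```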

jakubczakon commented 6 years ago

@dslwz2008 @XYAskWhy By the way, I updated the readme. The most important part is that the best training results were achieved when training with a distance- and size-weighted loss, so the pipeline that needs to be chosen is unet_weighted instead of unet. Also, when running predictions, using replication padding + test-time augmentation gave us significant improvements. The pipeline to run for that is called unet_padded_tta.

jakubczakon commented 6 years ago

@dslwz2008 Also, I would change the

experiment_dir: /home/shenshen/Programs/open-solution-mapping-challenge

to something particular to this experiment. All the models will be saved in that directory, so I am not sure you want it to be as generic as open-solution-mapping-challenge. I usually have something like:

experiment_dir: ...mapping-challenge/experiments/resnet34_crop256_erode2_dilate_3

or something like that.

dslwz2008 commented 6 years ago

OK, thanks! After `python main.py -- prepare_masks`, the folder mc_dat_eroded_2_dilated_3 was generated; by my count it takes up 123.1 GB of space. But when is the folder from `masks_overlayed_dir: /home/shenshen/Programs/masks_overlayed/` supposed to be generated?

jakubczakon commented 6 years ago

Well, you misspecified the

masks_overlayed_dir: 

so I don't think you have that folder.

Now there are 2 options. You could either specify it correctly:

masks_overlayed_dir: /home/shenshen/Programs/masks_overlayed/

and rerun the generation of the masks (which takes time),

or you could simply go

mv /home/shenshen/Programs/mc_dat_eroded_2_dilated_3  /home/shenshen/Programs/masks_overlayed

and rerun the metadata creation.

dslwz2008 commented 6 years ago

Thank you very much! I understand. I did not create the masks_overlayed folder before prepare_masks. So how should this item be set up, and when is its content generated? masks_overlayed_eroded_dir: ???

jakubczakon commented 6 years ago

That one should actually be dropped. It is a remnant of older days when we only thought of 2 configurations of those target masks :) I will drop it from the readme and yamls.

jakubczakon commented 6 years ago

Well, you generated all those masks with prepare_masks; you just put them in the wrong directory.

dslwz2008 commented 6 years ago

I finally configured it correctly! Unfortunately, I did not find the column file_path_mask_eroded_0_dilated_0 in the generated stage1_metadata.csv file... The first time I created the metadata it took several hours, but now it is generated in less than a minute. So I wonder: does this site (neptune.ml) cache something?

jakubczakon commented 6 years ago

Well, metadata generation should be pretty fast; it's only filepath munging. Also, since you are generating masks with erosion 2 and dilation 3, your column is actually file_path_mask_eroded_2_dilated_3, if I am correct.
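If I read the code snippet quoted earlier in this thread correctly, the target column name is just `file_path_mask` plus the suffix recovered from the masks directory name. A hypothetical illustration (the helper name is mine):

```python
def mask_column_name(dir_suffix):
    # Assumption: the metadata column is 'file_path_mask' concatenated
    # with the suffix stripped from the masks_overlayed_* folder name.
    return 'file_path_mask' + dir_suffix

print(mask_column_name('_eroded_2_dilated_3'))
# -> file_path_mask_eroded_2_dilated_3
```

So erosion 2 + dilation 3 masks should yield the file_path_mask_eroded_2_dilated_3 column, not file_path_mask_eroded_0_dilated_0.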

dslwz2008 commented 6 years ago

[screenshot]

This is the head of stage1_metadata.csv. No similar column appears.

jakubczakon commented 6 years ago

Did you change the mask_overlayed dir in neptune.yaml to

masks_overlayed_dir: /home/shenshen/Programs/masks_overlayed/

and recreated the metadata ? Can you remove stage1_metadata.csv and run it again?

dslwz2008 commented 6 years ago

Yes, I have changed the masks_overlayed_dir in the neptune.yaml and delete stage1_metadata.csv. Then I ran the prepare_metadata again. The result is the same as in the picture above.

jakubczakon commented 6 years ago

what does this folder /home/shenshen/Programs/masks_overlayed/ contain ?

dslwz2008 commented 6 years ago

[screenshot of the folder contents]

jakubczakon commented 6 years ago

Okay, I checked my setup and I actually have folders like:

.../masks_overlayed_eroded_3_dilated_2

So I believe you should change the name of your directory to

../masks_overlayed_eroded_2_dilated_3

and rerun the metadata generation, and you will be ready to go.

dslwz2008 commented 6 years ago

Still no column file_path_mask_eroded_2_dilated_3. I'm going to analyze the code carefully and try again. Thank you very much.

jakubczakon commented 6 years ago

Ok, cool. But one last try:

Change the folder name by:

mv ../masks_overlayed ../masks_overlayed_eroded_2_dilated_3

But LEAVE the name in the neptune.yaml as:

masks_overlayed_dir: ../masks_overlayed/

Then rerun the metadata generation.

I think @XYAskWhy got it to work pretty quickly. Any advice?

dslwz2008 commented 6 years ago

Still not working... This is really weird. How about re-cloning the repo and starting over? Which branch do you recommend? @XYAskWhy How did you get it to work?

XYAskWhy commented 6 years ago

I got local training running using the older master version, but I am still struggling with evaluating/predicting. The updated master version should be OK as well. @dslwz2008

jakubczakon commented 6 years ago

Master should work, and dev too, as I am generating the final predictions with it right now. @XYAskWhy Are you running unet_padded_tta? Check the evaluate_checkpoint.py script to see how to add missing transformers (just run `touch transformer_name` in the transformers dir).

XYAskWhy commented 6 years ago

@jakubczakon Thanks for the tip. When I run `python main.py -- evaluate --pipeline_name unet_padded_tta --chunk_size 200`, it does raise the following error:

    (pytorch0.3) rs@rsLab:/media/rs/3EBAC1C7BAC17BC1/Xavier/Segmentation/open-solution-mapping-challenge$ python main.py -- evaluate --pipeline_name unet_padded_tta --chunk_size 200
    neptune: Executing in Offline Mode.
    neptune: Executing in Offline Mode.
    2018-06-02 21-48-05 mapping-challenge >>> evaluating
    /home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py:895: DtypeWarning: Columns (6,7) have mixed types. Specify dtype option on import or set low_memory=False.
      return ctx.invoke(self.callback, **ctx.params)
    neptune: Executing in Offline Mode.
      0%|          | 0/5 [00:00<?, ?it/s]
    2018-06-02 21-48-13 steps >>> step xy_inference adapting inputs
    2018-06-02 21-48-13 steps >>> step xy_inference loading transformer...
    2018-06-02 21-48-13 steps >>> step xy_inference transforming...
    2018-06-02 21-48-13 steps >>> step xy_inference adapting inputs
    2018-06-02 21-48-13 steps >>> step xy_inference loading transformer...
    2018-06-02 21-48-13 steps >>> step xy_inference transforming...
    2018-06-02 21-48-13 steps >>> step tta_generator adapting inputs
    Traceback (most recent call last):
      File "main.py", line 282, in <module>
        action()
      File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 722, in __call__
        return self.main(*args, **kwargs)
      File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 697, in main
        rv = self.invoke(ctx)
      File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
        return callback(*args, **kwargs)
      File "main.py", line 117, in evaluate
        _evaluate(pipeline_name, dev_mode, chunk_size)
      File "main.py", line 130, in _evaluate
        prediction = generate_prediction(meta_valid, pipeline, logger, CATEGORY_IDS, chunk_size)
      File "main.py", line 238, in generate_prediction
        return _generate_prediction_in_chunks(meta_data, pipeline, logger, category_ids, chunk_size)
      File "main.py", line 271, in _generate_prediction_in_chunks
        output = pipeline.transform(data)
      File "/media/rs/3EBAC1C7BAC17BC1/Xavier/Segmentation/open-solution-mapping-challenge/steps/base.py", line 152, in transform
        step_inputs[input_step.name] = input_step.transform(data)
      File "/media/rs/3EBAC1C7BAC17BC1/Xavier/Segmentation/open-solution-mapping-challenge/steps/base.py", line 152, in transform
        step_inputs[input_step.name] = input_step.transform(data)
      File "/media/rs/3EBAC1C7BAC17BC1/Xavier/Segmentation/open-solution-mapping-challenge/steps/base.py", line 152, in transform
        step_inputs[input_step.name] = input_step.transform(data)
      [Previous line repeated 8 more times]
      File "/media/rs/3EBAC1C7BAC17BC1/Xavier/Segmentation/open-solution-mapping-challenge/steps/base.py", line 158, in transform
        step_output_data = self._cached_transform(step_inputs)
      File "/media/rs/3EBAC1C7BAC17BC1/Xavier/Segmentation/open-solution-mapping-challenge/steps/base.py", line 168, in _cached_transform
        raise ValueError('No transformer cached {}'.format(self.name))
    ValueError: No transformer cached tta_generator

I looked into evaluate_checkpoint.py and added 'tta_generator' to MISSING_TRANSFORMERS, but it didn't help. What might be the problem, and what should I do? Also, I don't think I fully understand your comment 'just run touch transformer_name in the transformers dir'.

kamil-kaczmarek commented 6 years ago

@apyskir, @taraspiotr can you provide some help here (check the previous message)? Thx :)

jakubczakon commented 6 years ago

@XYAskWhy When you run evaluate or predict, all transformers for the given pipeline need to be persisted in the transformers folder. Since we did not train some pieces of this pipeline during training, we need to either run training on unet_padded_tta (I don't advise it, but you could with --dev_mode) or simply create those transformers with `touch PATH/TO/TRANSFORMER/DIR/TRANSFORMER_NAME`, just as I did in evaluate_checkpoint.py.

This issue will be solved by having an is_trainable flag in the Step constructor, but for now you need to persist all transformers (trainable or not).
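The `touch` trick can also be scripted from Python if you have several placeholders to create. A minimal sketch (the helper name and directory path are my own; the real transformers folder lives under your experiment_dir):

```python
import os

def persist_placeholder_transformers(transformers_dir, names):
    """Create empty placeholder files for non-trainable transformers,
    the Python equivalent of `touch transformers_dir/NAME`."""
    os.makedirs(transformers_dir, exist_ok=True)
    for name in names:
        # open in append mode and close immediately, so an existing
        # (already trained) transformer file is never truncated
        open(os.path.join(transformers_dir, name), 'a').close()

# hypothetical usage -- substitute your own experiment_dir:
# persist_placeholder_transformers('/path/to/experiment_dir/transformers',
#                                  ['tta_generator'])
```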

jakubczakon commented 6 years ago

@XYAskWhy The is_trainable flag was added in #142 and is now on master.

XYAskWhy commented 6 years ago

@jakubczakon Thanks! I will check it out soon.

XYAskWhy commented 6 years ago

@jakubczakon Thanks for the update. But the README still doesn't include the prepare_masks step, which is necessary, right? Also, the script has been running for two days since I executed `python main.py -- prepare_masks`. Does that seem normal to you?

jakubczakon commented 6 years ago

@XYAskWhy It is included in the readme now. Thanks for spotting that. When it comes to performance, it depends on the number of workers (and threads) you are using.

Aayushktyagi commented 5 years ago

@jakubczakon Is there a shorter way to evaluate the model on test images, or do we have to prepare the masks first, which is time-consuming? Thanks.

jakubczakon commented 5 years ago

Well, you can simply use predict_on_dir which takes a directory of images as input:

python main.py predict_on_dir \
--pipeline_name unet_tta_scoring_model \
--chunk_size 1000 \
--dir_path path/to/inference_directory \
--prediction_path path/to/predictions.json

That will get you the predicted segmentation masks, which you can later plot using the results exploration notebook.

It is not a proper evaluation but it is definitely quicker.
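Once predict_on_dir finishes, a quick sanity check is to load the output file back in. A sketch (we only assume the file at `--prediction_path` is valid JSON; the exact schema depends on the repo):

```python
import json

def load_predictions(prediction_path):
    """Load the predictions file written by predict_on_dir (sketch;
    the path is whatever you passed as --prediction_path)."""
    with open(prediction_path) as f:
        return json.load(f)

# hypothetical usage:
# preds = load_predictions('path/to/predictions.json')
# print(type(preds).__name__, len(preds))
```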