neptune-ai / open-solution-mapping-challenge

Open solution to the Mapping Challenge :earth_americas:
https://www.crowdai.org/challenges/mapping-challenge
MIT License

predicting/evaluating issue #123

Open XYAskWhy opened 6 years ago

XYAskWhy commented 6 years ago

When predicting or evaluating with python main.py -- predict --pipeline_name unet --chunk_size 5000 (or -- evaluate), the following error occurs:

neptune: Executing in Offline Mode.
neptune: Executing in Offline Mode.
2018-05-30 21-01-50 mapping-challenge >>> predicting
/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py:895: DtypeWarning: Columns (6,7) have mixed types. Specify dtype option on import or set low_memory=False.
  return ctx.invoke(self.callback, **ctx.params)
neptune: Executing in Offline Mode.
0%| | 0/13 [00:00<?, ?it/s]
2018-05-30 21-01-56 steps >>> step xy_inference adapting inputs
2018-05-30 21-01-56 steps >>> step xy_inference loading transformer...
2018-05-30 21-01-56 steps >>> step xy_inference transforming...
2018-05-30 21-01-56 steps >>> step xy_inference adapting inputs
2018-05-30 21-01-56 steps >>> step xy_inference loading transformer...
2018-05-30 21-01-56 steps >>> step xy_inference transforming...
2018-05-30 21-01-56 steps >>> step loader adapting inputs
2018-05-30 21-01-56 steps >>> step loader loading transformer...
2018-05-30 21-01-56 steps >>> step loader transforming...
2018-05-30 21-01-56 steps >>> step unet unpacking inputs
2018-05-30 21-01-56 steps >>> step unet loading transformer...
2018-05-30 21-01-58 steps >>> step unet transforming...
2018-05-30 21-01-58 steps >>> step mask_resize adapting inputs
2018-05-30 21-01-58 steps >>> step mask_resize loading transformer...
2018-05-30 21-01-58 steps >>> step mask_resize transforming...
2018-05-30 21-01-58 steps >>> step category_mapper adapting inputs
2018-05-30 21-01-58 steps >>> step category_mapper loading transformer...
2018-05-30 21-01-58 steps >>> step category_mapper transforming...
2018-05-30 21-01-58 steps >>> step mask_erosion adapting inputs
2018-05-30 21-01-58 steps >>> step mask_erosion loading transformer...
2018-05-30 21-01-58 steps >>> step mask_erosion transforming...
2018-05-30 21-01-58 steps >>> step labeler adapting inputs
2018-05-30 21-01-58 steps >>> step labeler loading transformer...
2018-05-30 21-01-58 steps >>> step labeler transforming...
2018-05-30 21-01-58 steps >>> step mask_dilation adapting inputs
2018-05-30 21-01-58 steps >>> step mask_dilation loading transformer...
2018-05-30 21-01-58 steps >>> step mask_dilation transforming...
2018-05-30 21-01-59 steps >>> step xy_inference adapting inputs
2018-05-30 21-01-59 steps >>> step xy_inference loading transformer...
2018-05-30 21-01-59 steps >>> step xy_inference transforming...
2018-05-30 21-01-59 steps >>> step xy_inference adapting inputs
2018-05-30 21-01-59 steps >>> step xy_inference loading transformer...
2018-05-30 21-01-59 steps >>> step xy_inference transforming...
2018-05-30 21-01-59 steps >>> step loader adapting inputs
2018-05-30 21-01-59 steps >>> step loader loading transformer...
2018-05-30 21-01-59 steps >>> step loader transforming...
2018-05-30 21-01-59 steps >>> step unet unpacking inputs
2018-05-30 21-01-59 steps >>> step unet loading transformer...
2018-05-30 21-01-59 steps >>> step unet transforming...
2018-05-30 21-01-59 steps >>> step mask_resize adapting inputs
2018-05-30 21-01-59 steps >>> step mask_resize loading transformer...
2018-05-30 21-01-59 steps >>> step mask_resize transforming...
2018-05-30 21-01-59 steps >>> step score_builder adapting inputs
2018-05-30 21-01-59 steps >>> step score_builder fitting and transforming...
2018-05-30 21-08-33 steps >>> step score_builder saving transformer...
2018-05-30 21-08-33 steps >>> step output adapting inputs
Traceback (most recent call last):
  File "main.py", line 282, in <module>
    action()
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "main.py", line 158, in predict
    _predict(pipeline_name, dev_mode, submit_predictions, chunk_size)
  File "main.py", line 169, in _predict
    prediction = generate_prediction(meta_test, pipeline, logger, CATEGORY_IDS, chunk_size)
  File "main.py", line 238, in generate_prediction
    return _generate_prediction_in_chunks(meta_data, pipeline, logger, category_ids, chunk_size)
  File "main.py", line 271, in _generate_prediction_in_chunks
    output = pipeline.transform(data)
  File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/base.py", line 155, in transform
    step_inputs = self.adapt(step_inputs)
  File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/base.py", line 192, in adapt
    raw_inputs = [step_inputs[step_name][step_var] for step_name, step_var in step_mapping]
  File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/base.py", line 192, in <listcomp>
    raw_inputs = [step_inputs[step_name][step_var] for step_name, step_var in step_mapping]
KeyError: 'images'

The error above may be caused by using --chunk_size 5000, since the program crashes exactly after 5000 iterations(?). But even if I don't specify chunk_size and just run python main.py -- predict --pipeline_name unet, another error occurs, which is the same error I get when I simply run python main.py -- train_evaluate_predict --pipeline_name unet --chunk_size 5000 as the README suggests.

neptune: Executing in Offline Mode.
neptune: Executing in Offline Mode.
2018-05-30 21-45-10 mapping-challenge >>> predicting
/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py:895: DtypeWarning: Columns (6,7) have mixed types. Specify dtype option on import or set low_memory=False.
  return ctx.invoke(self.callback, **ctx.params)
neptune: Executing in Offline Mode.
2018-05-30 21-45-14 steps >>> step xy_inference adapting inputs
2018-05-30 21-45-14 steps >>> step xy_inference loading transformer...
2018-05-30 21-45-14 steps >>> step xy_inference transforming...
2018-05-30 21-45-14 steps >>> step xy_inference adapting inputs
2018-05-30 21-45-14 steps >>> step xy_inference loading transformer...
2018-05-30 21-45-14 steps >>> step xy_inference transforming...
2018-05-30 21-45-14 steps >>> step loader adapting inputs
2018-05-30 21-45-14 steps >>> step loader loading transformer...
2018-05-30 21-45-14 steps >>> step loader transforming...
2018-05-30 21-45-14 steps >>> step unet unpacking inputs
2018-05-30 21-45-14 steps >>> step unet loading transformer...
2018-05-30 21-45-17 steps >>> step unet transforming...
2018-05-30 21-45-17 steps >>> step mask_resize adapting inputs
2018-05-30 21-45-17 steps >>> step mask_resize loading transformer...
2018-05-30 21-45-17 steps >>> step mask_resize transforming...
2018-05-30 21-45-17 steps >>> step category_mapper adapting inputs
2018-05-30 21-45-17 steps >>> step category_mapper loading transformer...
2018-05-30 21-45-17 steps >>> step category_mapper transforming...
2018-05-30 21-45-17 steps >>> step mask_erosion adapting inputs
2018-05-30 21-45-17 steps >>> step mask_erosion loading transformer...
2018-05-30 21-45-17 steps >>> step mask_erosion transforming...
2018-05-30 21-45-17 steps >>> step labeler adapting inputs
2018-05-30 21-45-17 steps >>> step labeler loading transformer...
2018-05-30 21-45-17 steps >>> step labeler transforming...
2018-05-30 21-45-17 steps >>> step mask_dilation adapting inputs
2018-05-30 21-45-17 steps >>> step mask_dilation loading transformer...
2018-05-30 21-45-17 steps >>> step mask_dilation transforming...
2018-05-30 21-45-17 steps >>> step xy_inference adapting inputs
2018-05-30 21-45-17 steps >>> step xy_inference loading transformer...
2018-05-30 21-45-17 steps >>> step xy_inference transforming...
2018-05-30 21-45-17 steps >>> step xy_inference adapting inputs
2018-05-30 21-45-17 steps >>> step xy_inference loading transformer...
2018-05-30 21-45-17 steps >>> step xy_inference transforming...
2018-05-30 21-45-17 steps >>> step loader adapting inputs
2018-05-30 21-45-17 steps >>> step loader loading transformer...
2018-05-30 21-45-17 steps >>> step loader transforming...
2018-05-30 21-45-17 steps >>> step unet unpacking inputs
2018-05-30 21-45-17 steps >>> step unet loading transformer...
2018-05-30 21-45-17 steps >>> step unet transforming...
2018-05-30 21-45-17 steps >>> step mask_resize adapting inputs
2018-05-30 21-45-17 steps >>> step mask_resize loading transformer...
2018-05-30 21-45-17 steps >>> step mask_resize transforming...
2018-05-30 21-45-17 steps >>> step score_builder adapting inputs
2018-05-30 21-45-17 steps >>> step score_builder loading transformer...
2018-05-30 21-45-17 steps >>> step score_builder transforming...
Traceback (most recent call last):
  File "main.py", line 282, in <module>
    action()
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "main.py", line 158, in predict
    _predict(pipeline_name, dev_mode, submit_predictions, chunk_size)
  File "main.py", line 169, in _predict
    prediction = generate_prediction(meta_test, pipeline, logger, CATEGORY_IDS, chunk_size)
  File "main.py", line 240, in generate_prediction
    return _generate_prediction(meta_data, pipeline, logger, category_ids)
  File "main.py", line 252, in _generate_prediction
    output = pipeline.transform(data)
  File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/base.py", line 152, in transform
    step_inputs[input_step.name] = input_step.fit_transform(data)
  File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/base.py", line 109, in fit_transform
    step_output_data = self._cached_fit_transform(step_inputs)
  File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/steps/base.py", line 117, in _cached_fit_transform
    step_output_data = self.transformer.transform(*step_inputs)
  File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/postprocessing.py", line 127, in transform
    for image, image_probabilities in tqdm(zip(images, probabilities)):
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/tqdm/_tqdm.py", line 941, in __iter__
    for obj in iterable:
  File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/postprocessing.py", line 200, in _transform
    for image in tqdm(images):
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/tqdm/_tqdm.py", line 941, in __iter__
    for obj in iterable:
  File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/postprocessing.py", line 137, in _transform
    for i, image in enumerate(images):
  File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/postprocessing.py", line 174, in _transform
    yield erode_image(image, self.selem_size)
  File "/media/rs/3EBAC1C7BAC17BC1/Xavier/crowdAI/open-solution-mapping-challenge/postprocessing.py", line 267, in erode_image
    eroded_image = binary_erosion(mask, selem=selem)
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/skimage/morphology/misc.py", line 37, in func_out
    return func(image, selem=selem, *args, **kwargs)
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/skimage/morphology/binary.py", line 42, in binary_erosion
    ndi.binary_erosion(image, structure=selem, output=out)
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/scipy/ndimage/morphology.py", line 370, in binary_erosion
    output, border_value, origin, 0, brute_force)
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/scipy/ndimage/morphology.py", line 227, in _binary_erosion
    if numpy.product(structure.shape,axis=0) < 1:
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 1897, in product
    return um.multiply.reduce(a, axis=axis, dtype=dtype, out=out, **kwargs)
  File "/home/rs/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 175, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 6930) is killed by signal: Killed.

jakubczakon commented 6 years ago

@XYAskWhy the error above is caused by a mistake on our part in the inference mode of the unet pipeline. We always run unet_padded and unet_padded_tta in inference mode ourselves, so we didn't catch that typo. I would suggest running evaluate again with unet_padded at --chunk_size 5000, or with unet_padded_tta at a smaller chunk size so that combining the TTA predictions fits in memory. My advice is to go with unet_padded_tta and --chunk_size 200, as it gives the best results.
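
For reference, the evaluation and prediction runs would then look something like this (assuming the same CLI layout as the commands above; double-check the flags against the README):

python main.py -- evaluate --pipeline_name unet_padded_tta --chunk_size 200
python main.py -- predict --pipeline_name unet_padded_tta --chunk_size 200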

XYAskWhy commented 6 years ago

@jakubczakon Many thanks, but the training configuration might not be practical: most mainstream GPUs now have about 10 GB of memory, while the 20-image batch only uses about 2 GB, so training is very slow. What is your suggestion for a larger batch_size and the corresponding learning rate?

jakubczakon commented 6 years ago

Very simple: just change batch_size_train in neptune.yaml. You can change everything else there too, including the encoder network (from resnet34 to resnet152 or resnet101), the learning rate, the training schedule, and other settings.
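
For illustration, the relevant fragment of neptune.yaml looks roughly like this. Only batch_size_train (and num_workers, which comes up below) are named in this thread, so treat the exact keys, values and layout as placeholders and check the file in the repo:

parameters:
  batch_size_train: 20   # the 20-image batch from this thread; raise it to use more GPU memory
  num_workers: 4         # placeholder value; see the next comment about increasing it
  # the encoder (resnet34 -> resnet101/resnet152), learning rate and
  # training schedule are set in this same file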

jakubczakon commented 6 years ago

You can also train on multiple GPUs. Remember to set num_workers to a higher number, because that is usually the bottleneck.
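
A generic PyTorch sketch of that idea, not this repo's exact training loop (the pipeline reads these settings from neptune.yaml); the model, dataset, batch size and worker count below are dummy placeholders:

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# dummy stand-ins so the sketch runs; in the real project the model and the
# dataset come from the pipeline and the neptune.yaml parameters
model = nn.Conv2d(3, 2, kernel_size=3, padding=1)
dataset = TensorDataset(torch.randn(64, 3, 256, 256),
                        torch.randint(0, 2, (64, 256, 256)))

device = "cuda" if torch.cuda.is_available() else "cpu"
if torch.cuda.device_count() > 1:
    # DataParallel splits every batch across all visible GPUs
    model = nn.DataParallel(model)
model = model.to(device)

# a higher num_workers keeps the GPUs fed; data loading is usually the bottleneck
loader = DataLoader(dataset, batch_size=20, shuffle=True,
                    num_workers=8, pin_memory=True)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for images, targets in loader:
    images, targets = images.to(device), targets.to(device)
    optimizer.zero_grad()
    loss = criterion(model(images), targets)
    loss.backward()
    optimizer.step()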

hs0531 commented 4 years ago

How can you make Neptune work in Offline Mode?

jakubczakon commented 4 years ago

Hi @hs0531

You can do something like this:

import neptune
from neptune import OfflineBackend

neptune.init(backend=OfflineBackend())
...

as [explained here](https://docs.neptune.ai/neptune-client/docs/neptune.html?highlight=offline).

In that case, nothing will be logged to Neptune. I usually use it for debugging purposes.

hs0531 commented 4 years ago

thank you


hs0531 commented 4 years ago


Which version of neptune did you install? I followed your suggestion but get "cannot import name OfflineBackend".
