Open Christophe-pere opened 4 years ago
After following the advice in issue #210, evaluation and prediction work correctly. However, I then tried to train the model from the first issue again, overwriting the previous model as explained in #160. Training runs fine, except that it ends with the same error as shown previously. But when I ran the evaluation, I got this error:
2020-02-19 14-02-30 mapping-challenge >>> evaluating
2020-02-19 14-02-37 steps >>> step xy_inference adapting inputs
2020-02-19 14-02-37 steps >>> step xy_inference transforming...
2020-02-19 14-02-37 steps >>> step xy_inference adapting inputs
2020-02-19 14-02-37 steps >>> step xy_inference transforming...
2020-02-19 14-02-37 steps >>> step loader adapting inputs
2020-02-19 14-02-37 steps >>> step loader transforming...
2020-02-19 14-02-37 steps >>> step unet unpacking inputs
2020-02-19 14-02-37 steps >>> step unet loading transformer...
Traceback (most recent call last):
File "main.py", line 68, in <module>
main()
File "/home/open-solution-mapping-challenge/mapping/lib/python3.6/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/home/open-solution-mapping-challenge/mapping/lib/python3.6/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/home/open-solution-mapping-challenge/mapping/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/open-solution-mapping-challenge/mapping/lib/python3.6/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/open-solution-mapping-challenge/mapping/lib/python3.6/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "main.py", line 41, in evaluate
pipeline_manager.evaluate(pipeline_name, dev_mode, chunk_size)
File "/home/open-solution-mapping-challenge/src/pipeline_manager.py", line 52, in evaluate
evaluate(pipeline_name, dev_mode, chunk_size, self.logger, self.params, self.seed)
File "/home/open-solution-mapping-challenge/src/pipeline_manager.py", line 152, in evaluate
prediction = generate_prediction(meta_valid, pipeline, logger, CATEGORY_IDS, chunk_size, params.num_threads)
File "/home/open-solution-mapping-challenge/src/pipeline_manager.py", line 190, in generate_prediction
return _generate_prediction(meta_data, pipeline, logger, category_ids, num_threads)
File "/home/open-solution-mapping-challenge/src/pipeline_manager.py", line 203, in _generate_prediction
output = pipeline.transform(data)
File "/home/open-solution-mapping-challenge/src/steps/base.py", line 158, in transform
step_inputs[input_step.name] = input_step.transform(data)
File "/home/open-solution-mapping-challenge/src/steps/base.py", line 158, in transform
step_inputs[input_step.name] = input_step.transform(data)
File "/home/open-solution-mapping-challenge/src/steps/base.py", line 158, in transform
step_inputs[input_step.name] = input_step.transform(data)
[Previous line repeated 4 more times]
File "/home/open-solution-mapping-challenge/src/steps/base.py", line 164, in transform
return self._cached_transform(step_inputs)
File "/home/open-solution-mapping-challenge/src/steps/base.py", line 170, in _cached_transform
self.transformer.load(self.cache_filepath_step_transformer)
File "/home/open-solution-mapping-challenge/src/steps/pytorch/models.py", line 156, in load
self.model.load_state_dict(torch.load(filepath))
File "/home/open-solution-mapping-challenge/mapping/lib/python3.6/site-packages/torch/nn/modules/module.py", line 522, in load_state_dict
.format(name))
KeyError: 'unexpected key "module.module.encoder.conv1.weight" in state_dict'
How can I correct this?
Best,
Chris
Hi @Christophe-pere, and apologies for taking this long to answer (I was not watching this repo for some reason).
The loading issue is likely a pickling problem connected to running on multiple GPUs (I'm not 100% sure): the key in the error carries a doubled "module." prefix, which is what nn.DataParallel adds when a model is wrapped more than once.
You can fix it by overriding how the model gets loaded here: https://github.com/neptune-ai/open-solution-mapping-challenge/blob/master/src/models.py
def fit(self, datagen, validation_datagen=None, meta_valid=None):
    self._initialize_model_weights()
    self.model = nn.DataParallel(self.model)
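One possible workaround, as a minimal sketch rather than the repository's own fix: strip every leading "module." prefix from the checkpoint keys before loading them. The helper name strip_dataparallel_prefix is hypothetical, and the sketch assumes the extra prefixes come from the model being wrapped in nn.DataParallel more than once.

```python
from collections import OrderedDict


def strip_dataparallel_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with all leading `prefix` occurrences removed.

    Hypothetical helper: nn.DataParallel prepends "module." to every parameter
    name, so a twice-wrapped model saves keys such as
    "module.module.encoder.conv1.weight".
    """
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        # Remove the prefix repeatedly to handle nested wrapping.
        while key.startswith(prefix):
            key = key[len(prefix):]
        cleaned[key] = value
    return cleaned


# Usage inside a load method (assumption: `model` is the plain, unwrapped
# network; if it is wrapped in nn.DataParallel, load into `model.module`):
#
#   state_dict = torch.load(filepath)
#   model.load_state_dict(strip_dataparallel_prefix(state_dict))
```

Loading into the unwrapped network keeps the checkpoint independent of how many GPUs were used for training.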
I hope this helps!
Hi there,
When I run the command:
python3 main.py train --pipeline_name unet_weighted
I get this error:
I ran the code for only one epoch to see how it works.
The code is running on an Ubuntu 18.04 VM with 2 GPUs.
How can I correct the code to generate the transformers layer?
Best,
Chris