vqdang / hover_net

Simultaneous Nuclear Instance Segmentation and Classification in H&E Histology Images.
MIT License

How to see the output after training #115

Closed. aasrith9988 closed this issue 3 years ago.

aasrith9988 commented 3 years ago

I have successfully trained the model on the CoNSeP dataset, and it reported the accuracy and other metrics while running, but I don't understand how to get the output and predictions after training. When I run the python script run_infer.py, I don't see any results; instead it only prints the available options.

simongraham commented 3 years ago

Hi @aasrith9988 ,

If you are only seeing the options, then I suspect that you are not entering the command line arguments correctly. Could you please send a screenshot of the terminal output, so we can see what you type into the command line and the corresponding output?

proever commented 3 years ago

I ran into this as well. The trick was to add the positional argument tile or wsi after the general script options (like --model_mode, etc.) and before the tile or WSI processing options (like --input_dir).

vqdang commented 3 years ago

I will leave this here in case you still have trouble running inference. There are two sample scripts showing all the command line arguments, one for each running mode (tile or wsi):

https://github.com/vqdang/hover_net/blob/master/run_tile.sh https://github.com/vqdang/hover_net/blob/master/run_wsi.sh

aasrith9988 commented 3 years ago

(nseg) kmit@kmit-DGX-Station:~/aasrith/nseg/hover_net-master$ ./run_tile.sh
|2021-04-29|10:54:48.834| [INFO] .... Detect #GPUS: 1
WARNING: Detect checkpoint saved in data-parallel mode. Converting saved model to single GPU mode.
Traceback (most recent call last):
  File "run_infer.py", line 180, in <module>
    infer = InferManager(**method_args)
  File "/home/kmit/aasrith/nseg/hover_net-master/infer/base.py", line 27, in __init__
    self.load_model()
  File "/home/kmit/aasrith/nseg/hover_net-master/infer/base.py", line 68, in load_model
    net.load_state_dict(saved_state_dict, strict=True)
  File "/home/kmit/aasrith/nseg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1044, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for HoVerNet:
    size mismatch for decoder.tp.u0.conv.weight: copying a param with shape torch.Size([5, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([6, 64, 1, 1]).
    size mismatch for decoder.tp.u0.conv.bias: copying a param with shape torch.Size([5]) from checkpoint, the shape in current model is torch.Size([6]).

The following is the script of run_tile.sh:

python run_infer.py \
--gpu='3' \
--nr_types=6 \
--type_info_path=/home/kmit/aasrith/nseg/hover_net-master/type_info.json \
--batch_size=64 \
--model_mode=original \
--model_path=/home/kmit/aasrith/nseg/hover_net-master/logs/01/net_epoch=50.tar \
--nr_inference_workers=8 \
--nr_post_proc_workers=16 \
tile \
--input_dir=/kmit/aasrith/nseg/consep/CoNSeP/Test/Images/ \
--output_dir=/home/kmit/aasrith/nseg/prediction/ \
--mem_usage=0.1 \
--draw_dot \
--save_qupath

aasrith9988 commented 3 years ago

/aasrith/nseg/hover_net-master$ python run_infer.py --gpu="3"--nr_types=5 --model_path=/home/kmit/aasrith/nseg/hover_net-master/logs/01/net_epoch=50.tar --model_mode=original tile --input_dir=/kmit/aasrith/nseg/consep/CoNSeP/Test/Images --output_dir=/home/kmit/aasrith/nseg/prediction/ |2021-04-29|10:57:08.547| [INFO] .... Detect #GPUS: 1 WARNING: Detect checkpoint saved in data-parallel mode. Converting saved model to single GPU mode. Traceback (most recent call last): File "run_infer.py", line 180, in infer = InferManager(**method_args) File "/home/kmit/aasrith/nseg/hover_net-master/infer/base.py", line 27, in init self.load_model() File "/home/kmit/aasrith/nseg/hover_net-master/infer/base.py", line 68, in load_model net.load_state_dict(saved_state_dict, strict=True) File "/home/kmit/aasrith/nseg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1044, in load_state_dict raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format( RuntimeError: Error(s) in loading state_dict for HoVerNet: Unexpected key(s) in state_dict: "decoder.tp.u3.conva.weight", "decoder.tp.u3.dense.units.0.preact_bna/bn.weight", "decoder.tp.u3.dense.units.0.preact_bna/bn.bias", "decoder.tp.u3.dense.units.0.preact_bna/bn.running_mean", "decoder.tp.u3.dense.units.0.preact_bna/bn.running_var", "decoder.tp.u3.dense.units.0.preact_bna/bn.num_batches_tracked", "decoder.tp.u3.dense.units.0.conv1.weight", "decoder.tp.u3.dense.units.0.conv1/bn.weight", "decoder.tp.u3.dense.units.0.conv1/bn.bias", "decoder.tp.u3.dense.units.0.conv1/bn.running_mean", "decoder.tp.u3.dense.units.0.conv1/bn.running_var", "decoder.tp.u3.dense.units.0.conv1/bn.num_batches_tracked", "decoder.tp.u3.dense.units.0.conv2.weight", "decoder.tp.u3.dense.units.1.preact_bna/bn.weight", "decoder.tp.u3.dense.units.1.preact_bna/bn.bias", "decoder.tp.u3.dense.units.1.preact_bna/bn.running_mean", "decoder.tp.u3.dense.units.1.preact_bna/bn.running_var", "decoder.tp.u3.dense.units.1.preact_bna/bn.num_batches_tracked", "decoder.tp.u3.dense.units.1.conv1.weight", "decoder.tp.u3.dense.units.1.conv1/bn.weight", "decoder.tp.u3.dense.units.1.conv1/bn.bias", "decoder.tp.u3.dense.units.1.conv1/bn.running_mean", "decoder.tp.u3.dense.units.1.conv1/bn.running_var", "decoder.tp.u3.dense.units.1.conv1/bn.num_batches_tracked", "decoder.tp.u3.dense.units.1.conv2.weight", "decoder.tp.u3.dense.units.2.preact_bna/bn.weight", "decoder.tp.u3.dense.units.2.preact_bna/bn.bias", "decoder.tp.u3.dense.units.2.preact_bna/bn.running_mean", "decoder.tp.u3.dense.units.2.preact_bna/bn.running_var", "decoder.tp.u3.dense.units.2.preact_bna/bn.num_batches_tracked", "decoder.tp.u3.dense.units.2.conv1.weight", "decoder.tp.u3.dense.units.2.conv1/bn.weight", "decoder.tp.u3.dense.units.2.conv1/bn.bias", "decoder.tp.u3.dense.units.2.conv1/bn.running_mean", "decoder.tp.u3.dense.units.2.conv1/bn.running_var", "decoder.tp.u3.dense.units.2.conv1/bn.num_batches_tracked", "decoder.tp.u3.dense.units.2.conv2.weight", "decoder.tp.u3.dense.units.3.preact_bna/bn.weight", "decoder.tp.u3.dense.units.3.preact_bna/bn.bias", "decoder.tp.u3.dense.units.3.preact_bna/bn.running_mean", "decoder.tp.u3.dense.units.3.preact_bna/bn.running_var", "decoder.tp.u3.dense.units.3.preact_bna/bn.num_batches_tracked", "decoder.tp.u3.dense.units.3.conv1.weight", "decoder.tp.u3.dense.units.3.conv1/bn.weight", "decoder.tp.u3.dense.units.3.conv1/bn.bias", "decoder.tp.u3.dense.units.3.conv1/bn.running_mean", "decoder.tp.u3.dense.units.3.conv1/bn.running_var", 
"decoder.tp.u3.dense.units.3.conv1/bn.num_batches_tracked", "decoder.tp.u3.dense.units.3.conv2.weight", "decoder.tp.u3.dense.units.4.preact_bna/bn.weight", "decoder.tp.u3.dense.units.4.preact_bna/bn.bias", "decoder.tp.u3.dense.units.4.preact_bna/bn.running_mean", "decoder.tp.u3.dense.units.4.preact_bna/bn.running_var", "decoder.tp.u3.dense.units.4.preact_bna/bn.num_batches_tracked", "decoder.tp.u3.dense.units.4.conv1.weight", "decoder.tp.u3.dense.units.4.conv1/bn.weight", "decoder.tp.u3.dense.units.4.conv1/bn.bias", "decoder.tp.u3.dense.units.4.conv1/bn.running_mean", "decoder.tp.u3.dense.units.4.conv1/bn.running_var", "decoder.tp.u3.dense.units.4.conv1/bn.num_batches_tracked", "decoder.tp.u3.dense.units.4.conv2.weight", "decoder.tp.u3.dense.units.5.preact_bna/bn.weight", "decoder.tp.u3.dense.units.5.preact_bna/bn.bias", "decoder.tp.u3.dense.units.5.preact_bna/bn.running_mean", "decoder.tp.u3.dense.units.5.preact_bna/bn.running_var", "decoder.tp.u3.dense.units.5.preact_bna/bn.num_batches_tracked", "decoder.tp.u3.dense.units.5.conv1.weight", "decoder.tp.u3.dense.units.5.conv1/bn.weight", "decoder.tp.u3.dense.units.5.conv1/bn.bias", "decoder.tp.u3.dense.units.5.conv1/bn.running_mean", "decoder.tp.u3.dense.units.5.conv1/bn.running_var", "decoder.tp.u3.dense.units.5.conv1/bn.num_batches_tracked", "decoder.tp.u3.dense.units.5.conv2.weight", "decoder.tp.u3.dense.units.6.preact_bna/bn.weight", "decoder.tp.u3.dense.units.6.preact_bna/bn.bias", "decoder.tp.u3.dense.units.6.preact_bna/bn.running_mean", "decoder.tp.u3.dense.units.6.preact_bna/bn.running_var", "decoder.tp.u3.dense.units.6.preact_bna/bn.num_batches_tracked", "decoder.tp.u3.dense.units.6.conv1.weight", "decoder.tp.u3.dense.units.6.conv1/bn.weight", "decoder.tp.u3.dense.units.6.conv1/bn.bias", "decoder.tp.u3.dense.units.6.conv1/bn.running_mean", "decoder.tp.u3.dense.units.6.conv1/bn.running_var", "decoder.tp.u3.dense.units.6.conv1/bn.num_batches_tracked", "decoder.tp.u3.dense.units.6.conv2.weight", "decoder.tp.u3.dense.units.7.preact_bna/bn.weight", "decoder.tp.u3.dense.units.7.preact_bna/bn.bias", "decoder.tp.u3.dense.units.7.preact_bna/bn.running_mean", "decoder.tp.u3.dense.units.7.preact_bna/bn.running_var", "decoder.tp.u3.dense.units.7.preact_bna/bn.num_batches_tracked", "decoder.tp.u3.dense.units.7.conv1.weight", "decoder.tp.u3.dense.units.7.conv1/bn.weight", "decoder.tp.u3.dense.units.7.conv1/bn.bias", "decoder.tp.u3.dense.units.7.conv1/bn.running_mean", "decoder.tp.u3.dense.units.7.conv1/bn.running_var", "decoder.tp.u3.dense.units.7.conv1/bn.num_batches_tracked", "decoder.tp.u3.dense.units.7.conv2.weight", "decoder.tp.u3.dense.blk_bna.bn.weight", "decoder.tp.u3.dense.blk_bna.bn.bias", "decoder.tp.u3.dense.blk_bna.bn.running_mean", "decoder.tp.u3.dense.blk_bna.bn.running_var", "decoder.tp.u3.dense.blk_bna.bn.num_batches_tracked", "decoder.tp.u3.convf.weight", "decoder.tp.u2.conva.weight", "decoder.tp.u2.dense.units.0.preact_bna/bn.weight", "decoder.tp.u2.dense.units.0.preact_bna/bn.bias", "decoder.tp.u2.dense.units.0.preact_bna/bn.running_mean", "decoder.tp.u2.dense.units.0.preact_bna/bn.running_var", "decoder.tp.u2.dense.units.0.preact_bna/bn.num_batches_tracked", "decoder.tp.u2.dense.units.0.conv1.weight", "decoder.tp.u2.dense.units.0.conv1/bn.weight", "decoder.tp.u2.dense.units.0.conv1/bn.bias", "decoder.tp.u2.dense.units.0.conv1/bn.running_mean", "decoder.tp.u2.dense.units.0.conv1/bn.running_var", "decoder.tp.u2.dense.units.0.conv1/bn.num_batches_tracked", "decoder.tp.u2.dense.units.0.conv2.weight", 
"decoder.tp.u2.dense.units.1.preact_bna/bn.weight", "decoder.tp.u2.dense.units.1.preact_bna/bn.bias", "decoder.tp.u2.dense.units.1.preact_bna/bn.running_mean", "decoder.tp.u2.dense.units.1.preact_bna/bn.running_var", "decoder.tp.u2.dense.units.1.preact_bna/bn.num_batches_tracked", "decoder.tp.u2.dense.units.1.conv1.weight", "decoder.tp.u2.dense.units.1.conv1/bn.weight", "decoder.tp.u2.dense.units.1.conv1/bn.bias", "decoder.tp.u2.dense.units.1.conv1/bn.running_mean", "decoder.tp.u2.dense.units.1.conv1/bn.running_var", "decoder.tp.u2.dense.units.1.conv1/bn.num_batches_tracked", "decoder.tp.u2.dense.units.1.conv2.weight", "decoder.tp.u2.dense.units.2.preact_bna/bn.weight", "decoder.tp.u2.dense.units.2.preact_bna/bn.bias", "decoder.tp.u2.dense.units.2.preact_bna/bn.running_mean", "decoder.tp.u2.dense.units.2.preact_bna/bn.running_var", "decoder.tp.u2.dense.units.2.preact_bna/bn.num_batches_tracked", "decoder.tp.u2.dense.units.2.conv1.weight", "decoder.tp.u2.dense.units.2.conv1/bn.weight", "decoder.tp.u2.dense.units.2.conv1/bn.bias", "decoder.tp.u2.dense.units.2.conv1/bn.running_mean", "decoder.tp.u2.dense.units.2.conv1/bn.running_var", "decoder.tp.u2.dense.units.2.conv1/bn.num_batches_tracked", "decoder.tp.u2.dense.units.2.conv2.weight", "decoder.tp.u2.dense.units.3.preact_bna/bn.weight", "decoder.tp.u2.dense.units.3.preact_bna/bn.bias", "decoder.tp.u2.dense.units.3.preact_bna/bn.running_mean", "decoder.tp.u2.dense.units.3.preact_bna/bn.running_var", "decoder.tp.u2.dense.units.3.preact_bna/bn.num_batches_tracked", "decoder.tp.u2.dense.units.3.conv1.weight", "decoder.tp.u2.dense.units.3.conv1/bn.weight", "decoder.tp.u2.dense.units.3.conv1/bn.bias", "decoder.tp.u2.dense.units.3.conv1/bn.running_mean", "decoder.tp.u2.dense.units.3.conv1/bn.running_var", "decoder.tp.u2.dense.units.3.conv1/bn.num_batches_tracked", "decoder.tp.u2.dense.units.3.conv2.weight", "decoder.tp.u2.dense.blk_bna.bn.weight", "decoder.tp.u2.dense.blk_bna.bn.bias", "decoder.tp.u2.dense.blk_bna.bn.running_mean", "decoder.tp.u2.dense.blk_bna.bn.running_var", "decoder.tp.u2.dense.blk_bna.bn.num_batches_tracked", "decoder.tp.u2.convf.weight", "decoder.tp.u1.conva.weight", "decoder.tp.u0.bn.weight", "decoder.tp.u0.bn.bias", "decoder.tp.u0.bn.running_mean", "decoder.tp.u0.bn.running_var", "decoder.tp.u0.bn.num_batches_tracked", "decoder.tp.u0.conv.weight", "decoder.tp.u0.conv.bias".

aasrith9988 commented 3 years ago

@simongraham these were the errors that I was facing while running it. As @proever mentioned in issue #116, I have also made the changes in the files run_infer.py and viz_utils.py, but I am still facing these errors.

vqdang commented 3 years ago

@aasrith9988 Do you want to use the model for type classification? From the error, you are loading a checkpoint that is missing the typing branch. To use the model without typing, you have to change --nr_types=6 in the script to --nr_types=0.

And to follow up on this, https://github.com/vqdang/hover_net/issues/115#issuecomment-828948693: you are declaring a model with 6 types but the checkpoint only has 5. In case you are training a model yourself, if your dataset has 3 nuclei types, such as connective, inflammatory and epithelial, the model will require 4 types to be declared (the additional one being the background, i.e. not a nucleus).
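In case it helps anyone facing the same mismatch, the value that --nr_types must take can be read directly from the saved weights. The following is a minimal sketch, not part of the repository; it assumes the checkpoint layout implied by the tracebacks above, i.e. a torch .tar file whose typing head is named decoder.tp.u0.conv.weight, possibly wrapped in a "desc" entry and/or saved with a DataParallel "module." prefix.

import torch

# path taken from the script above; substitute your own checkpoint
ckpt = torch.load("net_epoch=50.tar", map_location="cpu")
# some checkpoints wrap the weights in a "desc" entry; fall back to the raw dict
state_dict = ckpt.get("desc", ckpt) if isinstance(ckpt, dict) else ckpt

nr_types = None
for name, tensor in state_dict.items():
    # strip a possible DataParallel "module." prefix before matching
    if name.replace("module.", "", 1) == "decoder.tp.u0.conv.weight":
        # the first dimension of the typing head is the number of classes,
        # i.e. the value to pass as --nr_types
        nr_types = tensor.shape[0]
        break

if nr_types is None:
    print("no typing branch in checkpoint; run inference with --nr_types=0")
else:
    print("checkpoint expects --nr_types =", nr_types)

For the weights discussed in this thread this prints 5, which is why the working command further down uses --nr_types=5.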

aasrith9988 commented 3 years ago

@vqdang sir, I am still getting the same error after making the changes you directed, and when I try to visualise the training data I get the following error:

(hovernet) kmit@kmit-DGX-Station:~/aasrith/nseg/hover_net-master$ python run_train.py --view='train'
Using manual seed: 10
Dataset train: 1323
Traceback (most recent call last):
  File "run_train.py", line 302, in <module>
    trainer.view_dataset(args["--view"])
  File "run_train.py", line 90, in view_dataset
    viz = prep_func(batch_data, is_batch=True, **prep_kwargs)
  File "/home/kmit/aasrith/nseg/hover_net-master/models/hovernet/targets.py", line 150, in prep_sample
    viz_list.append(prep_one_sample(sub_data))
  File "/home/kmit/aasrith/nseg/hover_net-master/models/hovernet/targets.py", line 134, in prep_one_sample
    shape = np.maximum(*shape_array)
ValueError: invalid number of arguments

aasrith9988 commented 3 years ago

And I am training on an NVIDIA Tesla V100 GPU over SSH; my PC runs Ubuntu 20.04 and the server is also running Ubuntu.

weihaopan commented 3 years ago

I am having this issue

vqdang commented 3 years ago

@aasrith9988 Your issue with the checkpoint has been going on for a while now and I don't have enough detail to figure out the problem. If possible, please share your weights file here. I will test-run it locally and provide a solution once I have a better idea of what your problem is.

aasrith9988 commented 3 years ago

@vqdang sir, here is the drive link to the weights file (sorry for the late reply): https://drive.google.com/file/d/1XxmgAInJ9ZO5bwrQRKlNrfIqFq7nkqws/view?usp=sharing

vqdang commented 3 years ago

@aasrith9988 use the following command with the current repository as on GitHub. Notice that the model_mode is original rather than fast as provided in the sample script.

python run_infer.py \
--gpu='0,1,2,3' \
--nr_types=5 \
--type_info_path=type_info.json \
--batch_size=32 \
--model_mode='original' \
--model_path=assrith_hovernet.tar \
--nr_inference_workers=4 \
--nr_post_proc_workers=4 \
tile \
--input_dir=exp_output/sample/imgs/ \
--output_dir=exp_output/sample/pred/ \
--draw_dot \
--save_qupath

Sample result on Kumar (TCGA-18-5592-01Z-00-DX1).
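For anyone still wondering how to actually look at what run_infer.py writes out (the question in the thread title), below is a minimal sketch for browsing the tile-mode results. It is not part of the repository and assumes the default output layout (mat/, json/ and overlay/ sub-folders under --output_dir, with one .mat file per image); folder and key names may differ between versions, so the script prints the keys instead of assuming them.

import glob
import os

import scipy.io as sio

output_dir = "exp_output/sample/pred/"  # same value as --output_dir in the command above

# overlay PNGs (if produced) are ordinary images and can be opened in any viewer
print(sorted(glob.glob(os.path.join(output_dir, "overlay", "*"))))

# the per-image .mat files hold the instance segmentation / classification results;
# list their keys and array shapes rather than assuming particular names
for mat_path in sorted(glob.glob(os.path.join(output_dir, "mat", "*.mat"))):
    result = sio.loadmat(mat_path)
    print(mat_path)
    for key, value in result.items():
        if not key.startswith("__"):  # skip the MATLAB header entries
            print("   ", key, getattr(value, "shape", type(value)))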

aasrith9988 commented 3 years ago

@vqdang sir, it worked. Thank you very much for your help and swift response.

proever commented 3 years ago

> @vqdang sir, I am still getting the same error after making the changes you directed, and when I try to visualise the training data I get the following error:
>
> (hovernet) kmit@kmit-DGX-Station:~/aasrith/nseg/hover_net-master$ python run_train.py --view='train'
> Using manual seed: 10
> Dataset train: 1323
> Traceback (most recent call last):
>   File "run_train.py", line 302, in <module>
>     trainer.view_dataset(args["--view"])
>   File "run_train.py", line 90, in view_dataset
>     viz = prep_func(batch_data, is_batch=True, **prep_kwargs)
>   File "/home/kmit/aasrith/nseg/hover_net-master/models/hovernet/targets.py", line 150, in prep_sample
>     viz_list.append(prep_one_sample(sub_data))
>   File "/home/kmit/aasrith/nseg/hover_net-master/models/hovernet/targets.py", line 134, in prep_one_sample
>     shape = np.maximum(*shape_array)
> ValueError: invalid number of arguments

did you ever resolve this error? I'm also seeing it.

> shape = np.maximum(*shape_array)
> ValueError: invalid number of arguments

vqdang commented 3 years ago

@proever This error happened to him due to a wrong command line argument; refer to my comment above to see what you may need to check for your model and prediction.

proever commented 3 years ago

hmm, the command I ran was also python run_train.py --view='train', and as far as I can tell everything is as close as possible to the state of the codebase in master. I ran extract_patches.py on my copy of the CoNSeP dataset beforehand.

proever commented 3 years ago

FWIW I was able to fix the issue by changing the line to

shape = np.maximum.reduce([*shape_array])

But I'm not quite sure if that's a "correct" solution.
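For later readers who hit the same ValueError, here is a small standalone illustration of why the original line fails and why the reduce form works. shape_array below is a made-up stand-in for whatever prep_one_sample collects; it is assumed to be a stack of (height, width) pairs.

import numpy as np

# made-up stand-in for the per-target shapes collected inside prep_one_sample
shape_array = np.array([[80, 80], [164, 164], [80, 80]])

# np.maximum is a binary ufunc, so it expects exactly two input arrays;
# calling it with a single array produces the "ValueError: invalid number of
# arguments" seen above (recent NumPy versions raise a TypeError instead),
# and np.maximum(*shape_array) only behaves as intended when there are
# exactly two shapes to compare.
# shape = np.maximum(shape_array)

# Folding the ufunc over the first axis gives the element-wise maximum of
# all the shapes, which appears to be what the visualisation code wants:
shape = np.maximum.reduce(shape_array, axis=0)
print(shape)  # [164 164]

# np.maximum.reduce([*shape_array]) from the comment above is equivalent:
# unpacking the rows into a list and reducing over it yields the same result.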

vqdang commented 3 years ago

The --view='train' functionality may be broken, by the way, as I haven't maintained it for a while.

proever commented 3 years ago

Ok, good to know. Thank you!