javiribera / locating-objects-without-bboxes

PyTorch code for "Locating objects without bounding boxes" - Loss function and trained models

About validating time #18

Closed Acmenwangtuo closed 4 years ago

Acmenwangtuo commented 4 years ago

When I run train.py on my own data, validation takes a long time and GPU utilization stays very low. I would like to know why. My images are 1000x1000, with thousands of objects on each image.

javiribera commented 4 years ago

I would like to reproduce this. How many GPUs do you have? And what model? Can we see an example image?

Acmenwangtuo commented 4 years ago

I have one Tesla V100 32 GB. The data is at https://monuseg.grand-challenge.org/Data/

Acmenwangtuo commented 4 years ago

As you can see, I want to detect the centers of the nuclei.

javiribera commented 4 years ago

That GPU should be enough. You must have converted the ground truth of that data to a CSV file that the "locating-objects-without-bboxes" project can read, with a location for each nucleus center. Can you please upload that CSV file somewhere?
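
For reference, a minimal sketch of how such a CSV could be generated from a list of nucleus centers. The column names (`filename`, `count`, `locations`), the (row, column) coordinate order, and the example file names are assumptions made for illustration; check the project's README for the exact format it expects.

```python
import csv
import io

# Hypothetical nucleus centers per image, as (row, col) pixel coordinates.
# The file names and coordinates are made up for illustration.
centers_by_image = {
    "image_001.png": [(12, 40), (57, 310), (400, 222)],
    "image_002.png": [(88, 91)],
}

def write_ground_truth(centers, out_file):
    """Write one row per image: file name, object count, list of centers."""
    writer = csv.writer(out_file)
    # Assumed header; verify the column names against the project's README.
    writer.writerow(["filename", "count", "locations"])
    for filename, points in sorted(centers.items()):
        writer.writerow([filename, len(points), str(list(points))])

# Write to an in-memory buffer here; pass an open file to write to disk.
buffer = io.StringIO()
write_ground_truth(centers_by_image, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Because the list of locations contains commas, the `csv` module quotes that field automatically, so each image still occupies exactly one row.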

Acmenwangtuo commented 4 years ago

Yes, I have generated the CSV file. It is at https://drive.google.com/open?id=19TTTPlYZCIHmGglrLzg33Xpi9uMHEAim

javiribera commented 4 years ago

I'm also going to need:

  1. The GT file of the training data. It seems the GT you just sent is only for a subset of the images in https://drive.google.com/file/d/1JZN9Jq9km0rZNiYNEukE_8f0CsSK3Pe4
  2. The command you used to run train.py, so I can reproduce the same hyperparameters.
  3. How slow validation is for you. How long does it take to validate the entire validation set at the end of an epoch?

Acmenwangtuo commented 4 years ago

Yes, it is actually only part of the dataset. It has only 16 images, and I will use the rest of the data for testing. The parameters I use are the same as the ones you provided, except that the image size is 1000x1000. Validation takes about 8 minutes per image.

javiribera commented 4 years ago

I'm still going to need items 1 and 2 from my previous message.

Acmenwangtuo commented 4 years ago

The complete gt.csv is at https://drive.google.com/open?id=1CrR2xElG9npVNW_TcIf3-gihHInQC6Hv and the command is:
python -m object-locator.train --train-dir ./traindata --batch-size 4 --visdom-env mytrainsession --visdom-server localhost --lr 1e-3 --val-dir ./traindata --optim Adam --save saved_new_model.ckpt --imgsize 1000x1000 --val-freq 100 --epochs 200

javiribera commented 4 years ago

I cannot reproduce this error yet because I get an out-of-memory error when running your command, even when setting --batch-size 1. This is probably because my GPU only has 12 GB. Your input image size of 1000x1000 yields a CNN of 125 M parameters (this is shown when you run train.py), which seems pretty large.
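
For a rough sense of scale, here is a back-of-the-envelope estimate of what the parameters alone cost. It assumes float32 storage and the Adam optimizer (as in your command), which keeps two extra buffers per parameter. Activation memory is not modeled, so treat the result as a lower bound rather than a measurement.

```python
BYTES_PER_FLOAT32 = 4

def parameter_memory_gib(num_params, optimizer_buffers_per_param=2):
    """Estimate memory for weights, gradients, and optimizer state in GiB.

    Assumes float32 everywhere. Adam keeps two extra buffers per parameter
    (first and second moment estimates), hence the default of 2.
    """
    copies = 1 + 1 + optimizer_buffers_per_param  # weights + grads + optimizer
    return num_params * BYTES_PER_FLOAT32 * copies / 2**30

estimate = parameter_memory_gib(125_000_000)
print(f"~{estimate:.2f} GiB for parameters, gradients, and Adam state")
```

By this estimate the parameter-related tensors take a bit under 2 GiB. The rest of the footprint would come from activations and framework overhead, which this sketch does not cover.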

How slow is validation if you run it with 256x256 so that I can reproduce it?
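
To report this precisely, it may help to time each validation image separately. A minimal sketch, where `validate_one` is a hypothetical placeholder for whatever per-image validation step you run (it is not a function of this project) and the image names are made up:

```python
import time

def time_per_item(items, process):
    """Run `process` on each item and return the elapsed seconds for each."""
    timings = []
    for item in items:
        start = time.perf_counter()
        process(item)
        timings.append(time.perf_counter() - start)
    return timings

def validate_one(image_name):
    # Hypothetical placeholder: replace with the real per-image validation.
    time.sleep(0.01)

timings = time_per_item(["img_a.png", "img_b.png", "img_c.png"], validate_one)
for index, seconds in enumerate(timings):
    print(f"image {index}: {seconds:.3f} s")
print(f"mean: {sum(timings) / len(timings):.3f} s per image")
```

If the step launches GPU work through PyTorch, calling `torch.cuda.synchronize()` before reading the clock keeps asynchronous kernels from skewing the numbers.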

Acmenwangtuo commented 4 years ago

Yes, I ran into the same problem as you, so I resized the images to 256x256. Validation is still very slow, a few minutes per image, and recall and accuracy are low.

javiribera commented 4 years ago

Please post the full standard output of your training log to https://pastebin.com/ and let us have a look.

javiribera commented 4 years ago

Closing due to inactivity and lack of info. Feel free to reopen if you can show us the training log.