javiribera / locating-objects-without-bboxes

PyTorch code for "Locating objects without bounding boxes" - Loss function and trained models

About validating time #18

Closed Acmenwangtuo closed 4 years ago

Acmenwangtuo commented 4 years ago

When I run train.py on my own data, validation takes a long time and GPU utilization is very low. I would like to know why. My images are 1000x1000, with roughly thousands of objects in each image.

javiribera commented 4 years ago

I would like to reproduce this. How many GPUs do you have? And what model? Can we see an example image?

Acmenwangtuo commented 4 years ago

I have one Tesla V100 with 32 GB. The data is at https://monuseg.grand-challenge.org/Data/

Acmenwangtuo commented 4 years ago

As you can see, I want to detect the centers of the nuclei.

javiribera commented 4 years ago

That GPU should be enough. You must have converted the ground truth of that data to a CSV file that the "locating-objects-without-bboxes" project can read, with a location for each nucleus center. Can you please upload that CSV file somewhere?

Acmenwangtuo commented 4 years ago

Yes, I have generated the CSV file. It is at https://drive.google.com/open?id=19TTTPlYZCIHmGglrLzg33Xpi9uMHEAim

javiribera commented 4 years ago

I'm also going to need:

1. The GT file of the training data. It seems the GT you just sent is only for a subset of the images in https://drive.google.com/file/d/1JZN9Jq9km0rZNiYNEukE_8f0CsSK3Pe4
2. The command you used to run train.py, so I can reproduce the same hyperparameters.
3. How slow validation is for you. How long does it take to validate the entire validation set at the end of an epoch?
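
For anyone answering the timing question above, a per-image measurement is usually more useful than an impression of slowness. The following is only a minimal sketch: `validate_one` and `fake_validate` are hypothetical stand-ins and are not functions from this repository. Replace them with whatever runs the forward pass and post-processing for a single validation image.

```python
import time
from statistics import mean


def time_per_image(images, validate_one):
    """Return the wall-clock seconds spent validating each image."""
    durations = []
    for image in images:
        start = time.perf_counter()
        # Hypothetical callable: forward pass plus post-processing for one image.
        # If the work runs on a GPU, synchronize the device inside validate_one,
        # otherwise asynchronous kernels make the measured time too short.
        validate_one(image)
        durations.append(time.perf_counter() - start)
    return durations


def fake_validate(image):
    """Placeholder workload used only to make this sketch runnable."""
    time.sleep(0.01)


if __name__ == "__main__":
    durations = time_per_image(range(3), fake_validate)
    print(f"mean: {mean(durations):.3f} s/image, total: {sum(durations):.3f} s")
```

Reporting both the mean per image and the total for the validation set makes it easier to tell whether the time is spent on the GPU or in CPU-side work.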

Acmenwangtuo commented 4 years ago

Yes, it is actually only part of the dataset. It has only 16 images; I will use the rest of the data for testing. The parameters I use are the same as the ones you provided, except that the image size is 1000x1000. Validation takes about 8 minutes per image.

javiribera commented 4 years ago

I'm still going to need items 1 and 2 from my previous message.

Acmenwangtuo commented 4 years ago

The complete gt.csv is at https://drive.google.com/open?id=1CrR2xElG9npVNW_TcIf3-gihHInQC6Hv

The command is:

python -m object-locator.train --train-dir ./traindata --batch-size 4 --visdom-env mytrainsession --visdom-server localhost --lr 1e-3 --val-dir ./traindata --optim Adam --save saved_new_model.ckpt --imgsize 1000x1000 --val-freq 100 --epochs 200

javiribera commented 4 years ago

I cannot reproduce this error yet because I get an out of memory error when running your command, even setting --batch-size 1. This is probably because my GPU only has 12 GB. Your input image size of 1000x1000 yields a CNN of 125 M parameters (this is shown when you run train.py), which seems pretty large.

How slow is validation if you run it with 256x256 so that I can reproduce it?
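
The thread does not show the network definition, so the following is only an illustrative sketch of why a parameter count can depend on the input image size at all. Convolution weights do not depend on the input resolution, but a fully connected layer applied to a flattened feature map does. The layer shapes below (one 3x3 convolution, a downsampling factor of 8 per side, one output unit) are hypothetical and are not taken from the repository.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a square 2D convolution."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


def toy_model_params(height, width):
    """Parameter count of a hypothetical toy network for a given input size."""
    # Hypothetical layer: 3x3 convolution from 3 to 64 channels.
    conv = conv2d_params(3, 64, 3)
    # Hypothetical: feature map downsampled by 8 per side, then flattened.
    fc_inputs = 64 * (height // 8) * (width // 8)
    # Hypothetical: one fully connected layer with a single output.
    fc = linear_params(fc_inputs, 1)
    return conv + fc


if __name__ == "__main__":
    for size in (256, 1000):
        print(f"{size}x{size}: {toy_model_params(size, size):,} parameters")
```

In this toy setup the convolution contributes the same number of parameters at both sizes, and all of the growth comes from the fully connected layer, which scales with the number of input pixels.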

Acmenwangtuo commented 4 years ago

Yes, I ran into the same problem as you, so I resized the images to 256x256. Validation is still very slow, a few minutes per image, with low recall and accuracy.

javiribera commented 4 years ago

Please post the full standard output of your training log to https://pastebin.com/ and let us have a look.

javiribera commented 4 years ago

Closing due to inactivity and lack of info. Feel free to reopen if you show us the training log.