juliandewit / kaggle_ndsb2017

Kaggle datascience bowl 2017
MIT License
623 stars 290 forks

Training time for step_train_nodule_detector.py #35

Open wojiushishen opened 7 years ago

wojiushishen commented 7 years ago

Dear julian,

I ran your code on my GPU (Tesla K10) simulator, but it is very time-consuming: I need over 30 hours to finish one epoch. How long does one epoch take for you? Thanks.

juliandewit commented 7 years ago

Hello, that is much longer than could be expected. What is a K10 simulator? Is it much slower than a "normal" card?

sathyapatel commented 6 years ago

Hi Julian

@juliandewit He ran it on Nvidia Tesla K10 GPUs, and that obviously takes hours per epoch. @wojiushishen: why don't you try Nvidia's GPU cloud for high-performance computing? Sign up and run it there; Nvidia GPU Cloud offers deep learning containers and strong performance for large networks:

https://www.nvidia.com/en-us/gpu-cloud/

ahasanpour commented 6 years ago

Dear Julian

I have the same issue with step2_train_nodule_detector.py: I have a GTX 1080 and each epoch takes 80 hours! I commented out the line below, but it did not make much difference.

config.gpu_options.per_process_gpu_memory_fraction = 0.5

The code does not fully utilize the GPU (around 40%), and it seems that preparing the inputs takes too much time.
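For context, the commented-out line is part of a TensorFlow 1.x session configuration like the sketch below. Note that this setting only caps how much GPU memory the process may allocate; it does not affect compute speed, which is consistent with removing it making little difference.

```python
import tensorflow as tf  # TensorFlow 1.x API, as used by the original code

config = tf.ConfigProto()
# Cap this process at 50% of the GPU's memory. This is purely an
# allocation limit shared-machine courtesy; it does not change
# throughput, so commenting it out will not speed up training.
config.gpu_options.per_process_gpu_memory_fraction = 0.5
session = tf.Session(config=config)
```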

Thanks in advance for any solution.

wojiushishen commented 6 years ago

Change model.fit_generator(train_gen, len(train_files) / 1, ...) to model.fit_generator(train_gen, len(train_files) // batch_size, ...). It's a problem caused by a Keras update: the second parameter of model.fit_generator changed from the number of training samples per epoch to the number of iterations (batches) per epoch. I hope that solves your problem.
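In other words, the fix is to pass batches per epoch rather than samples per epoch. A minimal sketch of the arithmetic (the sample count and batch size below are example values, not taken from the repo):

```python
# Keras 1: model.fit_generator(gen, samples_per_epoch=len(train_files), ...)
# Keras 2: model.fit_generator(gen, steps_per_epoch=len(train_files) // batch_size, ...)
# Passing the raw sample count under Keras 2 makes each "epoch" run
# batch_size times too many batches, which explains the huge epoch times.

def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of batches Keras 2's fit_generator should run per epoch."""
    return num_samples // batch_size

# Example: 8000 training samples with a batch size of 16
print(steps_per_epoch(8000, 16))  # 500 batches per epoch
```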

juliandewit commented 6 years ago

I was on Keras 1.X, so if someone has a pull request, I can add it.