juliandewit / kaggle_ndsb2017

Kaggle datascience bowl 2017
MIT License

run code on multiple GPUs #9

Open shu-hai opened 7 years ago

shu-hai commented 7 years ago

Hi Julian, I just started running your step3_predict_nodules.py using your trained model. I found that it ran on only 1 GPU even though I assigned 2 GPUs to it with os.environ["CUDA_VISIBLE_DEVICES"] = "0,1". I also commented out config.gpu_options.per_process_gpu_memory_fraction = 0.5 because I am allowed to use both GPUs fully, but it was still slow.

Could you let me know how to run the code on multiple GPUs? Thanks.

juliandewit commented 7 years ago

Hello, I think TensorFlow sees no way to distribute the network over multiple GPUs, although in theory it should be smart enough to split the batch into 2 parts and run each part on a separate GPU.

You could do this manually, however. I cannot type it out for you, but every patient needs roughly 30x30x30 (~27,000) predictions. If you predict half of them on GPU1 with one instance of the network and the other half on GPU2 with another instance, you should get roughly a 2x speedup.
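The manual split described above could be sketched roughly as follows. This is a minimal illustration, not the repository's code: it assumes one worker process per GPU, and the names `split_evenly`, `predict_shard`, and `predict_on_gpus` are hypothetical. The constant prediction is a stand-in for loading the trained network and running it on real patches.

```python
import os
from multiprocessing import Process, Queue


def split_evenly(items, n_parts):
    """Split items into n_parts contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def predict_shard(gpu_id, patch_ids, results):
    # Restrict this process to a single GPU. This must happen before the
    # deep-learning framework is imported in this process.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    # Assumption: the real code would load its own copy of the trained network
    # here and run it on the patches. A constant stands in for each prediction.
    predictions = [(patch_id, 0.0) for patch_id in patch_ids]
    results.put(predictions)


def predict_on_gpus(patch_ids, gpu_ids=(0, 1)):
    results = Queue()
    shards = split_evenly(patch_ids, len(gpu_ids))
    workers = [Process(target=predict_shard, args=(gpu_id, shard, results))
               for gpu_id, shard in zip(gpu_ids, shards)]
    for worker in workers:
        worker.start()
    # Drain the queue before joining so large results cannot block the workers.
    merged = []
    for _ in workers:
        merged.extend(results.get())
    for worker in workers:
        worker.join()
    # Restore patch order regardless of which worker finished first.
    return sorted(merged)


if __name__ == "__main__":
    predictions = predict_on_gpus(list(range(1000)))
    print(len(predictions))
```

Separate processes are used here because each one can be pinned to a single device through CUDA_VISIBLE_DEVICES. Whether this gives the full 2x speedup depends on how much time goes into loading and preprocessing the patches.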

shu-hai commented 7 years ago

Hi Julian, lines 348-349 of step4_train_submissions.py read as follows:

    if level == 1:
        dst_dir += "level2/"

Why is it level2 and not level1?

juliandewit commented 7 years ago

Indeed, I also had to look twice after all this time.

The level 1 models are combined into the level2 folder. The models in level2 are then combined into the submission folder.
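The folder naming can be read as "each level writes into the folder that the next stage reads from". A minimal sketch of that mapping is below. Only "level2/" appears in the quoted code. The "submission/" folder name, the `NEXT_STAGE_DIR` table, and the `destination_dir` helper are assumptions made for illustration.

```python
import os

# Output folder per model level. "level2/" comes from the quoted code;
# "submission/" is an assumed name for the final folder.
NEXT_STAGE_DIR = {1: "level2/", 2: "submission/"}


def destination_dir(base_dir, level):
    """Return the folder where the combined output of the given level is written."""
    return os.path.join(base_dir, NEXT_STAGE_DIR[level])
```

For example, `destination_dir(dst_root, 1)` points at the level2 folder, which explains why the level 1 branch appends "level2/".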

shu-hai commented 7 years ago

It also gives an error on line 23 of step4_train_submissions.py: mass_df = pandas.read_csv(settings.BASE_DIR + "masses_predictions.csv"). It cannot find the masses_predictions.csv file. I searched for this file name in the code of the first three steps but could not find it. Where is this file generated?

juliandewit commented 7 years ago

step2_train_mass_segmenter.py also has a predict phase, which generates this file.

You can also leave it out. It will not change the score very much.
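If the file is left out, the read on line 23 needs a guard. One possible sketch is below. The `load_mass_predictions` helper is hypothetical and not part of the repository, and the later code that uses the mass predictions would also have to handle the `None` case.

```python
import os


def load_mass_predictions(base_dir, filename="masses_predictions.csv"):
    """Return the mass predictions as a DataFrame, or None if the file is absent."""
    path = os.path.join(base_dir, filename)
    if not os.path.exists(path):
        return None
    # Imported here so the missing-file path works even without pandas installed.
    import pandas
    return pandas.read_csv(path)
```

It could be called as `mass_df = load_mass_predictions(settings.BASE_DIR)`, with the mass feature skipped whenever `mass_df` is `None`.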