juliandewit / kaggle_ndsb2017

Kaggle datascience bowl 2017
MIT License

Question about skipping the cube in step3_predict_nodules.py? #17

Open tjliupeng opened 7 years ago

tjliupeng commented 7 years ago

Hi, Julian,

In the function predict_cubes() of step3_predict_nodules.py, you try to predict all the 32x32x32 cubes of each patient. You have a cube-skipping condition at line 266:

    if cube_mask.sum() < 2000:
        skipped_count += 1

Could you explain why?

Thanks, tjliupeng

juliandewit commented 7 years ago

This is a speedup trick. If the convnet is looking far outside the rough lung mask, it knows that it does not need to predict anything (i.e., you will not have lung cancer in your stomach).

The speedup is significant. In a volume of n x n x n, the part that has to be predicted gets down to roughly n/2 x n/2 x n/2, which is about one eighth of the cubes.
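
As a rough illustration of the skipping trick described above, the sketch below walks a lung mask in 32x32x32 cubes and keeps only the cubes whose mask sum reaches the threshold. The function name `select_cubes`, the `step` stride, and the assumption that the mask holds 0/1 values are hypothetical; only the `cube_mask.sum() < 2000` test and the `skipped_count` counter come from the thread.

```python
import numpy as np

CUBE_SIZE = 32          # cube edge length mentioned in the thread
MIN_MASK_VOXELS = 2000  # threshold quoted from the script


def select_cubes(lung_mask, cube_size=CUBE_SIZE, step=CUBE_SIZE,
                 min_voxels=MIN_MASK_VOXELS):
    """Return (kept_origins, skipped_count) for a 3D 0/1 lung mask.

    `step` is a hypothetical stride between cube origins.
    """
    kept = []
    skipped_count = 0
    depth, height, width = lung_mask.shape
    for z in range(0, depth - cube_size + 1, step):
        for y in range(0, height - cube_size + 1, step):
            for x in range(0, width - cube_size + 1, step):
                cube_mask = lung_mask[z:z + cube_size,
                                      y:y + cube_size,
                                      x:x + cube_size]
                # Too little lung tissue in this cube: do not run the network.
                if cube_mask.sum() < min_voxels:
                    skipped_count += 1
                    continue
                kept.append((z, y, x))
    return kept, skipped_count
```

Under the 0/1 assumption, 2000 voxels is about 6% of the 32768 voxels in one cube, so a cube is skipped only when it barely overlaps the mask. The thread itself does not say how the value 2000 was chosen.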

guyucowboy commented 7 years ago

Hi, Julian. I think you mean that some cubes do not need to be predicted when the cube mask moves far away from the lungs. I have another question: why do you use p[0] rather than p[1]? In the code, p = model.predict(batch_data, batch_size=batch_size) is followed by for i in range(len(p[0])):, and I find both p[0] and p[1] in the list p. Thanks!

guyucowboy commented 7 years ago

Does p[1] contain the mask of the nodule?

tjliupeng commented 7 years ago

@guyucowboy, my understanding is that the prediction has 2 targets: {"out_class", "out_malignancy"}. So p[0] and p[1] are for these 2 targets.
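
To make the two-target reading concrete: a Keras model with two outputs returns a list with one array per output from predict(), in the order the outputs were declared. The sketch below uses plain nested lists as a stand-in for that return value, so the numbers, the function name `unpack_predictions`, and the mapping of index 0 to "out_class" and index 1 to "out_malignancy" are assumptions based on the comment above, not taken from the script.

```python
# Stand-in for the value returned by model.predict() on a two-output model:
# one entry per output head, each holding one single-value row per cube.
p = [
    [[0.92], [0.03]],  # p[0]: hypothetical "out_class" scores
    [[3.10], [0.20]],  # p[1]: hypothetical "out_malignancy" values
]


def unpack_predictions(p):
    """Pair up the two outputs for every cube in the batch."""
    results = []
    for i in range(len(p[0])):      # both heads have one row per cube
        nodule_chance = p[0][i][0]
        malignancy = round(p[1][i][0], 4)
        results.append((nodule_chance, malignancy))
    return results
```

Looping over len(p[0]) is enough because both outputs carry the same number of rows, one per cube in the batch.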

guyucowboy commented 7 years ago

@tjliupeng, I looked at the function get_net in step2_train_nodule_detector.py, and I think you are right. step3_predict_nodules.py contains the line "diameter_mm = round(p[1][i][0], 4)". Do you know why p[1] is used to calculate the diameter? Thank you!

juliandewit commented 7 years ago

diameter is the wrong variable name. At first I used the diameter as a malignancy indicator. Then I found the malignancy annotations and started to use them.

It was all very tight and stressful to try to win next to work, family, etc. :) Variable names suffered from it.

tjliupeng commented 7 years ago

@juliandewit, although you said the name "diameter" is wrong, you still have this line below it: diameter_perc = round(diameter_mm / patient_img.shape[2], 4). And diameter_perc is stored in the detection result csv file.

juliandewit commented 7 years ago

That is an old discarded feature based on a relative diameter.

tjliupeng commented 7 years ago

@juliandewit, thanks for the clarification.

By the way, after step3_predict_nodules.py, how do you verify the prediction results? I used the csv and tried to locate the nodules in ITK-SNAP (a DICOM viewer) using the x, y, z coordinates from the csv, but failed.

juliandewit commented 7 years ago

I had my internal viewer as discussed in the blog post.

guyucowboy commented 7 years ago

@tjliupeng, did you try the upper-left corner or the lower-left corner as the coordinate origin?

guyucowboy commented 7 years ago

and image spacing, etc.

tjliupeng commented 7 years ago

@guyucowboy The prediction result is the suspected nodule center position in percentage. I just multiply the center position by the original DICOM shape [pixel width, pixel height, number of slices], then use the DICOM viewer to check the nodule at the resulting position. The image spacing can be used to calculate the cube range in the viewer.
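
The multiplication described above can be sketched as follows. This assumes the csv stores each coordinate as a fraction between 0 and 1 (if it stores true percentages from 0 to 100, divide by 100 first) and that the axis order is (x, y, z). The function name `percent_to_voxel` is hypothetical.

```python
def percent_to_voxel(center_perc, original_shape):
    """Map a fractional (x, y, z) center to integer voxel indices.

    center_perc:    fractions in [0, 1], one per axis (assumed convention)
    original_shape: (pixel width, pixel height, number of slices)
    """
    voxel = []
    for fraction, size in zip(center_perc, original_shape):
        index = int(round(fraction * size))
        # Clamp so a fraction of exactly 1.0 still gives a valid index.
        voxel.append(min(max(index, 0), size - 1))
    return tuple(voxel)
```

A viewer may use a different origin or flip an axis relative to the array order, which is the point raised earlier about the upper-left versus lower-left corner, so a position that looks wrong may only need one axis mirrored.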

guyucowboy commented 7 years ago

@tjliupeng OK. Thank you for your reply!

guyucowboy commented 7 years ago

Hi, Julian. How did you decide to discard the "diameter_perc" feature (or other features) for nodule detection? Did you use a feature selection algorithm?

tjliupeng commented 7 years ago

@juliandewit, back to the original question of this post: I still don't understand why the cube is skipped when cube_mask.sum() is less than 2000. Why 2000 and not some other number?

MoonBunnyZZZ commented 7 years ago

@tjliupeng @guyucowboy In step3.py, the code at line 284, if len(batch_list) % batch_size == 0 (batch_size is 128), seems to feed 128 cubes into the model at a time. Why? I thought the input was one single cube at a time. If the length of batch_list, which contains useful image info, is less than batch_size, the prediction process will ignore that info. Is anything wrong with my understanding?
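
For context on the batching question: feeding many cubes per call is a common way to keep the GPU busy, and the usual way to avoid losing a short final batch is to flush whatever remains after the loop. The sketch below shows that general pattern only. Whether the original script flushes its last partial batch is not settled in this thread, and `predict_in_batches` and `predict_fn` are hypothetical names.

```python
def predict_in_batches(cubes, predict_fn, batch_size=128):
    """Run predict_fn on groups of up to batch_size cubes, keeping order."""
    results = []
    batch_list = []
    for cube in cubes:
        batch_list.append(cube)
        if len(batch_list) == batch_size:
            results.extend(predict_fn(batch_list))
            batch_list = []
    # Flush the final partial batch so no cube is silently dropped.
    if batch_list:
        results.extend(predict_fn(batch_list))
    return results
```

With 300 cubes and a batch size of 128, this makes three calls of 128, 128 and 44 cubes.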

MoonBunnyZZZ commented 7 years ago

@tjliupeng After getting the center position, how do I get the cube range? I don't know what value should be multiplied by the DICOM image spacing to get the whole nodule range.

tjliupeng commented 7 years ago

This solution does not predict the X, Y, Z range of a prediction. You can assume the range is 32x32x32.
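
Taking the fixed cube size at face value, a range around a predicted center can be sketched as below. It assumes the center is already in voxel indices of the same volume the network saw and that the cube is centered on it; the function name `cube_bounds` is hypothetical. If that volume was resampled before prediction, the bounds would still need to be rescaled with the image spacing to match the original DICOM slices.

```python
def cube_bounds(center, volume_shape, cube_size=32):
    """Return a (start, end) voxel range per axis, clipped to the volume."""
    half = cube_size // 2
    bounds = []
    for c, size in zip(center, volume_shape):
        start = max(0, c - half)   # do not run past the low edge
        end = min(size, c + half)  # end is exclusive, clipped to the volume
        bounds.append((start, end))
    return bounds
```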