juliandewit / kaggle_ndsb2017

Kaggle datascience bowl 2017
MIT License

How to extract overlays from images for the U-net mass detector? #22

Open guyucowboy opened 7 years ago

guyucowboy commented 7 years ago

Hi Julian, how do you extract the overlays from the images for the U-net mass detector? I mean the overlay files in "resources\segmenter_traindata\*_o.png". How were those files generated? Thank you!

juliandewit commented 7 years ago

Hello, I labeled them by hand. They are in the resources file. You don't need this part of the solution to get a good score. It improves everything just a little.

guyucowboy commented 7 years ago

Thanks for your reply!

guyucowboy commented 7 years ago

@juliandewit
Hi Julian. How did you discover that the "mass" feature (or any other feature) of the nodule detection improves everything just a little? How did you reach this conclusion? Thanks!

guyucowboy commented 7 years ago

Hi Julian. Another simple question: when you submit final_submission.csv to the Kaggle competition, do you keep it as it is? Or do you change the "cancer" value to 1 if it is greater than 0.5 and to 0 otherwise?
Thanks!

juliandewit commented 7 years ago

Keep it as it is. You can still submit to Kaggle and see the results.
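
A small sketch of why keeping the raw probabilities matters, assuming the submission is scored with log loss (the metric used for this competition, though it is not stated in this thread). The labels and predictions below are made up for illustration, and the helper names `log_loss` and `threshold` are hypothetical, not taken from the repository.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def threshold(y_prob, cutoff=0.5):
    """Turn probabilities into hard 0/1 predictions."""
    return [1.0 if p > cutoff else 0.0 for p in y_prob]


# Made-up labels and predicted "cancer" probabilities, for illustration only.
y_true = [1, 0, 1, 0]
y_prob = [0.7, 0.2, 0.4, 0.6]

soft_score = log_loss(y_true, y_prob)
hard_score = log_loss(y_true, threshold(y_prob))

print(f"raw probabilities: {soft_score:.3f}")
print(f"thresholded 0/1:   {hard_score:.3f}")
```

Under log loss, a confident wrong answer is punished very heavily, so rounding to 0 or 1 makes every mistake as expensive as possible. In this toy example the thresholded predictions score far worse than the raw probabilities, even though both make the same two "mistakes" at a 0.5 cutoff.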

juliandewit commented 7 years ago

I did local cross-validation plus validation against the leaderboard. BOTH needed to show improvements. I tried around 50 features. Almost none gave consistent improvements, mainly because of the outlier leaderboard, but we did not know that at the time.
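
The "both must improve" rule described above can be sketched roughly as follows. This is only an illustration of the decision logic, not the code used in the solution: the function names, the `min_gain` parameter, and all scores are hypothetical, and the scores are assumed to be losses where lower is better.

```python
def improves(baseline_loss, candidate_loss, min_gain=0.0):
    """True if the candidate lowers the loss by more than min_gain."""
    return (baseline_loss - candidate_loss) > min_gain


def keep_feature(local_cv, leaderboard, min_gain=0.0):
    """Accept a feature only if BOTH checks improve.

    Each argument is a hypothetical (baseline_loss, candidate_loss) pair.
    """
    return (improves(local_cv[0], local_cv[1], min_gain)
            and improves(leaderboard[0], leaderboard[1], min_gain))


# Made-up scores, for illustration only (lower is better).
candidates = {
    "feature_a": {"local_cv": (0.450, 0.440), "leaderboard": (0.460, 0.455)},
    "feature_b": {"local_cv": (0.450, 0.430), "leaderboard": (0.460, 0.470)},
}

kept = [name for name, s in candidates.items()
        if keep_feature(s["local_cv"], s["leaderboard"])]
print(kept)
```

Here `feature_b` looks better locally but worse on the leaderboard, so it is rejected. Requiring agreement between two independent checks is a guard against overfitting to either one.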

guyucowboy commented 7 years ago

Hi Julian. Thank you for your reply! I am a newcomer to Kaggle. I can find the public and private leaderboards, but what is the outlier-leaderboard? Thanks again.

guyucowboy commented 7 years ago

Hi Julian. What are the 50 features? Are they the bottleneck features of the CNN, or are some of them unrelated to the CNN and simply calculated with formulas? Did you use any feature selection algorithms? Maybe a combination of features chosen by a feature selection algorithm would improve the result, even if each feature has only a small effect on its own. But I am not sure about that. Thanks.