juliandewit / kaggle_ndsb2017

Kaggle datascience bowl 2017
MIT License

Where does the data in resource.rar come from? #7

Open tjliupeng opened 7 years ago

tjliupeng commented 7 years ago

Hi, julian,

Your work is great. Thanks for sharing.

I downloaded resource.rar, and it contains several folders with different data. As far as I know, the data in the folder 'luna16_annotations' comes from LUNA16 and LIDC-IDRI. What about the other folders?

Thanks Liu Peng

juliandewit commented 7 years ago

Hello Liu. There are automatically generated labels (edges, false positives). There are manual nodule labels for the LUNA / NDSB phase 1 trainset. There are also images + overlays for the U-net mass detector. The images come mainly from the LUNA + NDSB phase 1 trainsets, but some other sources were also used (based on the competition's external data thread). I manually made the overlays for the mass detector.

More to read about it here: http://juliandewit.github.io/kaggle-ndsb2017/

tjliupeng commented 7 years ago

Hi, Julian,

I am curious about the manual labels in the folders "luna16_manual_labels" and "ndsb3_manual_labels". Judging by the word "manual", these were made by you yourself, right? How did you achieve that? With a Python program, or with some DICOM application?

You also said "I manually made overlays for the mass detector". How did you do that?

I had already read the blog before submitting this issue, but I still can't understand it clearly. Thanks

juliandewit commented 7 years ago

I did it myself with the help of the Kaggle training labels (but still the results were worse than LUNA only.. I'm a bad radiologist :) ). I used a very crude, custom-made viewer / labeler. The screenshots in the blog are from that tool.

I'm not publishing the viewer since it has no teaching value, is horrible code, and makes all kinds of assumptions about the system it's running on.

tjliupeng commented 7 years ago

You said "still the results were worse than luna only", so why did you still use the manual labels for training at stage 1? What was the result without the manual labels?

juliandewit commented 7 years ago

The 1st model was LUNA-only. The 2nd model was trained with the manual labels and was a bit worse. The combination was better than either of the two models separately.
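For illustration, assuming the two models were combined by simple probability averaging (the actual blending in the repo may differ, and `blend` is a hypothetical helper), a minimal sketch of why an ensemble can beat both of its members:

```python
import numpy as np

def blend(pred_luna_only, pred_with_manual, w=0.5):
    # Average the per-candidate nodule probabilities from the two models.
    # Even if one model is slightly weaker, its errors are partly
    # uncorrelated with the other's, so the average can beat both alone.
    return w * np.asarray(pred_luna_only) + (1 - w) * np.asarray(pred_with_manual)

p1 = [0.9, 0.2, 0.6]  # LUNA-only model
p2 = [0.7, 0.4, 0.8]  # model trained with manual labels
print(blend(p1, p2))  # averaged probabilities per candidate
```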

tjliupeng commented 7 years ago

In the function data_generator() in step2_train_nodule_detector.py, you do some similar processing for positive and negative samples:

            indent_x = indent_y = indent_z = 0  # default: no shift (needed when wiggle <= 0)
            if wiggle > 0:
                # random translation: shift the crop window inside the larger cube
                indent_x = random.randint(0, wiggle)
                indent_y = random.randint(0, wiggle)
                indent_z = random.randint(0, wiggle)
            cube_image = cube_image[indent_z:indent_z + CROP_SIZE, indent_y:indent_y + CROP_SIZE, indent_x:indent_x + CROP_SIZE]

            if train_set:
                # random flips along the axes, each with roughly 50% probability
                if random.randint(0, 100) > 50:
                    cube_image = numpy.fliplr(cube_image)
                if random.randint(0, 100) > 50:
                    cube_image = numpy.flipud(cube_image)
                if random.randint(0, 100) > 50:
                    cube_image = cube_image[:, :, ::-1]
                if random.randint(0, 100) > 50:
                    cube_image = cube_image[:, ::-1, :]

What's the purpose?

Thanks
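For readers of the snippet above, a plausible reading (the cube size 48 and CROP_SIZE 32 are assumed values for illustration, and `random_crop` is a hypothetical wrapper): `wiggle` is the slack between the loaded cube and the crop size, and the random indents pick a random sub-cube, i.e. a small translation augmentation.

```python
import random
import numpy as np

CROP_SIZE = 32  # assumed value for this sketch

def random_crop(cube_image, crop_size=CROP_SIZE):
    # wiggle = how much room there is to shift the crop window inside the cube
    wiggle = cube_image.shape[0] - crop_size
    indent_x = indent_y = indent_z = 0
    if wiggle > 0:
        indent_x = random.randint(0, wiggle)
        indent_y = random.randint(0, wiggle)
        indent_z = random.randint(0, wiggle)
    return cube_image[indent_z:indent_z + crop_size,
                      indent_y:indent_y + crop_size,
                      indent_x:indent_x + crop_size]

cube = np.zeros((48, 48, 48))
print(random_crop(cube).shape)  # (32, 32, 32)
```

Each call yields a crop of the target size but taken from a slightly different position, so the network never sees the nodule at exactly the same offset twice.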

tjliupeng commented 7 years ago


In prepare_image_for_net3D(), settings.MEAN_PIXEL_VALUE_NODULE is subtracted from the image array. Why is MEAN_PIXEL_VALUE_NODULE 41, and why does it need to be subtracted?

Thanks

juliandewit commented 7 years ago

Mean pixel: subtracting it makes the average input have a mean closer to '0', which gives a more stable network computation. Note that I did not adjust the value anymore after the net converged and was stable.
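The value 41 comes from the question above (settings.MEAN_PIXEL_VALUE_NODULE); `prepare_cube` below is a hypothetical stand-in for prepare_image_for_net3D, and the sketch only shows the zero-centering idea:

```python
import numpy as np

MEAN_PIXEL_VALUE_NODULE = 41  # dataset mean, from settings

def prepare_cube(cube):
    # Subtract the dataset mean so the network input is roughly
    # zero-centered, which tends to make optimization more stable.
    return cube.astype(np.float32) - MEAN_PIXEL_VALUE_NODULE

# Illustrative: voxels averaging ~41 end up centered near 0.
rng = np.random.default_rng(0)
cube = rng.normal(loc=41.0, scale=10.0, size=(32, 32, 32))
centered = prepare_cube(cube)
print(abs(centered.mean()) < 1.0)  # mean is now close to zero
```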

Augmentations: some overlap, yes. That's just cleanup I needed to do but didn't do, for the sake of reproducibility. I used to only do augmentations for positive examples, but later I also started to augment the negative examples.
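The "some overlap" can be seen directly: for a 3D array, numpy.fliplr flips axis 1, which is exactly the same operation as the slice `cube_image[:, ::-1, :]` used a few lines later, while numpy.flipud flips axis 0. A small sketch to verify:

```python
import numpy as np

cube = np.arange(2 * 3 * 4).reshape(2, 3, 4)

# fliplr on a 3D array reverses axis 1 -- same as the explicit slice,
# so two of the four random flips in data_generator() act on one axis.
assert np.array_equal(np.fliplr(cube), cube[:, ::-1, :])

# flipud reverses axis 0.
assert np.array_equal(np.flipud(cube), cube[::-1, :, :])

# cube[:, :, ::-1] is the remaining, independent flip (axis 2).
print("flip identities hold")
```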

Note that this code was written under a lot of stress and time pressure. I could clean it up further.. but that might break reproducibility.

guyucowboy commented 7 years ago

Hi Julian, how do you extract the overlays from the images for the U-net mass detector ("resources\segmenter_traindata\ *_o.png")? How do you generate those files? Thank you!