gzuidhof / luna16

LUNA16 Lung Nodule Analysis - NWI-IMC037 Final Project
BSD 2-Clause "Simplified" License
182 stars 73 forks

What are the `.mlab` files used for? #13

Open shartoo opened 7 years ago

shartoo commented 7 years ago

Hi, I'm new to medical image processing and happen to know MeVisLab. There are two `.mlab` files in your repository but no description of them. Could you please share something about them? I opened the file, but the result of the processing pipeline seems meaningless (see below). Is there something I misunderstood?

[image: autoseg4]

gzuidhof commented 7 years ago

Hi shartoo,

The image you are looking at is a segmented lung, not the original CT scan volume. The `.mlab` file is a method for segmenting lungs, but it's quite slow and not recommended.

gzuidhof commented 7 years ago

It's not used for anything in the LUNA competition by the way, as the organizers already supply segmentations which are mostly fine. It was created for a sub-assignment in the university course.

shartoo commented 7 years ago

Ah, I see, thank you! There are various ways to do lung segmentation. Have you ever tried going a step further after lung segmentation, such as generating nodule candidates with a MeVisLab network using traditional methods like region growing, connected component computation, or contextual features of nodules?

Some papers show that MeVisLab is very capable, and feeding good candidates rather than the whole lung into a CNN may compensate for the lack of labeled positive samples. What's your opinion? Here is some basic work I have done:

[image: qq 20170329105540]

An example paper using MeVisLab is below. It was published in 2014 and reached a sensitivity of 80% at an average of only 1.0 false positive detections per scan.

Automatic detection of subsolid pulmonary nodules in thoracic computed tomography images
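The connected component computation mentioned above as a way to generate candidates can be illustrated with a small sketch. This is not the MeVisLab network from the thread or the method of the cited paper; it is a generic, pure-Python illustration that thresholds a toy volume and labels 6-connected regions, each of which would be one candidate. The function name `label_components`, the toy intensities, and the threshold are all made up for the example.

```python
from collections import deque


def label_components(mask):
    """Label 6-connected foreground regions in a 3D boolean mask (nested lists)."""
    depth, height, width = len(mask), len(mask[0]), len(mask[0][0])
    labels = [[[0] * width for _ in range(height)] for _ in range(depth)]
    neighbours = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    sizes = {}  # label -> number of voxels in that component
    next_label = 0
    for z in range(depth):
        for y in range(height):
            for x in range(width):
                if not mask[z][y][x] or labels[z][y][x]:
                    continue  # background, or already part of a component
                next_label += 1
                labels[z][y][x] = next_label
                queue = deque([(z, y, x)])
                size = 0
                # Breadth-first flood fill from the seed voxel.
                while queue:
                    cz, cy, cx = queue.popleft()
                    size += 1
                    for dz, dy, dx in neighbours:
                        nz, ny, nx = cz + dz, cy + dy, cx + dx
                        if (0 <= nz < depth and 0 <= ny < height and 0 <= nx < width
                                and mask[nz][ny][nx] and not labels[nz][ny][nx]):
                            labels[nz][ny][nx] = next_label
                            queue.append((nz, ny, nx))
                sizes[next_label] = size
    return labels, sizes


# Toy 2x3x4 volume of made-up intensities; the threshold is also made up.
volume = [
    [[0, 9, 9, 0], [0, 0, 0, 0], [8, 0, 0, 0]],
    [[0, 9, 0, 0], [0, 0, 0, 0], [8, 0, 0, 7]],
]
threshold = 5
mask = [[[v > threshold for v in row] for row in plane] for plane in volume]
labels, sizes = label_components(mask)
print(sizes)
```

On real CT volumes a library routine such as `scipy.ndimage.label` would be the practical choice; the component sizes could then be used to discard regions that are too small or too large to be nodules.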

gzuidhof commented 7 years ago

That is very neat. Most candidate extraction methods trade off precision for recall. Recall is much more important, as the false positives can be filtered very well (see the FPRED track in the LUNA16 competition).

I have been experimenting with a 3D CNN approach without any segmentation steps. The network is quite simple in architecture, but I haven't had time to explore it well or tune the hyperparameters. I haven't calculated the number of parameters, but a save of the network structure + weights is around 440KB. Inference is also fast at a few seconds per scan.

I am using a different annotation set than the one used in the LUNA16 competition. It has 1406 nodules annotated (by at least 2 radiologists) and excludes nodules which are probably benign. I hit around 1322 of these, so a recall of 0.94, with 59804 candidates. Precision would then be 1322/59804 = 0.022. This is much better than the systems used for the FPRED candidate set. A big caveat, though: the network was trained on a subset of the same dataset (no proper cross-validation splits), so real-world performance is probably worse!

Maybe if I find the time I can do the proper splits and make another submission..
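For reference, the recall and precision figures quoted above can be reproduced with a few lines of arithmetic. The variable names are only illustrative, and the sketch assumes that each detected nodule counts as exactly one true-positive candidate, which is how the precision in the comment is computed.

```python
# Figures quoted in the comment above.
annotated_nodules = 1406   # nodules in the annotation set
detected_nodules = 1322    # annotated nodules hit by at least one candidate
total_candidates = 59804   # all candidates produced by the network

# Recall: fraction of annotated nodules that were found.
recall = detected_nodules / annotated_nodules

# Precision, assuming one true-positive candidate per detected nodule.
precision = detected_nodules / total_candidates

print(f"recall    = {recall:.3f}")     # recall    = 0.940
print(f"precision = {precision:.3f}")  # precision = 0.022
```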

shartoo commented 7 years ago

A 3D CNN alone can achieve a recall of 0.94, that's amazing 👍! Another 3D CNN example is the multi-level fusion framework from Qi Dou, which achieved the highest competition performance metric (CPM) score in the false positive reduction track of the LUNA16 challenge. His network framework is rather simple, as shown below:

[image: 3dcnn_all]

The result:

[image: qq 20170330113042]

I have implemented this network in TensorFlow but only got so-so performance, which may be caused by poor data preprocessing or a lack of training data. There is also an idea using a multi-view 2D CNN, from the paper "Pulmonary nodule detection in CT images: false positive reduction using multi-view convolutional networks", which got a nice result. Its framework is:

[image: qq 20170330114410]

It's too complex to implement 😸. Thank you for sharing, and I look forward to your better kernel on Kaggle!

huythach commented 7 years ago

@gzuidhof could you share the link to download the LUNA16 segmentation data (original_lung_masks)? I can only download the original lung data and the CSV files (annotations.csv, candidates.csv, sampleSubmission.csv).

gzuidhof commented 7 years ago

@huythach you can find them here (link from the competition website downloads page).

huythach commented 7 years ago

@gzuidhof Thank you for sharing the link. Now I can run your program, but another error occurs: "IOError: [Errno 2] No such file or directory: 'data\1_1_1mm_slices_lung\subset0\1.3.6.1.4.1.14519.5.2.1.6279.6001.108197895896446896160048741492_slice80.pkl.gz'"

Do you know how to fix it? Thanks.

huythach commented 7 years ago

I see the problem: gzip.open does not automatically create the folder the file is saved in, so that error occurs. Writing a simple check that creates the folder, as follows, handles the problem. Thanks.

    import errno
    import os


    def create_folder_if_not_exist(filePath):
        # Create the parent folder of filePath if it is missing.
        folder = os.path.dirname(filePath)
        if not os.path.exists(folder):
            try:
                os.makedirs(folder)
            except OSError as exc:
                # Guard against race condition
                if exc.errno != errno.EEXIST:
                    raise
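On Python 3 the same guard can be written more compactly, because `os.makedirs` accepts `exist_ok=True`. The sketch below is only an illustration and is not taken from the repository: the helper name `save_pickle_gz`, the use of `pickle` (guessed from the `.pkl.gz` extension), and the example file name are assumptions.

```python
import gzip
import os
import pickle
import tempfile


def save_pickle_gz(obj, file_path):
    """Write obj to a gzip-compressed pickle, creating parent folders first."""
    parent = os.path.dirname(file_path)
    if parent:  # an empty string means the current directory
        # exist_ok=True makes this a no-op when the folder already exists,
        # which also covers the race condition handled manually above.
        os.makedirs(parent, exist_ok=True)
    with gzip.open(file_path, "wb") as handle:
        pickle.dump(obj, handle)


# Usage with a throwaway directory and a made-up file name.
with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "subset0", "example_slice80.pkl.gz")
    save_pickle_gz({"slice": 80}, target)
    with gzip.open(target, "rb") as handle:
        restored = pickle.load(handle)
```

The original code in this thread targets Python 2 (it raises `IOError`), where `exist_ok` is not available, so the explicit `errno.EEXIST` check is the right choice there.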

mjiansun commented 7 years ago

@shartoo What is the title of the paper for the 3D CNN mentioned above? Has the author shared any code?

shartoo commented 7 years ago

@mjiansun No, he hasn't. But the framework is rather simple; you can build it with TensorFlow (or Keras) in a few lines of code.

mjiansun commented 7 years ago

@shartoo OK, thank you very much!

mjiansun commented 7 years ago

@shartoo How did you get the training set? How do you preprocess the data?

shartoo commented 7 years ago

@mjiansun As described in the paper, the training data comes from LUNA16 and can be downloaded from their website. A good model largely depends on suitable data preprocessing, and some data augmentation should be done.

mjiansun commented 7 years ago

@shartoo Ok. I will try.

shartoo commented 7 years ago

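@mjiansun Are you from Nanjing University of Information Science and Technology? I'm from Nanjing University of Finance and Economics.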

mjiansun commented 7 years ago

@shartoo Yes.