Ask for help about the dataset

mohammadAbbasniya / BreastCancer-Classification

This repository contains the Python implementations used in developing a method for breast cancer detection from histopathology images.

Ask for help about the dataset #1

Open ZhiWei-Liu123 opened 1 year ago

ZhiWei-Liu123 commented 1 year ago

In your thesis, you say that the BreakHis dataset covers two biopsy procedures, SOB and CNB. But I found that the link given in your paper only contains SOB images. So where did you get the complete dataset? I also found this public dataset on Kaggle, but sadly it only contains SOB images too.

ZhiWei-Liu123 commented 1 year ago

For machine learning, getting the right dataset is important. I am looking for a dataset that contains multi-source medical images.

mohammadAbbasniya commented 1 year ago

In your thesis, you say that the BreakHis dataset covers two biopsy procedures, SOB and CNB. But I found that the link given in your paper only contains SOB images. So where did you get the complete dataset? I also found this public dataset on Kaggle, but sadly it only contains SOB images too. @liuzhiwei854

The BreakHis dataset that we used contains 7909 images, all taken by SOB. Thanks to your comment, we should correct the sentence in section 3.1 which says that the BreakHis dataset contains two biopsy procedures: Surgical Open Biopsy (SOB) and Core Needle Biopsy (CNB).

But why did this happen? After downloading the dataset, we got a directory like this: [directory listing image]. In the file README.txt, you can see that Core Needle Biopsy (CNB) is listed as a possible value of BIOPSY_PROCEDURE in line 24, but unfortunately I overlooked the label saying (For future use) in line 40.
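
One way to check which biopsy procedures are actually present is to count the BIOPSY_PROCEDURE field across the image file names. The following is a minimal sketch. It assumes that the procedure code is the part of each file name before the first underscore (for example `SOB_...`); the helper name `count_biopsy_procedures` and the sample file names are hypothetical and only there for illustration.

```python
from collections import Counter
from pathlib import Path


def count_biopsy_procedures(filenames):
    """Count the BIOPSY_PROCEDURE code of each image file.

    Assumption: the code is the text before the first underscore
    in the file name, e.g. "SOB" in "SOB_B_TA-14-4659-40-001.png".
    """
    return Counter(Path(name).name.split("_", 1)[0] for name in filenames)


# Hypothetical file names following the assumed naming pattern.
sample_files = [
    "SOB_B_TA-14-4659-40-001.png",
    "SOB_M_DC-14-2523-100-005.png",
]

counts = count_biopsy_procedures(sample_files)
print(counts)
```

On a real download, the list of names could come from something like `Path(dataset_root).rglob("*.png")`, where `dataset_root` is wherever the archive was extracted. If the only key in the result is `SOB`, the copy contains no CNB images.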

Thanks, Good luck!

ZhiWei-Liu123 commented 1 year ago

Thanks for your reply. I am currently doing research on the integration of multi-source medical data, so I need a dataset that contains images from different medical imaging machines, such as X-ray, CT, and MRI. So far I have not found such a dataset. The public medical image datasets I have found each use only one technology.

ZhiWei-Liu123 commented 1 year ago

Have you seen the dataset that I described above?

mohammadAbbasniya commented 1 year ago

Have you seen the dataset that I described above?

No, I haven't seen one before, and I think finding this special kind of study would be hard. Most studies use a single type of imaging for the whole collection. Although I do not have enough knowledge about the differences between the various medical imaging techniques, I believe comparing techniques such as X-ray and SOB would be pointless, because SOB images obviously have much better quality and contain more information than X-ray images. So there is no need to spend time comparing machine learning on X-ray and SOB images (at least in the case of breast cancer detection).

ZhiWei-Liu123 commented 1 year ago

Thanks

ZhiWei-Liu123 commented 1 year ago

I am interested in the idea proposed in your thesis, the soft voting ensemble method. In your paper, you say that every classifier produces a probability for each class, and the mean of these is taken as the final probability of that class. So why not give every classifier a weight value according to their accuracy results and F1-score results? Different classifiers perform differently on the same dataset, as can be seen in your results tables.
This is my own idea and it may not be right. I have only been learning machine learning for two months, so I have not yet taken a deep dive into this subject. From your paper, I have learnt a lot. Thank you.
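
To make the difference concrete, here is a minimal sketch of plain soft voting (mean of the class probabilities) next to an accuracy-weighted variant. It is not the thesis code: the function name `soft_vote`, the probabilities, and the accuracy values are all made up for illustration.

```python
import numpy as np


def soft_vote(probas, weights=None):
    """Combine per-classifier class probabilities.

    probas:  shape (n_classifiers, n_samples, n_classes)
    weights: optional shape (n_classifiers,); None gives the plain mean
    """
    probas = np.asarray(probas, dtype=float)
    # Weighted average over the classifier axis; rows still sum to 1.
    return np.average(probas, axis=0, weights=weights)


# Made-up outputs of three classifiers for one sample and two classes.
probas = [
    [[0.9, 0.1]],  # classifier 1
    [[0.3, 0.7]],  # classifier 2
    [[0.2, 0.8]],  # classifier 3
]
# Made-up validation accuracies, used directly as weights.
accuracies = np.array([0.95, 0.60, 0.55])

plain = soft_vote(probas)
weighted = soft_vote(probas, weights=accuracies)

print("plain mean:", plain, "-> class", plain.argmax(axis=1))
print("weighted:  ", weighted, "-> class", weighted.argmax(axis=1))
```

In this toy case the plain mean picks class 1, while the weighted vote follows the more accurate classifier and picks class 0. For scikit-learn estimators, `VotingClassifier` with `voting="soft"` accepts a `weights` argument that does the same kind of weighting.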

mohammadAbbasniya commented 1 year ago

why not give every classifier a weight value according to their accuracy results

This would be interesting. As a suggestion, you could combine your idea with a nature-inspired optimization algorithm (e.g. a genetic algorithm or ant colony optimization) to find the best weight for the vote of each classifier.
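
As a rough illustration of that suggestion, the sketch below searches for voting weights with a simplified genetic-style loop (keep the best half, average two parents, add Gaussian mutation). It is a toy stand-in for a full genetic algorithm, not code from the repository; the function names, the synthetic data, and all parameter values are assumptions.

```python
import numpy as np


def ensemble_accuracy(weights, probas, y_true):
    """Accuracy of weighted soft voting on a validation set."""
    combined = np.average(probas, axis=0, weights=weights)
    return float(np.mean(combined.argmax(axis=1) == y_true))


def evolve_weights(probas, y_true, pop_size=20, generations=30, seed=0):
    """Simplified genetic-style search for classifier weights."""
    rng = np.random.default_rng(seed)
    n_clf = probas.shape[0]
    n_keep = pop_size // 2
    # Random positive weights, plus uniform weights (plain mean) as a baseline.
    pop = rng.uniform(0.05, 1.0, size=(pop_size, n_clf))
    pop[0] = 1.0
    for _ in range(generations):
        fitness = np.array([ensemble_accuracy(w, probas, y_true) for w in pop])
        elite = pop[np.argsort(fitness)[::-1][:n_keep]]  # selection
        pairs = rng.integers(0, n_keep, size=(pop_size - n_keep, 2))
        children = elite[pairs].mean(axis=1)  # crossover: average two parents
        children += rng.normal(0.0, 0.1, size=children.shape)  # mutation
        pop = np.vstack([elite, np.clip(children, 0.05, None)])
    fitness = np.array([ensemble_accuracy(w, probas, y_true) for w in pop])
    best = pop[fitness.argmax()]
    return best / best.sum(), float(fitness.max())


# Synthetic validation data: three classifiers with different noise levels.
data_rng = np.random.default_rng(1)
y_true = data_rng.integers(0, 2, size=200)
p_pos = [np.clip(y_true + data_rng.normal(0.0, s, size=200), 0.01, 0.99)
         for s in (0.3, 0.6, 0.9)]
probas = np.stack([np.stack([1.0 - p, p], axis=1) for p in p_pos])

baseline_acc = ensemble_accuracy(np.ones(3), probas, y_true)
best_weights, best_acc = evolve_weights(probas, y_true)
print("uniform weights accuracy:", baseline_acc)
print("best weights:", best_weights, "accuracy:", best_acc)
```

Because the uniform weights are part of the starting population and the best half is always kept, the result on the search data is never worse than plain soft voting. The weights should be tuned on a validation split and then checked on a separate test split, since weights fitted to the test data would give an optimistic estimate.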

From your paper, I have learnt a lot. Thank you.

My pleasure. Good luck!