Discussion: BRAX datasets' image crop

seoulsky-field commented 1 year ago

What

Discuss how to crop images in the BRAX dataset correctly.

Why

While classifying BRAX dataset images as frontal or lateral view, I occasionally found images in which the chest X-ray itself occupies only a small part of the frame (too much padding). In particular, I think the younger the patient, the more likely the image is to be over-padded. If you want to see sample images, please check our project Notion.

How

I don't think dropping the rows for young patients is the right approach. How about finding the four actual corner points of the chest X-ray region, and then cropping, resizing, and padding?
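A minimal sketch of the crop, resize, and pad idea, assuming the padding is near-black and the image is a 2-D grayscale NumPy array. The function names, the `threshold` value, and the `out_size` default are hypothetical choices for illustration, not something taken from the BRAX dataset or our codebase. It uses nearest-neighbour resizing so that only NumPy is needed.

```python
import numpy as np


def content_bbox(img, threshold=10):
    """Return (top, bottom, left, right) of the region brighter than threshold.

    bottom and right are exclusive. Returns None if no pixel exceeds threshold.
    """
    mask = img > threshold
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0 or cols.size == 0:
        return None
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1


def crop_resize_pad(img, out_size=512, threshold=10):
    """Crop to the content box, resize keeping aspect ratio, pad to a square."""
    bbox = content_bbox(img, threshold)
    if bbox is None:
        return np.zeros((out_size, out_size), dtype=img.dtype)
    top, bottom, left, right = bbox
    crop = img[top:bottom, left:right]

    # Scale so the longer side equals out_size (nearest-neighbour sampling).
    h, w = crop.shape
    scale = out_size / max(h, w)
    new_h = max(1, round(h * scale))
    new_w = max(1, round(w * scale))
    row_idx = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    col_idx = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = crop[row_idx][:, col_idx]

    # Centre the resized crop on a black square canvas.
    canvas = np.zeros((out_size, out_size), dtype=img.dtype)
    y0 = (out_size - new_h) // 2
    x0 = (out_size - new_w) // 2
    canvas[y0:y0 + new_h, x0:x0 + new_w] = resized
    return canvas
```

A simple intensity threshold would likely fail on images with bright borders, burned-in text, or noisy padding, so the real corner detection may need something more robust.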

chrstnkgn commented 1 year ago

Why do you think that dropping young patients is improper if the issue usually appears in younger patients? Sometimes we tend to exclude patients who are not adults on purpose as the images of the adults and children are different.

seoulsky-field commented 1 year ago

@chrstnkgn Thank you for your opinion.

스크린샷 2023-02-20 오전 10 17 47

As you can see, there are over 6,000 child patients (age groups 0, 5, 10, and 15), which is about 15% of the total BRAX dataset. Patients in age group 0 in particular are not a minor portion, so I thought we should be cautious about excluding child patients.

However, as you said, if there are precedents for excluding child patients for a specific purpose, I think it would be a good decision.

kdg1993 commented 1 year ago

That is a nice insight from a detailed exploration of the data! An adequate process should be applied to the noisy images, but before proceeding, it is wise to check the details more precisely. In that context, I am a bit curious about the opinion that the younger the patient, the more likely the image is to be over-padded. Is there any evidence to support this? It makes me think that the conditional probability of noise decreases monotonically with the patient's age.

seoulsky-field commented 1 year ago

@kdg1993 Thanks for the comment.

Hmm, to avoid confusion, the statement should be revised to: "If the patient is a child, the probability of an over-padded image is higher than for adults." I got this impression while checking more than 2,000 images one by one. If you checked just a hundred images each from the 0-year-old and 45-year-old groups, you would notice it immediately.
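One way to turn this impression into a number is to measure how much of each frame the X-ray content occupies and compare the share of over-padded images per age group. The sketch below assumes near-black padding and 2-D grayscale NumPy arrays; the function names, `threshold`, and the `min_fraction` cutoff are hypothetical values chosen for illustration.

```python
from collections import defaultdict

import numpy as np


def content_fraction(img, threshold=10):
    """Fraction of the frame covered by the bounding box of non-padding pixels."""
    mask = img > threshold
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return 0.0
    box_area = (rows[-1] - rows[0] + 1) * (cols[-1] - cols[0] + 1)
    return float(box_area) / img.size


def over_padded_rate(samples, min_fraction=0.5, threshold=10):
    """samples: iterable of (age_group, image) pairs.

    Returns {age_group: share of images whose content fraction is below min_fraction}.
    """
    total = defaultdict(int)
    flagged = defaultdict(int)
    for age_group, img in samples:
        total[age_group] += 1
        if content_fraction(img, threshold) < min_fraction:
            flagged[age_group] += 1
    return {group: flagged[group] / total[group] for group in total}
```

Running this on a random sample from each age group would show whether the rate really is higher for children, and whether it changes gradually with age or only differs for the youngest groups.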

kdg1993 commented 1 year ago

I think the possible solutions so far can be summarized in 2 ways:

1. Focus on adults and discard minor patients
2. Do some image preprocessing for noisy samples

I am fine with accepting the first option. To my mind, it is in line with removing lateral image samples.

However, the second option still confuses me, because I have no clear answer or evidence for the question 'If age group 0 needs some kind of treatment, then what about the other age groups?'

I still haven't checked the images in every age group one by one; instead, I suggest checking the number of images per patient to help evaluate option 2. If we suppose that only age group 0 should be treated differently, that age group must have some traits the others do not have. In that context, I wanted to test this hypothesis: 'Infants usually don't need medical treatment, but when they do, it may be a serious problem. Thus, age group 0 has a distinctively higher number of image samples than the other age groups. In other words, doctors took many more X-rays of babies than of adults.' I think this is important for two reasons:

- If an age group has a high number of images per patient, its data is highly biased toward a small group of people.
- This point of view can also be applied to the neighboring age groups or to the opposite extreme (the oldest group).

Below is the bar plot of the average number of images per patient for each age group.

I think the graph above does not clearly support either accepting or rejecting the hypothesis, but it leans slightly toward rejecting it. Age group 0 does not show a unique value compared to the others: it is high, but not that different from the older age groups. So the graph doesn't support the original hypothesis, but we can still suspect that samples from the oldest age groups also need some kind of preprocessing. Overall, it is clearer that the closer the patient's age is to young adulthood (25~35, a very healthy age), the larger the number of images per patient. I think this is natural, because people aged 25-35 are usually healthy and don't like going to the hospital. In conclusion, age group 0 is not made up of a small number of patients (this is only my view, and other points of view are always welcome), but the number of images per patient shows a clear tendency according to the patient's age.
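For anyone who wants to reproduce the per-age-group average, here is a rough sketch of the calculation. It assumes the metadata can be read as one `(patient_id, age_group)` pair per image; those field names and the function name are hypothetical, not the actual BRAX column names.

```python
from collections import defaultdict


def images_per_patient_by_age(records):
    """records: iterable of (patient_id, age_group), one entry per image.

    Returns {age_group: average number of images per patient}.
    """
    image_count = defaultdict(int)
    patients = defaultdict(set)
    for patient_id, age_group in records:
        image_count[age_group] += 1
        patients[age_group].add(patient_id)
    # A patient imaged at different ages is counted once in each age group.
    return {g: image_count[g] / len(patients[g]) for g in sorted(image_count)}


records = [("p1", 0), ("p1", 0), ("p1", 0), ("p2", 0), ("p3", 30)]
print(images_per_patient_by_age(records))  # {0: 2.0, 30: 1.0}
```

Looking at the distribution per group (for example the median or the maximum) in addition to the mean could show whether a few heavily imaged patients dominate an age group.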

seoulsky-field / CXRAIL-dev

Discussion: BRAX datasets' image crop #99
