ieee8023 / covid-chestxray-dataset

We are building an open database of COVID-19 cases with chest X-ray or CT images.
2.99k stars, 1.28k forks

Why use JPEG instead of DICOM? #48

Open Chuvi-w opened 4 years ago

Chuvi-w commented 4 years ago

Digital X-ray images are normally stored in DICOM format, which has a higher bit depth than JPEG.

ieee8023 commented 4 years ago
  1. The files are not available to us as DICOMs (we are scraping publications because no one will share data).

  2. The DICOM format is not as easy to load in Python using the standard image loading tools, so a lossless format like PNG would be better.

  3. What is the increase in bit depth with DICOM?
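To put a number on the bit-depth question in point 3: an n-bit sample can hold 2**n gray levels. Baseline JPEG stores 8 bits per sample (256 levels), while DICOM images commonly store 10 to 16 bits per pixel (recorded in the `BitsStored` attribute). The minimal sketch below assumes 12-bit input, a common value for CT, and reduces it to 8 bits by dropping the low-order bits. Real converters usually apply a display window first; the plain bit shift is a simplifying assumption, and both function names are hypothetical.

```python
def gray_levels(bits: int) -> int:
    """Number of distinct values an unsigned n-bit sample can hold."""
    return 2 ** bits


def reduce_to_8_bit(value: int, source_bits: int) -> int:
    """Map an unsigned source_bits-wide value to 8 bits by dropping low-order bits.

    Simplifying assumption: real exports usually window/rescale before quantizing.
    """
    return value >> (source_bits - 8)


if __name__ == "__main__":
    for bits in (8, 12, 16):
        print(f"{bits}-bit: {gray_levels(bits)} levels")

    # Group every 12-bit value by the 8-bit level it lands on.
    collapsed = {}
    for v in range(gray_levels(12)):
        collapsed.setdefault(reduce_to_8_bit(v, 12), []).append(v)
    print(len(collapsed), "distinct 8-bit levels;",
          len(collapsed[0]), "12-bit values share each level")
```

Run as a script, it reports 256 levels for 8 bits, 4096 for 12 bits, and 65536 for 16 bits, and shows that 16 distinct 12-bit values collapse onto each 8-bit level.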

Chuvi-w commented 4 years ago

As far as I understand, DICOM data is stored in Hounsfield units or can be converted to them. https://gist.github.com/somada141/df9af37e567ba566902e

https://en.wikipedia.org/wiki/Hounsfield_scale
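For CT, the conversion behind this is a linear rescale stored in the DICOM header: HU = stored value × RescaleSlope + RescaleIntercept. (The Hounsfield scale is defined for CT; plain radiographs carry no HU calibration.) The minimal sketch below applies that rescale and then a display window, which is roughly what happens when a CT slice is exported to an 8-bit JPEG or PNG for a journal figure. The lung window (center -600 HU, width 1500 HU) is an illustrative assumption, not a value taken from this dataset, and the function names are hypothetical. With pydicom, the slope and intercept come from `ds.RescaleSlope` and `ds.RescaleIntercept`.

```python
def to_hounsfield(stored_value: float, slope: float, intercept: float) -> float:
    """Apply the DICOM modality rescale: HU = stored * slope + intercept.

    Read slope and intercept from RescaleSlope / RescaleIntercept in each file.
    """
    return stored_value * slope + intercept


def window_to_8_bit(hu: float, center: float = -600.0, width: float = 1500.0) -> int:
    """Clip HU to [center - width/2, center + width/2] and scale to 0..255.

    The default lung window is an illustrative assumption. Values outside the
    window saturate and values inside are quantized to 256 levels, so the
    original HU cannot be recovered from the 8-bit output.
    """
    low = center - width / 2
    high = center + width / 2
    clipped = min(max(hu, low), high)
    return round((clipped - low) / (high - low) * 255)
```

For example, `to_hounsfield(24, 1.0, -1024.0)` gives -1000.0 (roughly air). Tissue at 400 HU and bone at 1000 HU both map to 255 under this window, which is the kind of information an 8-bit export discards.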

jscheithe commented 4 years ago

The project I'm working on at the moment will be deployed at hospitals, so I can only really use DICOMs for development. In practice, DICOM is what gets forwarded to you and what you need to process.

That said, thank you so much for adding the first volumetric data! @ieee8023, how did you get them out of Radiopaedia? Any chance of getting DICOMs instead of NIfTI?

Let me know if I can help. Thanks!

RichardKCollins commented 4 years ago

The images must be in a lossless format. Most JPEG encoding throws away the detailed pixel-level data, which is where the information needed for precise discrimination is located. PNG is lossless, and DICOM images are usually stored losslessly, which is why DICOM is used.
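A minimal, standard-library-only sketch of the lossless half of this argument: 12-bit pixel values (an assumed, typical DICOM depth) are written into a 16-bit grayscale PNG and read back unchanged. It deliberately handles only what it writes itself (16-bit grayscale, no interlacing, scanline filter type 0), so it is not a general PNG reader. Baseline JPEG, by contrast, stores 8 bits per sample and compresses lossily, so it could not return these values exactly; a JPEG codec is out of scope for a stdlib sketch. `encode_png16` and `decode_png16` are hypothetical names.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def encode_png16(pixels):
    """Encode a list of rows of 0..65535 ints as a 16-bit grayscale PNG."""
    height, width = len(pixels), len(pixels[0])
    raw = bytearray()
    for row in pixels:
        if len(row) != width:
            raise ValueError("all rows must have the same width")
        raw.append(0)  # scanline filter type 0 (None)
        raw += struct.pack(f">{width}H", *row)  # big-endian 16-bit samples
    # IHDR: width, height, bit depth 16, color type 0 (gray), then
    # compression, filter, and interlace methods all 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 16, 0, 0, 0, 0)
    return (PNG_SIGNATURE + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(bytes(raw))) + _chunk(b"IEND", b""))


def decode_png16(data: bytes):
    """Decode PNGs produced by encode_png16 (16-bit gray, filter 0 only)."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos, idat, width, height = 8, b"", 0, 0
    while pos < len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        kind = data[pos + 4:pos + 8]
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length  # length + type + data + CRC
        if kind == b"IHDR":
            width, height, depth, color = struct.unpack(">IIBB", body[:10])
            if (depth, color) != (16, 0):
                raise ValueError("only 16-bit grayscale is supported")
        elif kind == b"IDAT":
            idat += body
        elif kind == b"IEND":
            break
    raw = zlib.decompress(idat)
    stride = 1 + 2 * width
    pixels = []
    for y in range(height):
        line = raw[y * stride:(y + 1) * stride]
        if line[0] != 0:
            raise ValueError("only filter type 0 is supported")
        pixels.append(list(struct.unpack(f">{width}H", line[1:])))
    return pixels
```

For example, `decode_png16(encode_png16([[0, 1023, 4095], [2048, 17, 4000]]))` returns the same rows, value for value.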

If you anonymize the images and preserve privacy, you will get more cooperation.

I came here from https://www.researchgate.net/post/updated_list_of_Coronavirus_Covid-19_dataset_and_other_Research_Resources3

phildespres commented 4 years ago

I agree with the advocates of the DICOM format: please rely on established standards! As for "DICOM format is not as easy to load in Python": https://pydicom.github.io/

ieee8023 commented 4 years ago

If the medical community made DICOM images public, then we could have this debate. But as it stands, that is not what is being made public. This dataset is built by scraping images from publications in radiology journals, which publish images in JPEG and PNG, because that was the only way to get access to images. We are just being pragmatic here.

RichardKCollins commented 4 years ago

Do you know anything about the ages and sex of the people? Women respond differently than men, and age is very important for morbidity and mortality, susceptibility, and recovery.

What about the stage of the disease in these people?

Outcomes? Died? Went home? In a coma for two weeks? Had pneumonia? Smoker? Anything?

What about asking countries outside the US for samples? Are you guys good at this? Do you know radiology or how the disease progresses? Are you willing to learn super fast? That sort of thing.

There is a European group starting up. Are you guys good enough with Python or other tools to regularly track the whole web for images? More importantly, can you track who the key people, groups, papers, and tools are, and which supporting measurements are needed to verify a diagnosis?

Are there any refugee camps, prisons, slums, or countries with many homeless and poor people, where even an amateur would be helpful?

Think hard and fast about what you can do. It might not be as glamorous as making false-color maps of images, but the world could really use an index of what is going on.

There are 517,000 places on the web that have "dicom" "covid" on them. Who are all those people? What are they doing? What best practices are they recommending? And where are the details?

As a group, and one that you can expand, you should be able to handle a measly half million document references.

I do not know your ages and backgrounds, and I don't really have time to find out. But if you are trying to use something like this as a stepping stone to a job or career, you will have to get to know all the people in whatever industry you want to get into. What better way than to help everyone in the world see the whole of just "dicom" "covid" completely? If you create a sharing site that shows groups in their best light and highlights their capabilities, you are more likely to be accepted and to have them share images later.

At least put in some effort. These people are already trying to do that: https://www.itnonline.com/content/rsna-announces-covid-19-imaging-data-repository

Go help them. Or set up something to complement what they are doing. Something.

Richard Collins, Director, The Internet Foundation

ieee8023 commented 4 years ago

"At least put in some effort."

Thanks for the advice! Please take a look at the metadata file, though, to find the outcome information you are looking for.
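A minimal sketch of summarizing that metadata with Python's standard library: it tallies outcomes by sex and computes the median age. The column names `sex`, `age`, and `survival` and the file name `metadata.csv` are assumptions about the repository's metadata file; check its header row before relying on them. Blank cells are counted as "unknown" rather than guessed, and non-numeric ages are skipped. `summarize` is a hypothetical helper.

```python
import csv
import io
from collections import Counter
from statistics import median


def _cell(row, column):
    """Return a stripped cell value, or "unknown" if it is missing or blank."""
    value = (row.get(column) or "").strip()
    return value or "unknown"


def _age(row, column):
    """Parse an age cell as float; return None if blank or non-numeric."""
    try:
        return float((row.get(column) or "").strip())
    except ValueError:
        return None


def summarize(csv_text, sex_col="sex", age_col="age", outcome_col="survival"):
    """Count (sex, outcome) pairs and compute the median age.

    Column names are assumptions; pass the real ones if they differ.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    counts = Counter((_cell(r, sex_col), _cell(r, outcome_col)) for r in rows)
    ages = [a for a in (_age(r, age_col) for r in rows) if a is not None]
    return counts, (median(ages) if ages else None)
```

To run it on the real file: `with open("metadata.csv", newline="", encoding="utf-8") as f: counts, median_age = summarize(f.read())`.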

RichardKCollins commented 4 years ago

Thanks for telling me. I just made an almost two-hour video talking to the Wolfram (Mathematica) group about Covid. I mentioned that there is a group on GitHub working on medical images. I did not commit you to anything, but I asked them to get out and help find things. Visit and join, if only to hear what is being discussed. Tell them you are working on images and what you need or want.

I probably will not get to do much myself for a while, except the global status and what groups can do. But I will remember it and know to tell people about case data.

Here are a few public videos. This last one will take an hour to upload; like I said, it is almost two hours long, and I was talking fairly fast.

https://www.youtube.com/channel/UCaJ3voMzj3wP30oY_VTC9Yw

Richard Collins, The Internet Foundation

jscheithe commented 4 years ago

@RichardKCollins I realize that you're just trying to help, but please keep in mind that this repository is taken care of by people in their free time. The way you talk about the great work that is being done here can easily sound disrespectful to the creators. You are not the supervisor here.

RichardKCollins commented 4 years ago

I was just trying to share what I have been gathering as "good ideas" and "best practices" and some "I wonder if this will work". I apologize if I am too abrupt. I appreciate your candor, and I will tone it down. I would rather work on images, but I spent the night doing more statistical reports and curve fitting. I am trying to find a balance. Thanks. Keep up the great work!!!