wasserth / TotalSegmentator

Tool for robust segmentation of >100 important anatomical structures in CT and MR images
Apache License 2.0
1.51k stars, 252 forks

Do you plan to enlarge the dataset in the future? #25

Closed: Fivethousand5k closed this issue 1 year ago

Fivethousand5k commented 2 years ago

Hi! May I ask whether you plan to enlarge the dataset in the future?

wasserth commented 2 years ago

Hi, at the moment we are working on adding roughly 100-200 more subjects. Those are subjects where the current system is having some problems, plus more whole-body CTs, since those are rather rare. If we add some more classes and we feel that more images of a certain type are needed, we might also add those. But it will always be a minor amount of data. Extending the dataset by orders of magnitude is not planned at the moment. Going to several thousand subjects would increase the effort for correcting the masks too much.

Skylixia commented 1 year ago

@wasserth Hi! I have some questions on this. I read your paper and tested the model, and it looks very promising, so I'm looking into it more. The paper does not give many details on the patient characteristics of the dataset. I'm wondering if you have a description of what kinds of images are included, e.g. trauma patients (not lying in the typical orientation, for instance with crossed arms, presence of image artifacts, etc.), patients with medical devices (implants, screws, plates, etc.), i.e. anything that makes an image peculiar. The paper says that images were randomly sampled across 10 years to get a fair representation of the clinic, but it also says that 40 patients were excluded because they were too difficult to segment (due to pathologies etc.). Are there more details on what kinds of images were considered too difficult? Here you mention adding more subjects that the current system has problems with. What kinds of images are they? Are they images with pathological findings? Thanks in advance!

wasserth commented 1 year ago

We are preparing a more complete paper which will answer some of these questions. Here I can show you a figure with some more details. The dataset contains images with implants/screws, trauma, ... but I cannot tell you exactly how many there are of each of these conditions, since we did not count all of them specifically. If you want to know exactly, you can download the dataset and look through the images yourself.
Images considered too difficult were images where the radiologist was not able to properly segment the structure, e.g. he was not able to find the border between the lung lobes because they were no longer really visible due to pathology.

[Figure: dataset_distributions]