vitalab / castor

CArdiac SegmenTation with cOnstRaints (CASTOR) project
Apache License 2.0

About CAMUS dataset #4

Open dengxl0520 opened 1 year ago

dengxl0520 commented 1 year ago

I want to use the CAMUS dataset with this project, but I ran into a problem. I used castor/vital/vital/data/camus/dataset_generator.py to generate the HDF5 file, but got this error:

    (castor) dengxiaolong@rtx2:~/code/castor/vital/vital/data/camus$ python dataset_generator.py /data/dengxiaolong/
    Traceback (most recent call last):
      File "/home/dengxiaolong/code/castor/vital/vital/data/camus/dataset_generator.py", line 353, in <module>
        main()
      File "/home/dengxiaolong/code/castor/vital/vital/data/camus/dataset_generator.py", line 339, in main
        CrossValidationDatasetGenerator()(
      File "/home/dengxiaolong/code/castor/vital/vital/data/camus/dataset_generator.py", line 106, in __call__
        fold_subset_patients = self.get_fold_subset_from_file(data, fold, subset_name_in_data)
      File "/home/dengxiaolong/code/castor/vital/vital/data/camus/dataset_generator.py", line 136, in get_fold_subset_from_file
        with open(str(list_fn), "r") as f:
    FileNotFoundError: [Errno 2] No such file or directory: '/data/dengxiaolong/listSubGroups/subGroup1_training.txt'

How can I get 'subGroup1_training.txt'?
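The traceback shows that the generator expects one pre-computed split file per fold and subset, at `<data>/listSubGroups/subGroup{fold}_{subset}.txt`, with one patient ID per row. A minimal pre-flight check, assuming only that naming scheme and the three subset names used by `get_fold_subset_from_file`, can report which files are missing before running the generator. `missing_split_files` is a hypothetical helper, not part of `vital`:

```python
from pathlib import Path
from typing import List

# Subset names accepted by get_fold_subset_from_file in dataset_generator.py
SUBSETS = ("training", "validation", "testing")


def missing_split_files(data: Path, fold: int) -> List[Path]:
    """Returns the split files for `fold` that do not exist under `data`.

    Hypothetical helper: assumes the naming scheme
    `<data>/listSubGroups/subGroup{fold}_{subset}.txt` seen in the traceback.
    """
    list_dir = data / "listSubGroups"
    expected = [list_dir / f"subGroup{fold}_{subset}.txt" for subset in SUBSETS]
    # Keep only the paths that are not existing regular files
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    import sys

    # Usage: python check_splits.py /data/dengxiaolong/
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for path in missing_split_files(root, fold=1):
        print(f"missing: {path}")
```

If all three files are reported missing, the data root simply has no `listSubGroups` directory, which is the situation in the traceback above.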

gungui98 commented 1 year ago

@dengxl0520 Do you have any update on this? I was shocked by how over-engineered their projects are.

gungui98 commented 1 year ago

@dengxl0520 I have just read through dataset_generator.py, and it looks like the team had access to a version of the CAMUS dataset with fully annotated sequences from ED to ES. This confuses me, because the public CAMUS training set does not contain these annotations.

https://github.com/vitalab/vital/blob/e5ce208b3263e781e7ed306b9f46b6e84134cc6a/vital/data/camus/dataset_generator.py#L240

dengxl0520 commented 1 year ago

@gungui98 Regarding the dataset, see the paper https://arxiv.org/abs/2112.02102. It mentions a dataset called TED, which has fully annotated sequences from ED to ES. But I still don't understand how to use this dataset...

gungui98 commented 1 year ago

@dengxl0520 It seems like they extended the original dataset to the full cycle by manual annotation. I have emailed the author for the dataset but haven't received a response yet.

dengxl0520 commented 1 year ago

@gungui98 You can download it from here: https://humanheart-project.creatis.insa-lyon.fr/ted.html

gungui98 commented 1 year ago

@dengxl0520 I have successfully processed the dataset and got the h5 file. First, you have to run the script with the full-cycle option, using the dataset from the link you provided as input:

    python dataset_generator.py --output ~/data/camus.h5 --sequence_type full_cycle ~/data/camus_full_cycle/TED/database/

I also skipped the k-fold part: in vital/vital/data/camus/dataset_generator.py, I changed the function get_fold_subset_from_file to simply split the dataset 80/10/10 into training, testing and validation sets:

    def get_fold_subset_from_file(
            cls, data: Path, fold: int, subset: Literal["training", "validation", "testing"]
    ) -> List[str]:
        """Reads patient ids for a subset of a cross-validation configuration.

        Workaround: ignores `fold` and the listSubGroups files, and instead splits the
        patient directories found under `data` 80/10/10 into training/testing/validation.

        Args:
            data: Path to the CAMUS root directory, under which the patient directories are stored.
            fold: ID of the test set for the cross-validation configuration (unused here).
            subset: Name of the subset for which to fetch patient IDs for the cross-validation configuration.

        Returns:
            IDs of the patients that are included in the subset of the fold.
        """
        # Original behavior, which requires data/listSubGroups/subGroup{fold}_{subset}.txt:
        # list_fn = data / "listSubGroups" / f"subGroup{fold}_{subset}.txt"
        # # Open text file containing patient ids (one patient id by row)
        # with open(str(list_fn), "r") as f:
        #     patient_ids = [line for line in f.read().splitlines()]

        # Use the directory names as patient ids (glob would return full paths, whereas the
        # original list files contain bare ids), sorted so the split is the same on every run
        patient_ids = sorted(path.name for path in data.iterdir() if path.is_dir())
        train_end = int(len(patient_ids) * 0.8)
        test_end = int(len(patient_ids) * 0.9)
        if subset == "training":
            return patient_ids[:train_end]
        if subset == "testing":
            return patient_ids[train_end:test_end]
        return patient_ids[test_end:]  # "validation"

I will try to implement a proper, fixed k-fold split, but this change was just meant to get things running at first. PS: I have also trained a model with this file, but using the CRISP project!
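An alternative to patching `get_fold_subset_from_file` is to generate the missing `listSubGroups` files once, so the generator's original cross-validation code keeps working unchanged. The sketch below is an illustration under stated assumptions, not the official CAMUS/TED split: it assumes patient directory names are the patient IDs, uses 10 folds, shuffles with a fixed seed, numbers folds from 1 (as in `subGroup1_training.txt`), uses fold `k` as the test set (matching the docstring's "ID of the test set"), and picks the next fold as validation. `make_folds`, `fold_subsets` and `write_kfold_lists` are hypothetical helpers:

```python
import random
from pathlib import Path
from typing import Dict, List


def make_folds(patient_ids: List[str], n_folds: int = 10, seed: int = 0) -> List[List[str]]:
    """Deterministically shuffles the patient IDs and deals them into `n_folds` groups."""
    ids = sorted(patient_ids)  # sort first so the result does not depend on listing order
    random.Random(seed).shuffle(ids)
    return [ids[i::n_folds] for i in range(n_folds)]


def fold_subsets(folds: List[List[str]], fold: int) -> Dict[str, List[str]]:
    """Builds the subsets for a 1-based `fold` (assumes at least 2 folds).

    That fold is the test set, the next one (wrapping around) is the validation set,
    and all remaining folds form the training set.
    """
    n_folds = len(folds)
    test_idx = fold - 1
    val_idx = fold % n_folds
    training = [pid for i, group in enumerate(folds) if i not in (test_idx, val_idx) for pid in group]
    return {
        "training": sorted(training),
        "validation": sorted(folds[val_idx]),
        "testing": sorted(folds[test_idx]),
    }


def write_kfold_lists(data: Path, n_folds: int = 10, seed: int = 0) -> None:
    """Writes `<data>/listSubGroups/subGroup{fold}_{subset}.txt`, one patient ID per row."""
    patient_ids = [p.name for p in data.iterdir() if p.is_dir() and p.name != "listSubGroups"]
    folds = make_folds(patient_ids, n_folds, seed)
    list_dir = data / "listSubGroups"
    list_dir.mkdir(exist_ok=True)
    for fold in range(1, n_folds + 1):
        for subset, ids in fold_subsets(folds, fold).items():
            # One ID per line; an empty subset yields an empty file rather than a blank ID
            (list_dir / f"subGroup{fold}_{subset}.txt").write_text("".join(f"{pid}\n" for pid in ids))
```

Because the patient list is sorted before the seeded shuffle, running `write_kfold_lists(Path("~/data/camus_full_cycle/TED/database").expanduser())` twice produces the same files, and every patient lands in exactly one test fold across the 10 configurations.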

dengxl0520 commented 1 year ago

@gungui98 I modified dataset_generator.py the same way you did, but when I ran the script I hit another problem:

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/home/dengxiaolong/code/castor/vital/vital/data/camus/dataset_generator.py", line 363, in <module>
        main()
      File "/home/dengxiaolong/code/castor/vital/vital/data/camus/dataset_generator.py", line 349, in main
        CrossValidationDatasetGenerator()(
      File "/home/dengxiaolong/code/castor/vital/vital/data/camus/dataset_generator.py", line 118, in __call__
        self._write_patient_data(dataset.create_group(patient_id))
      File "/home/dengxiaolong/code/castor/vital/vital/data/camus/dataset_generator.py", line 176, in _write_patient_data
        data_x_proc = resize_image(data_x, self.target_image_size, resample=Resampling.BILINEAR)
      File "/home/dengxiaolong/miniconda3/envs/castor/lib/python3.10/site-packages/vital/utils/image/transform.py", line 22, in resize_image
        resized_image = np.array(Image.fromarray(image).resize(size, resample=resample))
      File "/home/dengxiaolong/miniconda3/envs/castor/lib/python3.10/site-packages/PIL/Image.py", line 2955, in fromarray
        raise TypeError("Cannot handle this data type: %s, %s" % typekey) from e
    TypeError: Cannot handle this data type: (1, 1, 748), |u1

gungui98 commented 1 year ago

@dengxl0520 I'm not really sure about your problem; it could come from how the image data is read. Here is the code that I used:

https://gist.github.com/gungui98/364e8f77930880132dee9704aca9a90d
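The error message itself narrows the problem down. Pillow builds its lookup key from the array's dtype and every axis after the first two, so `(1, 1, 748), |u1` means `Image.fromarray` received a 3-D `uint8` array with 748 entries on its last axis, which it cannot map to an image mode. In other words, `resize_image` was most likely handed a whole stack of frames (or an array with its axes in an unexpected order) instead of a single 2-D frame; printing `data_x.shape` before the call should confirm which axis holds the frames. A minimal guard, assuming the sequence is a NumPy array and that the caller knows the frame axis (`split_frames` and `frame_axis` are hypothetical, not part of `vital`), splits it into 2-D frames that can each be resized separately:

```python
from typing import List

import numpy as np


def split_frames(sequence: np.ndarray, frame_axis: int = 0) -> List[np.ndarray]:
    """Splits a 3-D sequence into a list of 2-D frames along `frame_axis`.

    Hypothetical helper: `frame_axis` must be chosen by the caller, since the
    axis order depends on how the image file was read.
    """
    if sequence.ndim == 2:
        return [sequence]  # already a single frame
    if sequence.ndim != 3:
        raise ValueError(f"Expected a 2-D frame or 3-D sequence, got shape {sequence.shape}")
    # Move the frame axis to the front, then iterate over it to get 2-D views
    return list(np.moveaxis(sequence, frame_axis, 0))
```

Each returned frame is a 2-D `uint8` array, which `Image.fromarray` maps to a grayscale image, so it can be passed to `resize_image` one at a time and the results stacked back with `np.stack`.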