Closed jaehwana2z closed 6 months ago
Dear @jaehwana2z,
The same issue is discussed on Hugging Face (https://huggingface.co/datasets/ibrahimhamamci/CT-RATE/discussions/53). Please see that discussion thread for more information about this and for other ways to download the dataset. The problem should now be fixed for specific dataset configurations (labels, reports, or metadata). Please let me know if you still have issues with this!
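For anyone landing here, a minimal sketch of loading one configuration at a time. The configuration names are the ones listed in this reply. The helper `load_ct_rate_config` and the `split="train"` default are assumptions for illustration, and the dataset may require you to be logged in to Hugging Face first.

```python
# Config names as listed in the reply above; whether they match the
# dataset card exactly is an assumption.
KNOWN_CONFIGS = ("labels", "reports", "metadata")


def load_ct_rate_config(config_name, split="train"):
    """Hypothetical helper: load a single CT-RATE configuration."""
    if config_name not in KNOWN_CONFIGS:
        raise ValueError(
            f"unknown config {config_name!r}; expected one of {KNOWN_CONFIGS}"
        )
    # Imported lazily so the name check above works without `datasets` installed.
    from datasets import load_dataset

    # Naming a config keeps CSV files with different columns out of one split.
    return load_dataset("ibrahimhamamci/CT-RATE", config_name, split=split)


# Usage (needs network access and the `datasets` package):
#   labels = load_ct_rate_config("labels")
#   print(labels.column_names)
```

Passing a configuration name explicitly is the point here: without one, the csv builder may try to read the label and metadata files into the same split.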
I get the following error after running the command:
Generating train split: 47149 examples [00:00, 135849.06 examples/s]
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/datasets/builder.py", line 1989, in _prepare_split_single
writer.write_table(table)
File "/opt/conda/lib/python3.10/site-packages/datasets/arrow_writer.py", line 584, in write_table
pa_table = table_cast(pa_table, self._schema)
File "/opt/conda/lib/python3.10/site-packages/datasets/table.py", line 2240, in table_cast
return cast_table_to_schema(table, schema)
File "/opt/conda/lib/python3.10/site-packages/datasets/table.py", line 2194, in cast_table_to_schema
raise CastError(
datasets.table.CastError: Couldn't cast
VolumeName: string
Medical material: int64
Arterial wall calcification: int64
Cardiomegaly: int64
Pericardial effusion: int64
Coronary artery wall calcification: int64
Hiatal hernia: int64
Lymphadenopathy: int64
Emphysema: int64
Atelectasis: int64
Lung nodule: int64
Lung opacity: int64
Pulmonary fibrotic sequela: int64
Pleural effusion: int64
Mosaic attenuation pattern: int64
Peribronchial thickening: int64
Consolidation: int64
Bronchiectasis: int64
Interlobular septal thickening: int64
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' + 2787
to
{'VolumeName': Value(dtype='string', id=None), 'Manufacturer': Value(dtype='string', id=None), 'SeriesDescription': Value(dtype='string', id=None), 'ManufacturerModelName': Value(dtype='string', id=None), 'PatientSex': Value(dtype='string', id=None), 'PatientAge': Value(dtype='string', id=None), 'ReconstructionDiameter': Value(dtype='float64', id=None), 'DistanceSourceToDetector': Value(dtype='float64', id=None), 'DistanceSourceToPatient': Value(dtype='float64', id=None), 'GantryDetectorTilt': Value(dtype='int64', id=None), 'TableHeight': Value(dtype='float64', id=None), 'RotationDirection': Value(dtype='string', id=None), 'ExposureTime': Value(dtype='float64', id=None), 'XRayTubeCurrent': Value(dtype='int64', id=None), 'Exposure': Value(dtype='int64', id=None), 'FilterType': Value(dtype='string', id=None), 'GeneratorPower': Value(dtype='float64', id=None), 'FocalSpots': Value(dtype='string', id=None), 'ConvolutionKernel': Value(dtype='string', id=None), 'PatientPosition': Value(dtype='string', id=None), 'RevolutionTime': Value(dtype='float64', id=None), 'SingleCollimationWidth': Value(dtype='float64', id=None), 'TotalCollimationWidth': Value(dtype='float64', id=None), 'TableSpeed': Value(dtype='float64', id=None), 'TableFeedPerRotation': Value(dtype='float64', id=None), 'SpiralPitchFactor': Value(dtype='float64', id=None), 'DataCollectionCenterPatient': Value(dtype='string', id=None), 'ReconstructionTargetCenterPatient': Value(dtype='string', id=None), 'ExposureModulationType': Value(dtype='string', id=None), 'CTDIvol': Value(dtype='float64', id=None), 'ImagePositionPatient': Value(dtype='string', id=None), 'ImageOrientationPatient': Value(dtype='string', id=None), 'SliceLocation': Value(dtype='float64', id=None), 'SamplesPerPixel': Value(dtype='int64', id=None), 'PhotometricInterpretation': Value(dtype='string', id=None), 'Rows': Value(dtype='int64', id=None), 'Columns': Value(dtype='int64', id=None), 'XYSpacing': Value(dtype='string', id=None), 'RescaleIntercept': Value(dtype='int64', id=None), 'RescaleSlope': Value(dtype='int64', id=None), 'RescaleType': Value(dtype='string', id=None), 'NumberofSlices': Value(dtype='int64', id=None), 'ZSpacing': Value(dtype='float64', id=None), 'StudyDate': Value(dtype='int64', id=None)}
because column names don't match
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "", line 1, in
File "/opt/conda/lib/python3.10/site-packages/datasets/load.py", line 2582, in load_dataset
builder_instance.download_and_prepare(
File "/opt/conda/lib/python3.10/site-packages/datasets/builder.py", line 1005, in download_and_prepare
self._download_and_prepare(
File "/opt/conda/lib/python3.10/site-packages/datasets/builder.py", line 1100, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "/opt/conda/lib/python3.10/site-packages/datasets/builder.py", line 1860, in _prepare_split
for job_id, done, content in self._prepare_split_single(
File "/opt/conda/lib/python3.10/site-packages/datasets/builder.py", line 1991, in _prepare_split_single
raise DatasetGenerationCastError.from_cast_error(
datasets.exceptions.DatasetGenerationCastError: An error occurred while generating the dataset
All the data files must have the same columns, but at some point there are 18 new columns (Pericardial effusion, Coronary artery wall calcification, Mosaic attenuation pattern, Medical material, Lung nodule, Bronchiectasis, Lung opacity, Hiatal hernia, Pleural effusion, Pulmonary fibrotic sequela, Interlobular septal thickening, Atelectasis, Cardiomegaly, Consolidation, Lymphadenopathy, Peribronchial thickening, Emphysema, Arterial wall calcification) and 43 missing columns (DataCollectionCenterPatient, ConvolutionKernel, Rows, CTDIvol, TableHeight, SeriesDescription, RotationDirection, RescaleType, TotalCollimationWidth, Columns, GantryDetectorTilt, TableSpeed, TableFeedPerRotation, SingleCollimationWidth, RevolutionTime, ImageOrientationPatient, ExposureModulationType, SliceLocation, PatientSex, PhotometricInterpretation, NumberofSlices, ManufacturerModelName, DistanceSourceToDetector, XRayTubeCurrent, ReconstructionTargetCenterPatient, DistanceSourceToPatient, RescaleSlope, ZSpacing, SamplesPerPixel, StudyDate, PatientAge, RescaleIntercept, Manufacturer, Exposure, FocalSpots, SpiralPitchFactor, FilterType, ReconstructionDiameter, ExposureTime, GeneratorPower, XYSpacing, ImagePositionPatient, PatientPosition).
This happened while the csv dataset builder was generating data using
hf://datasets/ibrahimhamamci/CT-RATE/dataset/multi_abnormality_labels/train_predicted_labels.csv (at revision 4d92f6d4f805e36e2891359c04302705c314fe43)
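To make the "new columns" / "missing columns" part of the message concrete, here is a small self-contained sketch of the same kind of comparison on two toy CSV headers. The data and the helper names `header_of` and `compare_columns` are made up for illustration. This is not how `datasets` implements the check internally.

```python
import csv
import io


def header_of(csv_text):
    """Return the column names from the first row of CSV text."""
    return next(csv.reader(io.StringIO(csv_text)))


def compare_columns(expected, actual):
    """Return (new, missing) columns of `actual` relative to `expected`."""
    new = [c for c in actual if c not in expected]
    missing = [c for c in expected if c not in actual]
    return new, missing


# Toy stand-ins for a metadata file and a labels file (not the real data).
metadata_csv = "VolumeName,Manufacturer,PatientSex\nvol_1,X,F\n"
labels_csv = "VolumeName,Cardiomegaly,Emphysema\nvol_1,0,1\n"

# The schema is fixed by the first file; the second file is checked against it.
new, missing = compare_columns(header_of(metadata_csv), header_of(labels_csv))
print("new columns:", new)
print("missing columns:", missing)
```

In the traceback above the same thing happens at full scale: the schema was inferred from a metadata CSV, and `train_predicted_labels.csv` shares only `VolumeName` with it.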