wasserth / TotalSegmentator

Tool for robust segmentation of >100 important anatomical structures in CT and MR images
Apache License 2.0

`s0864/ct.nii.gz` cannot be unarchived properly #24

Closed nai62 closed 11 months ago

nai62 commented 1 year ago

TL; DR

After unarchiving Totalsegmentator_dataset.zip, accessing the very last element of s0864/ct.nii.gz results in a CRC check error.

Minimal Python code to reproduce:

import nibabel as nib

a = nib.load('s0864/ct.nii.gz')
# Access the very last voxel; this is the read that raises the error below.
print(a.dataobj[-1, -1, -1])

Error:

BadGzipFile: CRC check failed 0x5ab25329 != 0x84f031a9

Details

Versions:

Could you confirm if the error occurs in your environment? Thank you in advance!

nai62 commented 1 year ago

Sorry, there's no need to use Python to reproduce this error. The gunzip command yields similar errors.

$ gunzip s0864/ct.nii.gz

gzip: s0864/ct.nii.gz: invalid compressed data--crc error

gzip: s0864/ct.nii.gz: invalid compressed data--length error

wasserth commented 1 year ago

Hi, I also have the same problem. Unfortunately this file somehow got corrupted during the creation of the zip archive. For now the easiest solution is to exclude this file from the dataset.
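
A minimal sketch of that workaround, assuming the unzipped dataset has one subdirectory per subject (such as s0864), each containing ct.nii.gz, as the paths in this thread suggest. The names `list_usable_subjects` and `EXCLUDED_SUBJECTS` are hypothetical, chosen for illustration, and are not part of TotalSegmentator.

```python
from pathlib import Path

# Subject IDs to skip; s0864 is the one reported as corrupted in this issue.
EXCLUDED_SUBJECTS = {"s0864"}


def list_usable_subjects(dataset_dir, excluded=EXCLUDED_SUBJECTS):
    """Return sorted paths to ct.nii.gz for every subject not in `excluded`."""
    usable = []
    for subject_dir in sorted(Path(dataset_dir).iterdir()):
        # Skip stray files and any subject on the exclusion list.
        if not subject_dir.is_dir() or subject_dir.name in excluded:
            continue
        ct_path = subject_dir / "ct.nii.gz"
        if ct_path.exists():
            usable.append(ct_path)
    return usable
```

Calling `list_usable_subjects("Totalsegmentator_dataset")` would then return every CT path except the one under s0864. More IDs can be added to the set if other files turn out to be unreadable.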

nai62 commented 1 year ago

Thank you for your confirmation! I tried to visualize it and found that, as you mentioned, this file seems to be corrupted except for the first few slices. For now, I think I'm going to exclude it in my experiments as a workaround. Thank you for your suggestion.
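
For anyone who wants to run the gunzip check from Python without nibabel, here is a rough sketch using only the standard library. It streams the file through `gzip.open` and reports whether decompression finished cleanly, plus how many decompressed bytes were read before any failure. The function name `check_gzip` is made up for this example, and `gzip.BadGzipFile` requires Python 3.8 or later.

```python
import gzip
import zlib


def check_gzip(path, chunk_size=1024 * 1024):
    """Decompress `path` fully and return (ok, bytes_read, error_message)."""
    bytes_read = 0
    try:
        with gzip.open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                bytes_read += len(chunk)
    except (gzip.BadGzipFile, EOFError, zlib.error) as e:
        # BadGzipFile covers CRC and length mismatches; EOFError covers
        # truncated streams; zlib.error covers invalid compressed data.
        return False, bytes_read, str(e)
    return True, bytes_read, None
```

For example, `check_gzip("s0864/ct.nii.gz")` should return a tuple whose first element is `False` for the file discussed here, assuming the local copy is the same corrupted one.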

naayem commented 1 year ago

Hello, is this the only file you found to be corrupted? I'm running some trainings with nnU-Net and so far everything seems fine. I also tried to train a model called Swin UNETR from their BTCV tutorial, but the training with the Swin UNETR code encountered many corrupted files, so I dropped that experiment.

nai62 commented 1 year ago

Hi, as far as I could confirm, all the other files open successfully without errors (using the method I described in the "TL; DR" above). I haven't run a training with them, though.

naayem commented 1 year ago

code:

import os
import nibabel as nib

dir_path = "/scratch/izar/naayem/TEST/nnUNet_raw_data_base/nnUNet_raw_data/Task601_Totalsegmentator"
corrupted_files = []
for subdir in os.listdir(dir_path):
    subdir_path = os.path.join(dir_path, subdir)
    if os.path.isdir(subdir_path):
        for file in os.listdir(subdir_path):
            file_path = os.path.join(subdir_path, file)
            try:
                a = nib.load(file_path)
                print(a.dataobj[-1, -1, -1])
                print(f"{file_path} is loaded successfully")
            except Exception as e:
                print(f"Error occurred in file {file_path}: {e}")
                corrupted_files.append(file_path)

if corrupted_files:
    print("Corrupted files:")
    for file in corrupted_files:
        print(file)
else:
    print("No corrupted files found.")

I ran this code and also got:

Corrupted files:
/scratch/izar/naayem/TEST/nnUNet_raw_data_base/nnUNet_raw_data/Task601_Totalsegmentator/imagesTr/s0864_0000.nii.gz

So I guess it confirms it on my side.

Khoa-NT commented 1 year ago

Hi, I don't know whether this is the right place to report my error. I downloaded the dataset and unzipped it on my MacBook. I dragged and dropped a file (e.g., Totalsegmentator_dataset/s0000/ct.nii.gz) into 3D Slicer v5.2.1; however, I always get the error "Error: Loading Totalsegmentator_dataset/s0000/ct.nii.gz - load failed". Did I open the file correctly?

wasserth commented 11 months ago

I uploaded a new version of the dataset. Now unzipping should work without errors.

michaelmyc commented 11 months ago

@wasserth I was using the new V2 version, with an md5sum hash of fd65f71cf3ef78c67a3740909ecef674. However, I'm also getting gzip CRC errors when reading with nibabel. I also tested gunzip, which fails as well.
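
To compare a local download against the hash quoted above without the md5sum tool, something like the following sketch could be used. `md5_of_file` is a hypothetical helper name and the archive path in the usage comment is a placeholder. A matching hash only shows that the download is the same archive as the one described in this comment, not that every file inside it is intact.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of `path`, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hash reported in the comment above for the V2 archive.
REPORTED_MD5 = "fd65f71cf3ef78c67a3740909ecef674"

# Usage (the path is a placeholder for wherever the archive was saved):
# print(md5_of_file("path/to/dataset.zip") == REPORTED_MD5)
```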

Files I'm having issues with are: