mahmoodlab / CLAM

Data-efficient and weakly supervised computational pathology on whole slide images - Nature Biomedical Engineering
http://clam.mahmoodlab.org
GNU General Public License v3.0

Extracting features problem with h5 files #227

Closed BioCoderR closed 5 months ago

BioCoderR commented 5 months ago

I have been working with TCGA slides for my project. As the first step, I created patches with the create_patches_fp.py script, using custom threshold parameters. When I then run extract_features_fp.py to create the feature files for the slides, it fails with an error on the h5 files:

Traceback (most recent call last):
  File "/mnt/scripts/extract_features_fp.py", line 197, in <module>
    output_file_path = compute_w_loader(h5_file_path, output_pt_path, wsi, 
  File "/mnt/scripts/extract_features_fp.py", line 43, in compute_w_loader
    dataset = Whole_Slide_Bag_FP(file_path=file_path, wsi=wsi, pretrained=pretrained, 
  File "/mnt/scripts/datasets/dataset_h5.py", line 156, in __init__
    with h5py.File(self.file_path, "r") as f:
  File "/opt/miniconda/envs/clam/lib/python3.8/site-packages/h5py/_hl/files.py", line 562, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, fcpl, swmr=swmr)
  File "/opt/miniconda/envs/clam/lib/python3.8/site-packages/h5py/_hl/files.py", line 235, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 102, in h5py.h5f.open
OSError: Unable to synchronously open file (file signature not found)
srun: error: g1-4: task 0: Exited with exit code 1
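
The "file signature not found" message means the HDF5 library could not find the format's 8-byte signature in the file, which usually points to an empty, truncated, or non-HDF5 file. A minimal sketch for scanning a patches directory for such files is shown here. It uses only the standard library and checks for the signature at the offsets the HDF5 format allows (0, 512, 1024, and so on); the function names and the directory argument are hypothetical and not part of CLAM. If h5py is available, `h5py.is_hdf5(path)` performs a comparable check.

```python
from pathlib import Path

# 8-byte signature that starts every HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def has_hdf5_signature(path):
    """Return True if the HDF5 signature is found at offset 0, 512, 1024, ..."""
    size = Path(path).stat().st_size
    offset = 0
    with open(path, "rb") as f:
        while offset + len(HDF5_SIGNATURE) <= size:
            f.seek(offset)
            if f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            # Allowed superblock offsets double after the first 512 bytes.
            offset = 512 if offset == 0 else offset * 2
    return False


def find_invalid_h5(patch_dir):
    """List .h5 files under patch_dir (hypothetical path) that lack the signature."""
    return sorted(
        p for p in Path(patch_dir).rglob("*.h5") if not has_hdf5_signature(p)
    )
```

Calling `find_invalid_h5` on the directory that holds the patch coordinate files returns the paths that would trigger this error. A file that passes this check can still be corrupt further in, so this is only a first filter.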

I have also checked the h5 files across the whole path; the output is in this file: h5filedata.csv. I am also attaching the script I used to create the patches, along with the preset. Can you please help me solve this problem? Preset file used: tcga_sthresh.csv. Output generated to check the h5 files in the patches directory: hdf5_attributes_info.csv. The command used to create the patches is:

# Define parameters
LEVEL=1
SIZE=256
# Sub-directory to the patch coordinates 
SUBDIR_READ=tiles-l${LEVEL}-s${SIZE}-RN50-Sthreshold
# running the create patches on the list of images
python /mnt/scripts/create_patches_fp.py \
    --source /mnt/data/ \
    --save_dir /mnt/results/${SUBDIR_READ} \
    --patch_size ${SIZE} \
    --preset tcga_sthresh.csv \
    --patch_level ${LEVEL} \
    --seg --patch --stitch \
    --no_auto_skip

BioCoderR commented 5 months ago

I solved the issue. I run CLAM inside a container, and the container did not have permission to write to the directories. I created a directory named pt_files in my data directory, where the patches are stored, and re-ran the script, which solved the problem.
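
For anyone hitting the same problem in a container, a pre-flight check along these lines can confirm that the output directories exist and are writable before starting a long job. This is only a sketch: the helper name and the example path are assumptions, and only the pt_files directory name comes from the fix described above.

```python
import os
import tempfile


def ensure_writable_dir(path):
    """Create path if needed and confirm a file can actually be written there."""
    try:
        os.makedirs(path, exist_ok=True)
        # Writing a throwaway file is a stronger check than inspecting mode
        # bits, which can be misleading on bind-mounted or read-only volumes.
        with tempfile.TemporaryFile(dir=path):
            pass
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical location; substitute the directory mounted into the container.
    target = os.path.join("data", "pt_files")
    print(target, "writable:", ensure_writable_dir(target))
```

If the check returns False, the directory usually has to be created or have its ownership fixed on the host side, since the container user may not be allowed to do so.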