marshuang80 / gloria

GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-efficient Medical Image Recognition
Apache License 2.0

In the segmentation task, EncodedPixels seems to have an extra space, which I remove, but... #8

Closed church-XP closed 2 years ago

church-XP commented 2 years ago

When I was working on the segmentation task, I ran into a problem:

Traceback (most recent call last):
  File "run.py", line 167, in <module>
    main(cfg, args)
  File "run.py", line 106, in main
    trainer.fit(model, dm)
  File "/GPUFS/nsccgz_ywang_zfd/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 457, in fit
    self.accelerator_backend.setup(model)
  File "/GPUFS/nsccgz_ywang_zfd/.local/lib/python3.8/site-packages/pytorch_lightning/accelerators/dp_accelerator.py", line 56, in setup
    self.setup_optimizers(model)
  File "/GPUFS/nsccgz_ywang_zfd/.local/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 145, in setup_optimizers
    optimizers, lr_schedulers, optimizer_frequencies = self.trainer.init_optimizers(model)
  File "/GPUFS/nsccgz_ywang_zfd/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/optimizers.py", line 31, in init_optimizers
    optim_conf = model.configure_optimizers()
  File "/GPUFS/nsccgz_ywang_zfd/zxp/gloria/gloria/lightning/segmentation_model.py", line 37, in configure_optimizers
    scheduler = builder.build_scheduler(self.cfg, optimizer, self.dm)
  File "/GPUFS/nsccgz_ywang_zfd/zxp/gloria/gloria/builder.py", line 109, in build_scheduler
    num_iter = len(dm.train_dataloader().dataset)
  File "/GPUFS/nsccgz_ywang_zfd/zxp/gloria/gloria/datasets/data_module.py", line 110, in train_dataloader
    dataset = self.dataset(self.cfg, split="train", transform=transform)
  File "/GPUFS/nsccgz_ywang_zfd/zxp/gloria/gloria/datasets/image_dataset.py", line 192, in __init__
    neg_series_selected = np.random.choice(
  File "mtrand.pyx", line 908, in numpy.random.mtrand.RandomState.choice
ValueError: 'a' cannot be empty unless no samples are taken

What is this first parameter, and how should I deal with this error?

church-XP commented 2 years ago

At the beginning, when I was working on the split task, the preprocessing could not generate train.csv. I ran the code on its own, and the generation succeeded with PNEUMOTHORAX_ORIGINAL_TRAIN_CSV set to stage_2_sample_submission.csv.

But even though I already have the paperwork, the error displayed is KeyError: 'EncodedPixels'. I thought the cause was an extra space, so I removed it. But then there was another problem: ValueError: 'a' cannot be empty unless no samples are taken. I really don't know how to deal with this.

church-XP commented 2 years ago

I found out that this is the prediction sample that you need to submit on Kaggle. Therefore, the data should be split using train.csv. But that doesn't work:

KeyError: '1.2.276.0.7230010.3.1.4.8323329.3678.1517875178.953520'

So I would like to ask how I should match the masks to the ImageId values. How do you map ID_9979c1b39 to the ImageId 1.2.276.0.7230010.3.1.4.8323329.3678.1517875178.953520?

marshuang80 commented 2 years ago

Hi there, I am having some difficulties understanding your questions. Can you elaborate on what you meant by "paperwork"? What is the "split task" you were working on? Can you please also provide the detailed error message and the specific command you ran so I can help you debug from my end? It is difficult to help without knowing the scripts you are referring to.

To answer your first question, the "a" here refers to the first parameter of np.random.choice, which is the list that we are randomly sampling from. In this case, it is the variable neg_series, which comes from self.df_neg["ImageId"].unique() from this line. I recommend you double-check whether the ImageId column is correctly processed.
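For illustration, the failure can be reproduced in isolation. The snippet below is only a minimal sketch, not the repository's code: `sample_negatives` is a hypothetical helper, and sampling without replacement is an assumption. It shows that `np.random.choice` raises this exact kind of error when the array of negative ImageIds is empty, and how a guard could report the likely cause earlier.

```python
import numpy as np


def sample_negatives(neg_series, n_samples, seed=0):
    """Hypothetical helper: sample negative ImageIds with a clearer error.

    `neg_series` plays the role of self.df_neg["ImageId"].unique().
    """
    neg_series = np.asarray(neg_series)
    if neg_series.size == 0 and n_samples > 0:
        raise ValueError(
            "No negative samples found: check that the CSV contains rows "
            "with EncodedPixels == '-1'."
        )
    rng = np.random.RandomState(seed)
    # replace=False is an assumption made for this sketch
    return rng.choice(neg_series, size=n_samples, replace=False)


# Reproduce the original error: sampling from an empty array.
try:
    np.random.choice(np.array([]), size=1)
except ValueError as err:
    original_error = str(err)
    print(original_error)

# With at least one negative ID available, sampling succeeds.
print(sample_negatives(["img_a", "img_b", "img_c"], 2))
```

If the guard fires, the problem is in the CSV used to build the dataset rather than in the sampling call itself.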

For your reference, I have attached PNEUMOTHORAX_TRAIN_CSV below: train.csv

church-XP commented 2 years ago

I'm sorry for how I expressed that. Let me rephrase.

I am working on your segmentation task. After updating the data paths in constants.py, I can't find the file for PNEUMOTHORAX_ORIGINAL_TRAIN_CSV = "train-rle.csv". The only files I downloaded were stage_2_train.csv and stage_2_sample_submission.csv.

I thought stage_2_train.csv was your train-rle.csv, but it didn't work. It tells me that "train.csv" is missing. preprocess_pneumonia_data(test_fac=0.15) in your preprocess_datasets.py doesn't seem to work.

PNEUMOTHORAX_ORIGINAL_TRAIN_CSV could not be used to generate the train, validation, and test CSVs, so I did this separately, but there seems to be a problem with the split.

`import os

import pandas as pd
import tqdm
from sklearn.model_selection import train_test_split

# the PNEUMOTHORAX_* paths are assumed to come from gloria/constants.py
from gloria.constants import *

if __name__ == "__main__":
    try:
        df = pd.read_csv(PNEUMOTHORAX_ORIGINAL_TRAIN_CSV)
    except FileNotFoundError as err:
        raise Exception(
            "Please make sure the SIIM Pneumothorax dataset is "
            f"stored at {PNEUMOTHORAX_DATA_DIR}"
        ) from err

    # get image paths: map each DICOM file name (without ".dcm") to its full path
    img_paths = {}
    for subdir, dirs, files in tqdm.tqdm(os.walk(PNEUMOTHORAX_IMG_DIR)):
        for f in files:
            if f.endswith(".dcm"):
                # remove the ".dcm" extension
                file_id = f[:-4]
                img_paths[file_id] = os.path.join(subdir, f)

    # no encoded pixels ("-1") means healthy
    df["Label"] = df.apply(
        lambda x: 0.0 if x["EncodedPixels"] == "-1" else 1.0, axis=1
    )
    # raises KeyError if an ImageId has no matching DICOM file
    df["Path"] = df["ImageId"].apply(lambda x: img_paths[x])

    # split data: 70% train, 15% test, 15% validation
    train_df, test_val_df = train_test_split(df, test_size=0.15 * 2, random_state=0)
    test_df, valid_df = train_test_split(test_val_df, test_size=0.5, random_state=0)

    print(f"Number of train samples: {len(train_df)}")
    print(train_df["Label"].value_counts())
    print(f"Number of valid samples: {len(valid_df)}")
    print(valid_df["Label"].value_counts())
    print(f"Number of test samples: {len(test_df)}")
    print(test_df["Label"].value_counts())

    train_df.to_csv(PNEUMOTHORAX_TRAIN_CSV)
    valid_df.to_csv(PNEUMOTHORAX_VALID_CSV)
    test_df.to_csv(PNEUMOTHORAX_TEST_CSV)`

I use this to generate the three CSV files. It looks like I was using stage_2_train.csv incorrectly, so I replaced it with stage_2_sample_submission.csv, and it worked! But after all that, something still goes wrong.

church-XP commented 2 years ago

The attached screenshot shows the train.csv I generated using the code, but it is definitely wrong: all the labels are 1.0. Anyway, I reran the command $ python run.py -c ./configs/pneumothorax_segmentation_config.yaml --train --test --train_pct 0.01

and the error in the second attached screenshot appeared.

church-XP commented 2 years ago

I really hope you can solve my problem. Thank you very much

church-XP commented 2 years ago

The code is taken from def preprocess_pneumonia_data(test_fac=0.15) in your preprocess_datasets.py, run under if __name__ == "__main__": and starting with try: df = pd.read_csv(PNEUMOTHORAX_ORIGINAL_TRAIN_CSV) except: raise Exception("Please make sure the SIIM Pneumothorax dataset is stored at {PNEUMOTHORAX_DATA_DIR}")

The rest is the same as your code

church-XP commented 2 years ago

Oh, I found something wrong with my train.csv. In the dataset I downloaded, the files usually have names of the form id_0011FE81E.dcm, so my image IDs are different from yours. I tried to use stage_2_train.csv (see the attached screenshots), but it failed.

marshuang80 commented 2 years ago

Hi @church-XP, you are seeing the error message "ValueError: 'a' cannot be empty unless no samples are taken" because you have created your train/val/test.csv with stage_2_sample_submission.csv, which only contains positive samples.
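One way to catch this before training is to check the label distribution of the CSV you are about to split. The following is a minimal sketch on toy data, not the actual SIIM files: `load_with_labels` and `has_both_classes` are hypothetical helpers, the label rule mirrors the preprocessing code quoted earlier in this thread, and stripping whitespace from headers and values is an assumption based on the "extra space" mentioned in the issue title.

```python
import io

import pandas as pd


def load_with_labels(csv_source):
    """Read an ImageId/EncodedPixels CSV and derive a 0.0/1.0 Label column."""
    # dtype=str keeps "-1" as text; skipinitialspace drops spaces after commas
    df = pd.read_csv(csv_source, skipinitialspace=True, dtype=str)
    # Guard against headers such as " EncodedPixels" (assumed cause of the KeyError)
    df.columns = df.columns.str.strip()
    df["EncodedPixels"] = df["EncodedPixels"].str.strip()
    # "-1" means no mask, i.e. a negative (healthy) sample
    df["Label"] = (df["EncodedPixels"] != "-1").astype(float)
    return df


def has_both_classes(df):
    """True only if the frame contains negative and positive samples."""
    return set(df["Label"].unique()) == {0.0, 1.0}


# Toy CSV with a space after each comma, one negative and one positive row.
toy_csv = io.StringIO(
    "ImageId, EncodedPixels\n"
    "img_a, -1\n"
    "img_b, 10 5 20 5\n"
)
df = load_with_labels(toy_csv)
print(df["Label"].value_counts())
print("both classes present:", has_both_classes(df))
```

If `has_both_classes` returns False for your CSV, the negative subset will be empty and the sampling step will fail as described above.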

Can you please try the following:

  1. Download the images by running python download_images.py, which should be in your SIIM data directory. You can also find the script here (https://www.kaggle.com/competitions/siim-acr-pneumothorax-segmentation/data?select=download_images.py)
  2. The previous step should download the dicom-images-train folder for you. Please set that as your PNEUMOTHORAX_IMG_DIR
  3. Set PNEUMOTHORAX_ORIGINAL_TRAIN_CSV to your train.csv
  4. Rerun preprocess_pneumonia_data
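Before rerunning the preprocessing, it may help to confirm that every ImageId in the CSV has a matching DICOM file, since the KeyError reported earlier in this thread is consistent with an ID that has no file of the same name. This is a rough sketch under assumptions: `index_dicom_files` and `missing_image_ids` are hypothetical helpers, the IDs and directory layout are toy values, and in practice the directory would be your PNEUMOTHORAX_IMG_DIR.

```python
import os
import tempfile


def index_dicom_files(img_dir):
    """Map each file stem (name without .dcm) to its full path."""
    paths = {}
    for subdir, _dirs, files in os.walk(img_dir):
        for name in files:
            stem, ext = os.path.splitext(name)
            if ext.lower() == ".dcm":
                paths[stem] = os.path.join(subdir, name)
    return paths


def missing_image_ids(image_ids, img_dir):
    """Return the ImageIds that have no matching .dcm file under img_dir."""
    paths = index_dicom_files(img_dir)
    return sorted(set(image_ids) - set(paths))


# Toy demonstration: one ID has a file, the other does not.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "study_1"))
    open(os.path.join(tmp, "study_1", "1.2.3.dcm"), "w").close()
    demo_missing = missing_image_ids(["1.2.3", "ID_0011fe81e"], tmp)

print(demo_missing)
```

An empty result means the CSV and the image folder use the same ID scheme; a long list of missing IDs suggests the CSV and the images come from different releases of the dataset.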

Hope that helps.

church-XP commented 2 years ago

So the files I downloaded (see the attached screenshot) aren't what you trained with? You used the stage 1 files, not the stage 2 files?

I should use python download_images.py to download them, right? However, I cannot connect; it reports "Could not automatically determine credentials." I followed the steps, but I still have the problem (see the attached screenshots), and searching on Google did not fix it. Is there another way for me to download the data?

church-XP commented 2 years ago

I tried it on the server and in the terminal on my own computer (see the attached screenshot). I was wondering if there is any other way to get this part of the dataset.

Does this mean that the data is no longer stored in Cloud Health, so I can't access it there?

church-XP commented 2 years ago

Also, the data is not available on Kaggle (the link points to an empty page).

church-XP commented 2 years ago

I found the dataset another way. I hope it works.

marshuang80 commented 2 years ago

Good luck!

zyt0211 commented 1 year ago

> I found the dataset another way. I hope it works.

Hello! How were your questions about the dataset resolved? I'm struggling with this problem.