Running create_dataset.py to partition the dataset generates empty files

Eleanorhxd commented 6 months ago

Hello, when I run create_dataset.py to split the data into training, validation, and test sets, the CSV files for the validation and test sets are empty. What is the reason for this?

sujaly commented 5 months ago

Yes, I have run into the same problem. Have you solved it yet?

ttanida commented 5 months ago

create_dataset.py is just a simple Python script.

Just set some breakpoints in the code and run the debugger. I suspect the file paths may not be defined correctly in path_datasets_and_weights.py.

If you inspect the variable values while debugging, the cause of the empty files should become clear.

sujaly commented 5 months ago

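This is an automatic vacation reply from QQ Mail. Hello, I am currently on vacation and cannot reply to your email in person. I will get back to you as soon as possible after my vacation ends.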

Eleanorhxd commented 5 months ago

Hi, I have not solved it.

------------------ Original message ------------------ From: "ttanida/rgrg"; Sent: Wednesday, May 29, 2024, 10:50 PM; Subject: Re: [ttanida/rgrg] Run create.dataset.py to partition the dataset and generate empty files (Issue #28)

Cckyrie commented 5 months ago

I have run into the same problem. Have you solved it yet?

ttanida commented 4 months ago

I just had some email correspondence with a researcher who had the same problem.

The root cause was that his path_mimic_cxr variable in src/path_datasets_and_weights.py did not point to a directory that contained a (sub-)directory called "files" with the reference reports in it.

As written in src/path_datasets_and_weights.py:

MIMIC-CXR and MIMIC-CXR-JPG dataset paths should both have a (sub-)directory called "files" in their directories.

Note that we only need the report txt files from MIMIC-CXR, which are in the file mimic-cxr-report.zip at
https://physionet.org/content/mimic-cxr/2.0.0/.

So:

The MIMIC-CXR-JPG path contains all the images in JPG format.
The MIMIC-CXR path contains the reference reports as txt files (from mimic-cxr-report.zip, which is only 135.4 MB).
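To confirm that both paths point to the right places, a quick check along these lines may help. It is a minimal sketch, not part of the repository: summarize_dataset_dir is a hypothetical helper, and the placeholder paths must be replaced with the values configured in src/path_datasets_and_weights.py. For each directory it reports whether a "files" subdirectory exists and how many .txt or .jpg files sit below it; zero report files would point to the problem described above.

```python
from pathlib import Path
from typing import Tuple


def summarize_dataset_dir(root: str, suffix: str) -> Tuple[bool, int]:
    """Return (has "files" subdirectory, number of files ending in `suffix` below root/files).

    Hypothetical helper for sanity-checking dataset paths; not part of rgrg.
    """
    files_dir = Path(root) / "files"
    if not files_dir.is_dir():
        # The expected "files" subdirectory is missing entirely.
        return False, 0
    # Recursively count matching files, e.g. reports (.txt) or images (.jpg).
    count = sum(1 for p in files_dir.rglob(f"*{suffix}") if p.is_file())
    return True, count


if __name__ == "__main__":
    # Placeholder paths (assumptions): substitute path_mimic_cxr and the
    # MIMIC-CXR-JPG path from src/path_datasets_and_weights.py.
    checks = {
        "MIMIC-CXR (reports)": ("/path/to/mimic-cxr", ".txt"),
        "MIMIC-CXR-JPG (images)": ("/path/to/mimic-cxr-jpg", ".jpg"),
    }
    for name, (root, suffix) in checks.items():
        has_files, count = summarize_dataset_dir(root, suffix)
        print(f"{name}: 'files' subdirectory present={has_files}, {suffix} files found={count}")
```

Counting every image can take a while on the full MIMIC-CXR-JPG directory; for the reports it is quick.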

Because his reference reports were not available, the validation and test sets were both empty: the function get_reference_report contains these lines:

    if not os.path.exists(path_to_report):
        shortened_path_to_report = os.path.join(f"p{subject_id[:2]}", f"p{subject_id}", f"s{study_id}.txt")
        missing_reports.append(shortened_path_to_report)
        return -1
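The excerpt only shows the check itself. The sketch below reproduces that logic outside the repository so you can test a few studies by hand. It assumes the full report path is <path_mimic_cxr>/files/p<first two digits of subject_id>/p<subject_id>/s<study_id>.txt, which follows from the shortened path in the excerpt and the "files" requirement; the function names and example IDs are hypothetical.

```python
import os
from typing import Iterable, List, Tuple


def shortened_report_path(subject_id: str, study_id: str) -> str:
    # Same relative layout as in the excerpt: p<first two digits>/p<subject_id>/s<study_id>.txt
    return os.path.join(f"p{subject_id[:2]}", f"p{subject_id}", f"s{study_id}.txt")


def find_missing_reports(path_mimic_cxr: str, studies: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the shortened paths of all reports that do not exist under <path_mimic_cxr>/files."""
    missing = []
    for subject_id, study_id in studies:
        short_path = shortened_report_path(subject_id, study_id)
        # Assumption: full path = <path_mimic_cxr>/files/<shortened path>
        full_path = os.path.join(path_mimic_cxr, "files", short_path)
        if not os.path.exists(full_path):
            missing.append(short_path)
    return missing


if __name__ == "__main__":
    # Hypothetical path and IDs, for illustration only.
    studies = [("12345678", "87654321")]
    print(find_missing_reports("/path/to/mimic-cxr", studies))
```

If every study you try comes back as missing, path_mimic_cxr most likely does not point to the directory that contains the extracted "files" folder.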

In that case, the log file log_file_dataset_creation.txt, which is created when the script finishes, should show a high number in the line num_missing_reports: xxx, indicating that the reference reports were missing.
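To pull that number out of the log without reading it by hand, a small parser may be enough. This is a sketch that assumes the log contains lines of the form num_missing_reports: <number>, as described above; the exact format and the log location are assumptions, so adjust the pattern and path if yours differ. It returns every value it finds, in case the line appears more than once.

```python
import os
import re
from typing import Iterable, List

# Matches lines such as "num_missing_reports: 1234" (format assumed from the
# description of the log file; adjust if your log differs).
_PATTERN = re.compile(r"num_missing_reports:\s*(\d+)")


def find_num_missing_reports(lines: Iterable[str]) -> List[int]:
    """Return all num_missing_reports values found in the given lines, in order."""
    values = []
    for line in lines:
        match = _PATTERN.search(line)
        if match:
            values.append(int(match.group(1)))
    return values


if __name__ == "__main__":
    # Assumed location: adjust to wherever create_dataset.py wrote the log.
    log_path = "log_file_dataset_creation.txt"
    if os.path.exists(log_path):
        with open(log_path, encoding="utf-8") as f:
            print("num_missing_reports:", find_num_missing_reports(f))
    else:
        print(f"{log_path} not found; run create_dataset.py first or fix the path.")
```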

ttanida / rgrg, issue #28