marrlab / DomainLab

modular domain generalization: https://pypi.org/project/domainlab/
https://marrlab.github.io/DomainLab/
MIT License

subfolder read error #808

Closed: smilesun closed this issue 5 months ago

smilesun commented 5 months ago

On the Helmholtz HPC cluster, running the new blood cell benchmark with

run_benchmark_slurm.sh examples/benchmark/benchmark_blood3_resnet.yaml

(the data needed to reproduce this is on the cluster and available to all users)

results in

zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-126-list of subfolders ['neutrophil_segmented', 'lymphocyte_neoplastic', 'myeloblast', 'promyelocyte_atypical', 'monocyte', 'lymphocyte', 'normoblast', 'plasma_cell', 'neutrophil_band', 'basophil', 'promyelocyte', 'lymphocyte_reactive', 'eosinophil', 'myelocyte', 'metamyelocyte', 'smudge_cell', 'lymphocyte_large_granular', 'hairy_cell']
zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-127-user provided class names: ['basophil', 'erythroblast', 'metamyelocyte', 'myeloblast', 'neutrophil_band', 'promyelocyte', 'eosinophil', 'lymphocyte_typical', 'monocyte', 'myelocyte', 'neutrophil_segmented']
zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-128-subfolder names from folder: /lustre/groups/labs/marr/qscd01/datasets/240416_MLL23 ['neutrophil_segmented', 'myeloblast', 'monocyte', 'neutrophil_band', 'basophil', 'promyelocyte', 'eosinophil', 'myelocyte', 'metamyelocyte']

The error message above comes from https://github.com/marrlab/DomainLab/blob/d9c8a623131e3c68871a36883b3294c93c2f5a45/domainlab/dsets/dset_subfolder.py#L151 where <= is Python's set-inclusion (subset) operator.

The problem is that simply listing two sets (in the form of lists) makes it difficult for the user to find out which class names are wrongly specified. In addition to the message at https://github.com/marrlab/DomainLab/blob/d9c8a623131e3c68871a36883b3294c93c2f5a45/domainlab/dsets/dset_subfolder.py#L155 we need to print which of the user-provided class names are not in the list of subfolders.
This can probably be done with 1 or 2 extra lines of code after https://github.com/marrlab/DomainLab/blob/d9c8a623131e3c68871a36883b3294c93c2f5a45/domainlab/dsets/dset_subfolder.py#L153
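
A minimal sketch of the extra check described above, assuming the user-provided class names and the subfolder names are both available as lists. The helper name find_unmatched_class_names is hypothetical and not part of DomainLab. The two input lists are copied from log lines err-126 and err-127.

```python
def find_unmatched_class_names(user_class_names, subfolder_names):
    """Return the user-provided class names that have no matching subfolder."""
    # The set difference keeps exactly the names that make the
    # subset test `set(user_class_names) <= set(subfolder_names)` fail.
    return sorted(set(user_class_names) - set(subfolder_names))


# Subfolders of domain mll, copied from log line err-126.
subfolders = [
    'neutrophil_segmented', 'lymphocyte_neoplastic', 'myeloblast',
    'promyelocyte_atypical', 'monocyte', 'lymphocyte', 'normoblast',
    'plasma_cell', 'neutrophil_band', 'basophil', 'promyelocyte',
    'lymphocyte_reactive', 'eosinophil', 'myelocyte', 'metamyelocyte',
    'smudge_cell', 'lymphocyte_large_granular', 'hairy_cell',
]

# User-provided class names, copied from log line err-127.
user_classes = [
    'basophil', 'erythroblast', 'metamyelocyte', 'myeloblast',
    'neutrophil_band', 'promyelocyte', 'eosinophil', 'lymphocyte_typical',
    'monocyte', 'myelocyte', 'neutrophil_segmented',
]

unmatched = find_unmatched_class_names(user_classes, subfolders)
if unmatched:
    # Prints: class names without a matching subfolder: ['erythroblast', 'lymphocyte_typical']
    print(f"class names without a matching subfolder: {unmatched}")
```

For the lists in this log, the two unmatched names are erythroblast and lymphocyte_typical. That agrees with log line err-128, which lists only the nine names that do match.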

detailed error message:

zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-122-  warnings.warn(warning.format(ret))
zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-123-reading domain: matek
zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-124-list of subfolders ['neutrophil_segmented', 'erythroblast', 'myeloblast', 'lymphocyte_typical', 'monocyte', 'neutrophil_band', 'basophil', 'promyelocyte', 'eosinophil', 'myelocyte', 'metamyelocyte']
zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-125-reading domain: mll
zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-126-list of subfolders ['neutrophil_segmented', 'lymphocyte_neoplastic', 'myeloblast', 'promyelocyte_atypical', 'monocyte', 'lymphocyte', 'normoblast', 'plasma_cell', 'neutrophil_band', 'basophil', 'promyelocyte', 'lymphocyte_reactive', 'eosinophil', 'myelocyte', 'metamyelocyte', 'smudge_cell', 'lymphocyte_large_granular', 'hairy_cell']
zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-127-user provided class names: ['basophil', 'erythroblast', 'metamyelocyte', 'myeloblast', 'neutrophil_band', 'promyelocyte', 'eosinophil', 'lymphocyte_typical', 'monocyte', 'myelocyte', 'neutrophil_segmented']
zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-128-subfolder names from folder: /lustre/groups/labs/marr/qscd01/datasets/240416_MLL23 ['neutrophil_segmented', 'myeloblast', 'monocyte', 'neutrophil_band', 'basophil', 'promyelocyte', 'eosinophil', 'myelocyte', 'metamyelocyte']
zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-129-[Fri Apr 19 15:06:12 2024]
zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-130-Error in rule run_experiment:
zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-131-    jobid: 0
zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-132-    input: zoutput/benchmarks/blood/hyperparameters.csv
zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-133-    output: zoutput/benchmarks/blood/rule_results/8.csv
zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-134-
zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-135-RuleException:
zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-136-RuntimeError in file /home/aih/xudong.sun/domainlab_blood3/domainlab/exp_protocol/benchmark.smk, line 154:
zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-137-user provided class names does not match the subfolder names
zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-138-  File "/home/aih/xudong.sun/domainlab_blood3/domainlab/exp_protocol/benchmark.smk", line 154, in __rule_run_experiment
zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-139-  File "/home/aih/xudong.sun/domainlab_blood3/domainlab/exp_protocol/run_experiment.py", line 168, in run_experiment
zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-140-  File "/home/aih/xudong.sun/domainlab_blood3/domainlab/exp/exp_main.py", line 39, in __init__
zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-141-  File "/home/aih/xudong.sun/domainlab_blood3/domainlab/algos/builder_hduva.py", line 49, in init_business
zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-142-  File "/home/aih/xudong.sun/domainlab_blood3/domainlab/algos/trainers/a_trainer.py", line 132, in init_business
zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-143-  File "/home/aih/xudong.sun/domainlab_blood3/domainlab/tasks/b_task_classif.py", line 19, in init_business
zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-144-  File "/home/aih/xudong.sun/domainlab_blood3/domainlab/tasks/b_task.py", line 37, in init_business
zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-145-  File "/home/aih/xudong.sun/domainlab_blood3/domainlab/tasks/task_folder.py", line 94, in get_dset_by_domain
zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-146-  File "/home/aih/xudong.sun/domainlab_blood3/domainlab/dsets/dset_subfolder.py", line 99, in __init__
zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-147-  File "/home/aih/xudong.sun/domainlab_blood3/domainlab/dsets/dset_subfolder.py", line 155, in _find_classes
zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-148-  File "/home/aih/xudong.sun/anaconda3/envs/domainlab_py39/lib/python3.9/concurrent/futures/thread.py", line 58, in run
zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-149-Shutting down, this might take some time.
zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err:150:Exiting because a job execution failed. Look above for error message

smilesun commented 5 months ago

One odd thing in the error output: the subfolders listed at

zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-128

['neutrophil_segmented', 'myeloblast', 'monocyte', 'neutrophil_band', 'basophil', 'promyelocyte', 'eosinophil', 'myelocyte', 'metamyelocyte']

are fewer than those listed at

zoutput/slurm_logs/run_experiment/run_experiment-index=8-20094266.err-126

['neutrophil_segmented', 'lymphocyte_neoplastic', 'myeloblast', 'promyelocyte_atypical', 'monocyte', 'lymphocyte', 'normoblast', 'plasma_cell', 'neutrophil_band', 'basophil', 'promyelocyte', 'lymphocyte_reactive', 'eosinophil', 'myelocyte', 'metamyelocyte', 'smudge_cell', 'lymphocyte_large_granular', 'hairy_cell']

MatteoWohlrapp commented 5 months ago

I linked a PR that prints the user-specified classes that don't correspond to subfolders. Regarding the error messages: is it possible that those printouts are from different domains? This is the full printout from my console:

reading domain: matek
list of subfolders ['neutrophil_segmented', 'erythroblast', 'myeloblast', 'lymphocyte_typical', 'monocyte', 'neutrophil_band', 'basophil', 'promyelocyte', 'eosinophil', 'myelocyte', 'metamyelocyte']
reading domain: mll
list of subfolders ['neutrophil_segmented', 'lymphocyte_neoplastic', 'myeloblast', 'promyelocyte_atypical', 'monocyte', 'lymphocyte', 'normoblast', 'plasma_cell', 'neutrophil_band', 'basophil', 'promyelocyte', 'lymphocyte_reactive', 'eosinophil', 'myelocyte', 'metamyelocyte', 'smudge_cell', 'lymphocyte_large_granular', 'hairy_cell']

It appears that the first print in your comment refers to the matek domain, and the second to the mll domain.

smilesun commented 5 months ago

@MatteoWohlrapp, it could be that I made a copy-paste mistake. Could you find a fix for any error that occurs when running run_benchmark_slurm.sh examples/benchmark/benchmark_blood3_resnet.yaml?

smilesun commented 5 months ago

https://github.com/marrlab/DomainLab/blob/56cf5a2fd015e4222f93ced91e7f1bed4d675bef/domainlab/tasks/task_folder.py#L85

smilesun commented 5 months ago

In dict_domain_folder_name2class, the key should be the folder name:

https://github.com/marrlab/DomainLab/blob/56cf5a2fd015e4222f93ced91e7f1bed4d675bef/examples/tasks/task_blood3.py#L78C30-L78C40
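
To illustrate why the key/value order matters, here is a small sketch assuming a per-domain mapping from folder name (key) to class name (value). The helper check_folder_keys and the specific folder-to-class pairs are hypothetical, chosen for illustration only, and are not taken from task_blood3.py.

```python
def check_folder_keys(folder_name2class, subfolder_names):
    """Return the mapping keys that are not actual subfolder names."""
    return sorted(set(folder_name2class) - set(subfolder_names))


# Hypothetical subfolders on disk for one domain.
subfolders = ["lymphocyte", "normoblast", "monocyte"]

# Intended orientation: folder name -> class name (pairs are illustrative).
correct = {
    "lymphocyte": "lymphocyte_typical",
    "normoblast": "erythroblast",
    "monocyte": "monocyte",
}

# Swapped orientation: class name -> folder name.
inverted = {class_name: folder for folder, class_name in correct.items()}

print(check_folder_keys(correct, subfolders))   # no stray keys
print(check_folder_keys(inverted, subfolders))  # class names wrongly used as keys
```

With the swapped mapping, the keys are class names that do not exist as folders on disk, so any lookup of keys against subfolder names fails even though every name in the dictionary is individually valid.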

smilesun commented 5 months ago

Indeed, the key/value order is wrong: Merge pull request https://github.com/marrlab/DomainLab/pull/816 from marrlab/blood3 332b9f9