mims-harvard / SHEPHERD

SHEPHERD: Few shot learning for phenotype-driven diagnosis of patients with rare genetic diseases
https://zitniklab.hms.harvard.edu/projects/SHEPHERD
MIT License
47 stars 15 forks source link

KeyError: 'alpha' when executing predict.py #8

Open Vincent-Ustach opened 3 months ago

Vincent-Ustach commented 3 months ago

Hello, I am working with a small data set using the SHEPHERD checkpoint model files to do causal gene discovery. I ran the shortest paths script with no issues, and am at the predict.py step:

python predict.py \ --run_type causal_gene_discovery \ --patient_data my_data \ --edgelist KG_edgelist_mask.txt \ --node_map KG_node_map.txt \ --saved_node_embeddings_path checkpoints/pretrain.ckpt \ --best_ckpt checkpoints/causal_gene_discovery.ckpt

Here are my logs from the run which includes the error:

Global seed set to 33 Predict hparams: {'seed': 33, 'n_gpus': 0, 'num_workers': 4, 'profiler': 'simple', 'pin_memory': False, 'time': False, 'log_gpu_memory': False, 'debug': False, 'augment_genes': True, 'n_sim_genes': 3, 'aug_gene_w': 0.5, 'wandb_save_dir': PosixPath('/mnt/ds-nas/projects/shepherd/data_download/wandb'), 'saved_checkpoint_path': PosixPath('/mnt/ds-nas/projects/shepherd/data_download/checkpoints/pretrain.ckpt'), 'test_n_cand_diseases': -1, 'candidate_disease_type': 'all_kg_nodes', 'only_hard_distractors': False, 'patient_similarity_type': 'gene', 'n_similar_patients': 2, 'model_type': 'aligner', 'loss': 'gene_multisimilarity', 'use_diseases': False, 'add_cand_diseases': False, 'add_similar_patients': False, 'wandb_project_name': 'causal-gene-discovery', 'train_data': PosixPath('/mnt/ds-nas/projects/shepherd/XXXXX/disease_split_train_sim_patients_8.9.21_kg.txt'), 'validation_data': PosixPath('/mnt/ds-nas/projects/shepherd/XXXXX/disease_split_val_sim_patients_8.9.21_kg.txt'), 'test_data': PosixPath('/mnt/ds-nas/projects/shepherd/XXXXX/shep_export0_cases_YYYYY.jsonl'), 'spl': '/mnt/ds-nas/projects/shepherd/genedx_data/patient_set_0_YYYYY_aggmean_spl_matrix.npy', 'spl_index': '/mnt/ds-nas/projects/shepherd/genedx_data/patient_set_0_YYYYY_spl_index_dict.pkl'} Loading SPL... Loaded SPL information Dataset filepath: /mnt/ds-nas/projects/shepherd/genedx_data/shep_export0_YYYYY.jsonl Number of patients: 100 Finished initalizing dataset There are 100 patients in the test dataset batch size: 100 Traceback (most recent call last): File "predict.py", line 174, in <module> predict(args) File "/mnt/fb1/home/vustach/anaconda3/envs/shepherd/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context return func(*args, **kwargs) File "predict.py", line 116, in predict dataloader = PatientNeighborSampler('predict', all_data.edge_index, all_data.edge_index[:,all_data.test_mask], File "../shepherd/samplers.py", line 296, in __init__ if hparams["alpha"] < 1: self.gp_spl = gp_spl KeyError: 'alpha'

I don't see alpha in the hparams: https://github.com/mims-harvard/SHEPHERD/blob/e61281f0e6ee3617d1a889d90a9908454ba31e6f/shepherd/hparams.py#L208-L239

Please let me know if you can reproduce my error. Is there a route to setting alpha, and if so, what value do you suggest?

JulianJungnitz commented 4 days ago

I had the same issue when running the script. My work around was to add the alpha parameter myself in line 218 of the Shepherd/hparams.py file

There are two more issues i ran into while trying to do the same thing as you:

if save:
            datetime = time.strftime("%Y%m%d-%H%M%S")
            run_folder = Path(project_config.PROJECT_DIR) / 'checkpoints' / 'patient_NCA' / self.hparams.hparams['run_name'] / datetime
            run_folder.mkdir(parents=True, exist_ok=True)
            print('run_folder', run_folder)