kuleshov-group / caduceus

Bi-Directional Equivariant Long-Range DNA Sequence Modeling
Apache License 2.0
137 stars 14 forks

Issues with genomic_benchmark Experiment code in Caduceus #9

Closed xwx1999 closed 3 months ago

xwx1999 commented 3 months ago

Hello, I am writing to provide feedback on the code for the genomic_benchmark experiment using the Mamba and Caduceus models. I am pleased to report that the pre-training process went smoothly for me without any issues.

However, I encountered some problems when I attempted to run the genomic_benchmark experiment. The command I used is below for reference:

> python -m train \
>     experiment=hg38/genomic_benchmark \
>     callbacks.model_checkpoint_every_n_steps.every_n_train_steps=5000 \
>     dataset.dataset_name="dummy_mouse_enhancers_ensembl" \
>     dataset.train_val_split_seed=1 \
>     dataset.batch_size=128 \
>     dataset.rc_aug=false \
>     +dataset.conjoin_train=false \
>     +dataset.conjoin_test=false \
>     loader.num_workers=2 \
>     model=caduceus \
>     model._name_=dna_embedding_caduceus \
>     +model.config_path="/home/v-weixixiang/caduceus/outputs/pretrain/hg38/caduceus_ph_seqlen-1024_d_model-118_n_layer-4_lr-8e-3/config.json" \
>     +model.conjoin_test=false \
>     +decoder.conjoin_train=true \
>     +decoder.conjoin_test=false \
>     optimizer.lr="1e-3" \
>     trainer.max_epochs=10 \
>     train.pretrained_model_path="/home/v-weixixiang/caduceus/outputs/pretrain/hg38/caduceus_ph_seqlen-1024_d_model-118_n_layer-4_lr-8e-3/checkpoints/last.ckpt" \
>     wandb=null

Unfortunately, I faced some issues; the details are captured in the following error output:

> Error executing job with overrides: ['experiment=hg38/genomic_benchmark', 'callbacks.model_checkpoint_every_n_steps.every_n_train_steps=5000', 'dataset.dataset_name=dummy_mouse_enhancers_ensembl', 'dataset.train_val_split_seed=1', 'dataset.batch_size=128', 'dataset.rc_aug=true', '+dataset.conjoin_train=false', '+dataset.conjoin_test=false', 'model=mamba', 'model._name_=dna_embedding_mamba', '+model.config_path=/home/v-weixixiang/caduceus/outputs/pretrain/hg38/mamba_ntp_rc_aug_seqlen-1024_d_model-128_n_layer-4_lr-8e-3/config.json', '+model.conjoin_test=false', '+decoder.conjoin_train=false', '+decoder.conjoin_test=false', 'optimizer.lr=1e-3', 'trainer.max_epochs=10', 'train.pretrained_model_path=/home/v-weixixiang/caduceus/outputs/pretrain/hg38/mamba_ntp_rc_aug_seqlen-1024_d_model-128_n_layer-4_lr-8e-3/checkpoints/last.ckpt', 'wandb.group=downstream/gb_cv5', 'wandb.job_type=dummy_mouse_enhancers_ensembl', 'wandb.name=mamba_uni_lr-1e-3_batch_size-128_rc_aug-true', 'wandb.id=gb_cv5_dummy_mouse_enhancers_ensembl_mamba_uni_lr-1e-3_batch_size-128_rc_aug-true_seed-1', '+wandb.tags=[seed-1]']
> Traceback (most recent call last):
>   File "/home/v-weixixiang/caduceus_new/caduceus/train.py", line 715, in main
>     train(config)
>   File "/home/v-weixixiang/caduceus_new/caduceus/train.py", line 658, in train
>     model = SequenceLightningModule(config)
>   File "/home/v-weixixiang/caduceus_new/caduceus/train.py", line 154, in __init__
>     self.setup()
>   File "/home/v-weixixiang/caduceus_new/caduceus/train.py", line 202, in setup
>     self.model = utils.instantiate(registry.model, model_hparams)
>   File "/home/v-weixixiang/caduceus_new/caduceus/src/utils/config.py", line 109, in instantiate
>     return obj()
> TypeError: __init__() got an unexpected keyword argument 'train'
> 
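For readers hitting the same traceback: the failure mode can be reproduced in isolation. A full training `config.json` carries top-level sections (such as `train`) that a model constructor does not accept, so splatting it into the constructor raises exactly this `TypeError`. The class and keys below are hypothetical stand-ins, not the repo's actual code:

```python
# Hypothetical minimal reproduction: a stand-in model class that accepts
# only model hyperparameters, fed a config dict that also contains a
# top-level "train" section (as a full training config.json would).
class CaduceusLikeModel:
    """Stand-in model class; accepts only model hyperparameters."""
    def __init__(self, d_model=118, n_layer=4):
        self.d_model = d_model
        self.n_layer = n_layer

# config.json-style dict: model kwargs mixed with a training section
full_config = {"d_model": 118, "n_layer": 4, "train": {"pretrained_model_path": "..."}}

try:
    CaduceusLikeModel(**full_config)  # splatting the full config fails
except TypeError as e:
    print(e)  # the same kind of TypeError as in the traceback above

# model_config.json-style dict: model kwargs only
model_only = {k: v for k, v in full_config.items() if k != "train"}
model = CaduceusLikeModel(**model_only)  # succeeds
```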

Could you please provide any assistance or guidance on how to resolve these problems? Your help would be greatly appreciated.

xwx1999 commented 3 months ago

Besides, I have now run the nucleotide_transformer experiment using the command you provided (after deleting the repeated trainer.max_epochs parameter), and I encountered the same problems as with the genomic_benchmark experiment. Here is the revised command I used:

> python -m train \
>     experiment=hg38/nucleotide_transformer \
>     callbacks.model_checkpoint_every_n_steps.every_n_train_steps=5000 \
>     dataset.dataset_name="enhancers" \
>     dataset.train_val_split_seed=1 \
>     dataset.batch_size=128 \
>     dataset.rc_aug="true" \
>     +dataset.conjoin_test="false" \
>     loader.num_workers=2 \
>     model=caduceus \
>     model._name_=dna_embedding_caduceus \
>     +model.config_path="/home/v-weixixiang/caduceus/outputs/pretrain/hg38/caduceus_ph_seqlen-1024_d_model-256_n_layer-4_lr-8e-3/config.json" \
>     +model.conjoin_test=false \
>     +decoder.conjoin_train=true \
>     +decoder.conjoin_test=false \
>     optimizer.lr="1e-3" \
>     train.pretrained_model_path="/home/v-weixixiang/caduceus/outputs/pretrain/hg38/caduceus_ph_seqlen-1024_d_model-256_n_layer-4_lr-8e-3/checkpoints/last.ckpt" \
>     trainer.max_epochs=20 \
>     wandb=null
yair-schiff commented 3 months ago

Please try changing the file you pass to the model.config_path= parameter to be the model_config.json that gets saved in the same directory, as opposed to config.json. I think this should solve the issue. Let me know if you’re still having trouble.
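A small guard can catch this mistake before training starts. The filenames come from this thread, but the disallowed key names below are assumptions inferred from the traceback, not the repo's actual config schema:

```python
import json

def load_model_config(path, disallowed=("train", "optimizer", "trainer", "dataset")):
    """Load a JSON model config, refusing files that look like a full training config.

    The `disallowed` key names are assumptions for illustration, not the
    repo's actual schema.
    """
    with open(path) as f:
        cfg = json.load(f)
    extra = [k for k in disallowed if k in cfg]
    if extra:
        raise ValueError(
            f"{path} contains non-model sections {extra}; "
            "pass model_config.json from the same directory instead."
        )
    return cfg
```

Calling this on a full `config.json` fails fast with a pointer to the right file, while `model_config.json` (model hyperparameters only) loads cleanly.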

xwx1999 commented 3 months ago

I am glad to report that the issue has been resolved successfully by following your guidance, thank you very much!