Closed yongrenr closed 3 months ago
Regarding Q1, this is an error I haven't hit before. Can you provide a bit more of the console output? Also, it looks like these two fields are empty in the command you used to launch; they need to be filled with arguments that correspond to a pre-trained model:
+model.config_path=""
train.pretrained_model_path="<path to .ckpt file>"
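For instance, a filled-in pair might look like the fragment below. The paths here are placeholders I made up for illustration, not outputs from this thread; point them at the `model_config.json` and checkpoint produced by your own pretraining run.

```shell
# Hypothetical example paths; substitute your own pretraining run's outputs.
+model.config_path="/path/to/outputs/<run>/model_config.json" \
train.pretrained_model_path="/path/to/outputs/<run>/checkpoints/last.ckpt" \
```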
Regarding Q2, can you post the LR and training loss graphs from wandb? Did the model ever hit a NaN loss during training?
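If you don't have the wandb graphs handy, a quick check along these lines can locate the first non-finite loss in a list of logged values. This is just a sketch of my own (not part of the repo), assuming you can export the loss history as plain numbers:

```python
import math

def first_bad_loss(losses):
    """Return the step index of the first NaN/inf loss, or None if all are finite."""
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            return step
    return None

# Example: training diverges at step 2 (first non-finite value).
print(first_bad_loss([0.52, 0.31, float("inf"), float("nan")]))  # -> 2
```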
Q1: Sorry, that's my fault. The code I uploaded had issues; here are more error screenshots.
RUN:
python -m train \
experiment=hg38/genomic_benchmark \
callbacks.model_checkpoint_every_n_steps.every_n_train_steps=5000 \
dataset.dataset_name="human_nontata_promoters" \
dataset.train_val_split_seed=2 \
dataset.batch_size=128 \
dataset.rc_aug=false \
+dataset.conjoin_train=false \
+dataset.conjoin_test=false \
loader.num_workers=2 \
model=caduceus \
model.name=dna_embedding_caduceus \
+model.config_path="/home/gyc/caduceus-main/outputs/2024-03-11/20-21-19-995417/model_config.json" \
+model.conjoin_test=false \
+decoder.conjoin_train=true \
+decoder.conjoin_test=false \
optimizer.lr="1e-3" \
trainer.max_epochs=10 \
train.pretrained_model_path="/home/gyc/caduceus-main/outputs/2024-03-11/20-21-19-995417/checkpoints/last.ckpt" \
wandb=null
ERROR:
I just tried running this and did not hit the division-by-zero error. Can you confirm that the data was properly downloaded to ./data/genomic_benchmark/human_nontata_promoters/ by the genomic-benchmarks library? This directory should look like this:
data/genomic_benchmark/human_nontata_promoters/
├── test
│   ├── negative
│   └── positive
└── train
    ├── negative
    └── positive
These directories should contain .txt files with sequences.
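To rule out an empty or misplaced dataset (which would also explain a dataloader of length 0), a small check along these lines can help. This is a sketch of my own, not repo code; the root path is an assumption based on the directory shown above:

```python
from pathlib import Path

def count_sequence_files(root):
    """Count .txt sequence files per split/label under the layout shown above."""
    counts = {}
    for split in ("train", "test"):
        for label in ("negative", "positive"):
            d = Path(root) / split / label
            # Missing directory counts as 0, which would make the dataloader empty.
            counts[f"{split}/{label}"] = len(list(d.glob("*.txt"))) if d.is_dir() else 0
    return counts

# Assumed default location from the command above; adjust if your data lives elsewhere.
print(count_sequence_files("./data/genomic_benchmark/human_nontata_promoters"))
```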
Thanks for the reminder, I've successfully run your code and it works great!
Glad to hear it!
Hello, I'm very interested in your model. Regarding the genomic benchmarks, I followed your guidance and ran into two problems: the dataloader length is 0 and the loss is infinite. Is this expected? Can you help me figure out the cause?
Q1:
RUN:
python -m train \
experiment=hg38/genomic_benchmark \
callbacks.model_checkpoint_every_n_steps.every_n_train_steps=5000 \
dataset.dataset_name="dummy_mouse_enhancers_ensembl" \
dataset.train_val_split_seed=1 \
dataset.batch_size=128 \
dataset.rc_aug=false \
+dataset.conjoin_train=false \
+dataset.conjoin_test=false \
loader.num_workers=2 \
model=caduceus \
model.name=dna_embedding_caduceus \
+model.config_path="" \
+model.conjoin_test=false \
+decoder.conjoin_train=true \
+decoder.conjoin_test=false \
optimizer.lr="1e-3" \
trainer.max_epochs=10 \
train.pretrained_model_path="<path to .ckpt file>" \
wandb=null
ERROR:
![63a3b2e1c7a3bbe3a703caaff47a150](https://github.com/kuleshov-group/caduceus/assets/78602328/b473c74a-c92b-467f-b0e5-074ad414b70e)
Q2:
RUN:
python -m train \
experiment=hg38/hg38 \
callbacks.model_checkpoint_every_n_steps.every_n_train_steps=500 \
dataset.max_length=1024 \
dataset.batch_size=1024 \
dataset.mlm=true \
dataset.mlm_probability=0.15 \
dataset.rc_aug=false \
model=caduceus \
model.config.d_model=128 \
model.config.n_layer=4 \
model.config.bidirectional=true \
model.config.bidirectional_strategy=add \
model.config.bidirectional_weight_tie=true \
model.config.rcps=true \
optimizer.lr="8e-3" \
train.global_batch_size=8 \
trainer.max_steps=10000 \
+trainer.val_check_interval=10000 \
wandb=null
ERROR:
![Result](https://github.com/kuleshov-group/caduceus/assets/78602328/3a00f364-32c6-4893-84b8-c3f3a77f9b2b)