**Closed** · leannmlindsey closed this issue 2 months ago
I noticed in genomic_benchmar.yaml that you have a different `train_len` for each task. I was wondering why there are such wide differences in the training lengths. Is `train_len` just the maximum value in the dataset?
Example:

```yaml
dummy_mouse_enhancers_ensembl:
  train_len: 1210
  classes: 2
  max_length: 1024
demo_coding_vs_intergenomic_seqs:
  train_len: 100_000
  classes: 2
  max_length: 200
demo_human_or_worm:
  train_len: 100_000
  classes: 2
  max_length: 200
human_enhancers_cohn:
  train_len: 27791
  classes: 2
  max_length: 500
human_enhancers_ensembl:
  train_len: 154842
  classes: 2
  max_length: 512
human_ensembl_regulatory:
  train_len: 289061
  classes: 3
  max_length: 512
human_nontata_promoters:
  train_len: 36131
  classes: 2
  max_length: 251
human_ocr_ensembl:
  train_len: 174756
  classes: 2
  max_length: 512
phage_classification:
  train_len: 4000
  classes: 2
  max_length: 4000
```
Sorry, I realized from looking at the config files that `train_len` is the number of sequences in the training split, not a sequence length; `max_length` is what caps the sequence length.
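To make the distinction concrete, here is a minimal sketch with a hypothetical toy dataset (not the actual benchmark loader): `train_len` corresponds to the number of sequences, while `max_length` bounds the length of each individual sequence.

```python
# Hypothetical toy "training split": a list of (sequence, label) pairs.
# Names and data here are illustrative, not from the real benchmark.
train_set = [
    ("ACGTACGT", 0),
    ("GGGCCCTT", 1),
    ("ATATATAT", 0),
]

# train_len in the YAML plays the role of the dataset size:
train_len = len(train_set)

# max_length refers to individual sequence lengths
# (e.g. a padding/truncation target), not the dataset size:
longest = max(len(seq) for seq, _ in train_set)

print(train_len)  # number of training sequences
print(longest)    # length of the longest sequence
```

So a task like `dummy_mouse_enhancers_ensembl` has only 1210 training sequences, each up to 1024 bases long, while `demo_human_or_worm` has 100,000 sequences of at most 200 bases.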