rnajena / bertax_training

Training scripts for BERTax

pretrain data clarification #15

Closed JiayiJennie closed 3 months ago

JiayiJennie commented 3 months ago

Hello! I want to reproduce the results from your repo and found that both "final_model_data_seperate_fasta_per_superkingdom" and "pretraining_dataset" are listed in your database. The latter is a subset of the former. Which one was used for pretraining in your paper? Thanks!

f-kretschmer commented 3 months ago

Hi, these two datasets should just be alternative versions of one another. The dataset used for pretraining, however, is pretraining_dataset.zip (final_model_dataset.zip was used for fine-tuning the final model).