Closed JiayiJennie closed 3 months ago
Hi, these two datasets should just be alternative versions of one another. The pretraining dataset, however, is pretraining_dataset.zip (final_model_dataset.zip was used for fine-tuning the final model).
Hello! I want to reproduce your repo and found that "final_model_data_seperate_fasta_per_superkingdom" and "pretraining_dataset" are both listed in your database. The latter is a subset of the former. Which one was used for pretraining in your paper? Thanks!