jerryji1993 / DNABERT

DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome
https://doi.org/10.1093/bioinformatics/btab083
Apache License 2.0

Pre-train #83

Open morningsun77 opened 2 years ago

morningsun77 commented 2 years ago

Hi, I want to pre-train DNABERT with my own data, but I don't understand the template data at /example/sample_data/pre. Since the template data has no labels, I want to know whether everything in it is a gene sequence. Thanks.

Eden0923 commented 10 months ago

You can find the format in DNABERT\examples\sample_data\pre, in the text file '6_3k.txt'. It shows how they organized the input data.
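If it helps, here is a minimal sketch of how you might turn your own raw sequences into a similar file. It assumes the sample file holds one sequence per line, written as space-separated overlapping 6-mers with no labels; please check that against '6_3k.txt' before relying on it. The function names `seq_to_kmers` and `write_pretrain_file` and the output file name are made up for illustration and are not part of the DNABERT repo.

```python
def seq_to_kmers(seq: str, k: int = 6) -> str:
    """Split a DNA sequence into overlapping k-mers joined by single spaces.

    Assumption: the pre-training file uses this layout (one sequence per
    line, k-mers separated by spaces). Verify against the sample file.
    """
    seq = seq.strip().upper()
    # A sequence shorter than k yields no k-mers, so the result is "".
    return " ".join(seq[i:i + k] for i in range(len(seq) - k + 1))


def write_pretrain_file(sequences, out_path: str, k: int = 6) -> int:
    """Write one k-mer line per sequence; return the number of lines written."""
    written = 0
    with open(out_path, "w") as fh:
        for seq in sequences:
            line = seq_to_kmers(seq, k)
            if line:  # skip sequences that are too short to form a k-mer
                fh.write(line + "\n")
                written += 1
    return written


if __name__ == "__main__":
    # Hypothetical input sequences and output path, for illustration only.
    my_sequences = ["ATGCATGCAA", "ACG", "ttgacctgaa"]
    n = write_pretrain_file(my_sequences, "my_6mer_pretrain.txt", k=6)
    print(f"wrote {n} lines")
```

For example, `seq_to_kmers("ATGCATGC", 6)` gives `ATGCAT TGCATG GCATGC`. No label column is written, since pre-training works on unlabeled sequences.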