leannmlindsey opened this issue 2 months ago
Reply from the repository maintainers: Hello, we are not publishing the training code at this stage. You may nevertheless find the following information useful. All training was done with PyTorch. We did train the DNABERT-2 model using the architecture from https://huggingface.co/zhihan1996/DNABERT-2-117M/blob/main/bert_layers.py and the same learning-rate scheduler as in the original DNABERT-2 paper. We recently retrained all models on 10 NVIDIA A100 80 GB GPUs with an effective batch size of 4480 for DNABERT-2 and 480 for NTv2-250M-3UTR. For DNABERT-2, training on the Zoonomia 3'UTR sequences for 2 epochs took 1.3 h (about 10x less than NT), which is equivalent to roughly 13 hours on a single A100 GPU. DNABERT-2 is also faster than NT when compared at the same batch size.
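For anyone trying to reproduce a comparable setup, the reply pins down only the product per-device batch × gradient-accumulation steps × number of GPUs = 4480; the split below is a hypothetical illustration (the per-device size and accumulation steps are guesses, not values from the reply), expressed with Hugging Face TrainingArguments.

```python
# Hypothetical decomposition of the reported effective batch size.
# Only the product (4480 across 10 A100s) comes from the reply above;
# the per-device batch size and accumulation steps are illustrative guesses.
from transformers import TrainingArguments

NUM_GPUS = 10          # 10 x NVIDIA A100 80 GB, as stated in the reply
PER_DEVICE_BATCH = 56  # assumption: what fits in 80 GB for short 3'UTR sequences
GRAD_ACCUM = 8         # assumption: chosen so the product matches the reply

assert NUM_GPUS * PER_DEVICE_BATCH * GRAD_ACCUM == 4480  # DNABERT-2 effective batch size

args = TrainingArguments(
    output_dir="dnabert2-3utr-pretraining",        # placeholder path
    per_device_train_batch_size=PER_DEVICE_BATCH,
    gradient_accumulation_steps=GRAD_ACCUM,
    num_train_epochs=2,                            # 2 epochs over Zoonomia 3'UTR, per the reply
    bf16=True,                                     # assumption: mixed precision on A100s
)
# Launched with e.g. `torchrun --nproc_per_node=10 train.py`, each of the 10 processes
# uses the per-device batch size above, giving the 4480 effective batch size.
```

The same arithmetic would target 480 instead of 4480 for the NTv2-250M-3UTR run mentioned in the reply.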
Original question from leannmlindsey: Hello! I was wondering if you would release your pretraining code for DNABERT-2 and NT? The DNABERT-2 repository does not release the actual code used for pre-training, only a pointer to two similar approaches:
(from the DNABERT-2 website) We used and slightly modified the MosaicBERT implementation for DNABERT-2: https://github.com/mosaicml/examples/tree/main/examples/benchmarks/bert. You should be able to replicate the model training by following the instructions.
Or you can use run_mlm.py from https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling, importing BertModelForMaskedLM from https://huggingface.co/zhihan1996/DNABERT-2-117M/blob/main/bert_layers.py in place of the standard model class. It should produce a very similar model.
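For reference, here is a minimal sketch of that run_mlm.py-style route: masked-LM training with the DNABERT-2 architecture pulled in via the Hugging Face checkpoint's custom code rather than by copying bert_layers.py by hand. This is not the authors' (unreleased) training code; the data file, sequence length, masking probability, and batch settings are placeholders.

```python
# Sketch of masked-LM training with the DNABERT-2 architecture. Not the authors'
# code; data path, max length, masking probability and batch settings are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

checkpoint = "zhihan1996/DNABERT-2-117M"
# trust_remote_code pulls in the custom bert_layers.py modeling code from the Hub.
# If the checkpoint's auto_map does not expose a masked-LM head, import the class
# from a local copy of bert_layers.py instead.
tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained(checkpoint, trust_remote_code=True)
# (To train from scratch rather than continue from the checkpoint, build the model
#  from AutoConfig + AutoModelForMaskedLM.from_config instead of from_pretrained.)

# Plain-text file with one DNA sequence per line (placeholder path).
raw = load_dataset("text", data_files={"train": "3utr_sequences.txt"})["train"]
tokenized = raw.map(lambda b: tokenizer(b["text"], truncation=True, max_length=512),
                    batched=True, remove_columns=["text"])

# Standard BERT-style dynamic masking; 15% is the usual default, not a value from this thread.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="dnabert2-mlm", num_train_epochs=2,
                           per_device_train_batch_size=8, bf16=True),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```

Per the maintainers' reply above, scaling a setup like this to 10 A100s with an effective batch size of 4480 brought two epochs over the 3'UTR data down to about 1.3 hours.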
I am interested in using your implementation of DNABERT-2 pre-training because you were able to get it to train in such a short time.
Thank you for any help you can provide. LeAnn