instadeepai / nucleotide-transformer

🧬 Nucleotide Transformer: Building and Evaluating Robust Foundation Models for Human Genomics
https://www.biorxiv.org/content/10.1101/2023.01.11.523679v2

Zero Shot Data? #69

Closed fma231 closed 4 months ago

fma231 commented 5 months ago

Hello

I am wondering if you could provide one of the datasets used to predict mutation effects during zero-shot inference. The paper mentions using the Ensembl Variant Effect Predictor (VEP) to obtain the effect of each variant, but could you please also provide the sequences used for those variants?

Thank you

JavierMenRev commented 4 months ago

Hi @fma231, the FASTA files are quite large, but if you have a set of variants annotated by VEP, you can run VCF2FASTA to get the sequences. Note that for our study we used only SNPs, and each sequence was 6 kb long. Let me know if this helps.

Javier
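
For readers who want to see what this step involves, here is a minimal Python sketch of building reference and alternate sequences around SNPs. It is not the authors' pipeline and does not reproduce VCF2FASTA. The function names (`is_snp`, `snp_windows`, `snps_from_vcf_lines`) are hypothetical, the chromosome is assumed to be already loaded as a plain string, and placing the variant roughly in the centre of the 6 kb window is an assumption, since the thread does not say where the SNP sits within each sequence.

```python
def is_snp(ref: str, alt: str) -> bool:
    """True for a single-base substitution (one REF base, one ALT base)."""
    bases = set("ACGT")
    return (len(ref) == 1 and len(alt) == 1
            and ref in bases and alt in bases and ref != alt)


def snp_windows(chrom_seq: str, pos: int, ref: str, alt: str, length: int = 6000):
    """Return (ref_window, alt_window), each `length` bases long.

    pos is 1-based, as in VCF. Centring the variant in the window is an
    assumption made for this sketch, not something stated in the thread.
    """
    if not is_snp(ref, alt):
        raise ValueError("only SNPs are supported")
    idx = pos - 1  # 0-based index of the variant in the chromosome
    if chrom_seq[idx].upper() != ref:
        raise ValueError("REF allele does not match the reference sequence")
    start = idx - length // 2
    end = start + length
    if start < 0 or end > len(chrom_seq):
        raise ValueError("window runs off the end of the chromosome")
    ref_window = chrom_seq[start:end]
    offset = idx - start  # position of the variant inside the window
    alt_window = ref_window[:offset] + alt + ref_window[offset + 1:]
    return ref_window, alt_window


def snps_from_vcf_lines(lines):
    """Yield (chrom, pos, ref, alt) for every SNP allele in VCF text lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):  # multi-allelic records list several ALTs
            if is_snp(ref.upper(), alt.upper()):
                yield chrom, int(pos), ref.upper(), alt.upper()
```

In practice, the chromosome string would come from a reference genome FASTA, and each `(chrom, pos, ref, alt)` tuple from `snps_from_vcf_lines` would be passed to `snp_windows` to obtain the pair of sequences to score.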