pnpnpn / dna2vec

dna2vec: Consistent vector representations of variable-length k-mers
MIT License

Pretrained set? #2

Closed: stanleyjs closed this issue 7 years ago

stanleyjs commented 7 years ago

Hi, what genome/sequence was the pretrained set trained on? Can you make it available? I am running some initial experiments and would rather not lose time training dna2vec for my proof of concept.

Thank you!

pnpnpn commented 7 years ago

This is exactly what the instructions at https://github.com/pnpnpn/dna2vec#training-dna2vec-embeddings cover. Let me know if you run into issues with them.

stanleyjs commented 7 years ago

I did run into a few issues getting training to work, but my server is down, so I will restart the training tonight. However, I was referring more to pretrained/dna2vec-20161219-0153-k3to8-100d-10c-29320Mbp-sliding-Xat.w2v from https://github.com/pnpnpn/dna2vec#reading-pretrained-dna2vec

Was this included pretrained file, pretrained/dna2vec-20161219-0153-k3to8-100d-10c-29320Mbp-sliding-Xat.w2v, trained on the hg38 genome?

pnpnpn commented 7 years ago

Yes, it was. That pretrained w2v file was trained on the hg38 genome using the config YAML file.
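
For anyone who wants to inspect the pretrained vectors without training, here is a minimal sketch of a reader. It assumes the .w2v file is in the plain-text word2vec format (a header line with the token count and dimension, then one k-mer per line followed by its values); that is an assumption, not something confirmed in this thread. The function name `read_w2v_text` is hypothetical and is not part of dna2vec. The sample data is made up for illustration.

```python
import io
from typing import Dict, List, TextIO


def read_w2v_text(handle: TextIO) -> Dict[str, List[float]]:
    """Parse word2vec *text* format (assumed, not confirmed for this file).

    Expected layout:
        <token_count> <dimension>
        <token> <v1> <v2> ... <v_dimension>
    """
    header = handle.readline().split()
    count, dim = int(header[0]), int(header[1])

    vectors: Dict[str, List[float]] = {}
    for line in handle:
        parts = line.split()
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]

    if len(vectors) != count:
        raise ValueError(f"header promised {count} tokens, found {len(vectors)}")
    return vectors


# Tiny made-up sample standing in for the real file.
sample = "2 3\nAAA 0.1 0.2 0.3\nACGT 0.4 0.5 0.6\n"
vecs = read_w2v_text(io.StringIO(sample))
print(sorted(vecs))
```

With the real file, the call would be `read_w2v_text(open(path))`, where `path` points at the .w2v file under pretrained/. The README section linked above describes the project's own way of reading the pretrained model, which is the better choice for actual use.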