Closed by ziky90 7 years ago
I would love to see this. (I did it myself with another library as part of my research. Unfortunately, it gave very poor results.)
I wanted to do this myself but ran into trouble parsing the protein sequences into characters in a way amenable to Gensim. I'd love to help or learn how it's done; it would be very useful for a paper I had in mind. (Full credit ofc.)
Sounds good :) What specific issues did you run into?
Data munging, efficiently extracting sequences from UniProt's file formats (FASTA or XML), using pure existing vectors, n-grams vs. unigrams, and understanding general usage of gensim, e.g. "sentence labels".
(I haven't used the package before, only read your blog posts prior to that, and this was a while ago.)
On May 2, 2016 2:22 PM, "Radim Řehůřek" notifications@github.com wrote:
> Sounds good :) What specific issues did you run into?
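For anyone hitting the same parsing and n-gram questions, here is a minimal sketch of one way to turn FASTA records into token lists, assuming plain FASTA text and overlapping k-mers as the "words". The helper names `read_fasta` and `kmers` and the two example records are made up for illustration; they are not from gensim or UniProt, and this is not necessarily how the notebook under discussion does it.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def kmers(sequence, k=3):
    """Split a sequence into overlapping k-mers; k=1 gives unigrams."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Made-up records for illustration only, not real UniProt entries.
example = StringIO(">seq1 hypothetical\nMKTAY\nIAK\n>seq2 hypothetical\nMSDE\n")
sentences = [kmers(seq, k=3) for _, seq in read_fasta(example)]
print(sentences)
```

Each inner list plays the role of one "sentence", so a list like `sentences` could be passed as the `sentences` argument of `gensim.models.Word2Vec`. Whether overlapping k-mers or single residues work better for proteins is an open choice here, not something this sketch settles.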
I closed this because the original PR was abandoned.
I have moved the discussion about the doc2vec / word2vec IPython example from https://github.com/piskvorky/gensim/issues/629, as suggested by @Piezoid.
ideas: