sacdallago / bio_embeddings

Get protein embeddings from protein sequences
http://docs.bioembeddings.com
MIT License

Can Word2Vec be used for 4-, 5-, and 6-mers? If so, which files and parameters do I need to change? Seeking guidance on adapting the Word2Vec code for 4-mer sequences #243

Open faruk17035 opened 8 months ago

faruk17035 commented 8 months ago

Dear Sir, I hope this message finds you well. I am currently working on a project involving 4-mer sequences, and I have been using the bio-embeddings package, specifically its Word2Vec implementation, to process 3-mer sequences successfully.

However, I now need to modify the code to handle 4-mer sequences. I understand that this involves adjusting various parameters and possibly updating the code to accommodate the difference in k-mer length. To make this adaptation, I am seeking your guidance on the specific changes needed in the bio-embeddings Word2Vec code.

Could you kindly point out the files or sections of the code that need to be modified to handle 4-mer sequences? If specific parameters or functions also need adjustment, I would greatly appreciate any guidance you can offer.

Your expertise in this area is invaluable to me, and I believe your insights will significantly speed up the process of adapting the Word2Vec code for 4-mer sequences. If there is any relevant documentation or other resources that you could point me to, that would be immensely helpful as well.

Thank you very much for your time and assistance. I look forward to your guidance on this matter.
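
To make the request concrete, the adaptation largely comes down to how sequences are split into tokens before training or lookup. The following is a minimal sketch of overlapping k-mer tokenization with a configurable k. It is not taken from the bio-embeddings code base: the function names `kmer_tokens` and `kmer_corpus` and the example sequences are hypothetical. It also assumes that a Word2Vec model trained on 3-mers only has 3-mer tokens in its vocabulary, so using 4-mers would mean training a new model on a 4-mer corpus rather than changing a single parameter.

```python
from typing import Iterable, List


def kmer_tokens(sequence: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers with a stride of 1.

    Hypothetical helper for illustration; not part of bio-embeddings.
    Sequences shorter than k yield an empty list.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_corpus(sequences: Iterable[str], k: int) -> List[List[str]]:
    """Turn each sequence into one 'sentence' of k-mer 'words'."""
    return [kmer_tokens(seq, k) for seq in sequences]


if __name__ == "__main__":
    # Made-up example sequences, used only to show the tokenization.
    corpus = kmer_corpus(["MKTAYIAK", "MKT"], k=4)
    print(corpus[0])  # ['MKTA', 'KTAY', 'TAYI', 'AYIA', 'YIAK']
    print(corpus[1])  # [] because the sequence is shorter than k

    # Assumption: if gensim (version 4 or later) is used for training,
    # a corpus like this could be passed in as the sentences, e.g.
    #   from gensim.models import Word2Vec
    #   model = Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=1)
    # The hyperparameter values above are placeholders, not recommendations.
```

With this layout, switching between 4-, 5-, and 6-mers only changes the `k` argument, but each value of k produces a different vocabulary and therefore needs its own trained model.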