Open mgarort opened 7 months ago
Dear PepLand team,
Thanks once again for your great paper and model!
I am trying to apply the pretrained model to my own peptides to obtain embeddings. However, I get the error message "Warning Unfound fragments", which (I believe) means my peptides contain fragments/tokens not represented in PepLand's pretraining set, and therefore not represented in the vocabulary.
Could you please let me know how to train PepLand from scratch on my own training set with my own peptides, including tokenization, so that it considers all possible fragments in my peptides? The training section of the README specifies some of the scripts to be used, but I am unsure whether this includes the tokenization step.
Thanks in advance.
Thank you very much for your interest in our work!
If possible, could you share a small sample of data that is causing errors in the program? This will help me identify where the issue lies and debug it effectively.
I have been preparing for my graduation defense recently, so I apologize for any delayed response in addressing this issue.
Thanks, Richard.
Hi Richard,
Thanks a lot for your response and best of luck with your graduation defense!
Here is a sample of 5 SMILES strings, some of which trigger the error message described. They are the SMILES of 5 approved therapeutic peptides.
Hello, I apologize for the delayed response.
I have added an examples folder. Specifically, in examples/models/pepland/inference.py, I have encapsulated two models:
You can use examples/main.py to test these two types of models. I will provide a complete fine-tuning example in the future.
I also tested the examples you provided, and indeed, there were some warnings because some fragments in your examples are not in my vocab table. However, the program can handle this out-of-vocabulary situation, so it can still generate the embedding.
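For readers unfamiliar with out-of-vocabulary handling, the behaviour described above can be illustrated with a minimal sketch. This is not PepLand's actual code: the `<unk>` token name, the `encode_fragments` helper, and the toy vocabulary are all assumptions made for illustration. The idea is simply that a fragment missing from the vocabulary is mapped to a shared fallback index and reported in a warning, so encoding (and therefore embedding) can still proceed.

```python
from typing import Dict, List, Tuple

# Hypothetical fallback token; PepLand's real vocabulary may name or handle this differently.
UNK = "<unk>"


def encode_fragments(
    fragments: List[str], vocab: Dict[str, int]
) -> Tuple[List[int], List[str]]:
    """Map fragment strings to vocabulary ids, falling back to UNK for unknown ones.

    Returns the id sequence and the list of fragments that were not found.
    """
    unk_id = vocab[UNK]
    ids: List[int] = []
    unfound: List[str] = []
    for frag in fragments:
        if frag in vocab:
            ids.append(vocab[frag])
        else:
            # Unknown fragment: keep its position but use the shared fallback id.
            ids.append(unk_id)
            unfound.append(frag)
    return ids, unfound


# Toy vocabulary and fragment list (illustrative SMILES-like strings only).
vocab = {UNK: 0, "NCC(=O)O": 1, "CC(N)C(=O)O": 2}
ids, unfound = encode_fragments(["NCC(=O)O", "C1CCCCC1", "CC(N)C(=O)O"], vocab)
if unfound:
    print("Warning: unfound fragments:", unfound)
```

The trade-off is that every unknown fragment shares one representation, so the model cannot distinguish between them. That is why fragments that matter for a dataset are better served by a vocabulary that includes them.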
Hi Richard,
Thanks for your reply. Indeed, I was able to create embeddings a few weeks ago. The issue is that the unrecognized fragments are very important to our dataset, so I would like to re-train PepLand (including recreating the vocabulary) so that the model can consider those fragments explicitly.
Is it possible to re-train PepLand from scratch on my peptides, including new tokenization / recreation of the vocabulary, so that all fragments in my dataset are recognized?
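To make the request concrete, here is a rough sketch of what "recreating the vocabulary" could look like in principle. It assumes the peptides have already been split into fragment strings by some fragmentation step; the `build_vocab` and `coverage` helpers, the `<unk>` token, and the toy data are hypothetical, and PepLand's real tokenization and vocabulary format may well differ.

```python
from collections import Counter
from typing import Dict, Iterable, List


def build_vocab(
    fragmented_peptides: Iterable[List[str]],
    min_count: int = 1,
    unk_token: str = "<unk>",
) -> Dict[str, int]:
    """Build a fragment -> id mapping from already-fragmented peptides.

    Fragments are ordered by descending frequency (ties broken alphabetically);
    id 0 is reserved for the fallback token.
    """
    counts = Counter(frag for frags in fragmented_peptides for frag in frags)
    vocab = {unk_token: 0}
    for frag, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_count:
            vocab[frag] = len(vocab)
    return vocab


def coverage(fragmented_peptides: Iterable[List[str]], vocab: Dict[str, int]) -> float:
    """Fraction of fragment occurrences that are present in the vocabulary."""
    total = 0
    covered = 0
    for frags in fragmented_peptides:
        for frag in frags:
            total += 1
            covered += frag in vocab
    return covered / total if total else 1.0


# Toy dataset: each inner list stands for one peptide's fragments (placeholders).
dataset = [["A", "B"], ["B", "C"], ["B"]]
new_vocab = build_vocab(dataset)
full_coverage = coverage(dataset, new_vocab)
```

A vocabulary built this way covers every fragment in the dataset by construction, but any embedding table indexed by the old vocabulary would no longer line up with the new ids, which is why the model would need to be re-trained with it.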