Hi, thanks for your question!
You are right, the special prefix tokens can be anything of the form <|YOUR_TOKEN|>, and each should typically be followed by a "1" or a "2", depending on whether the sequence is stored in the traditional N -> C-terminal direction or in reverse, the same as in the original ProGen2 models. Also keep in mind that currently only one prefix token per sequence is supported. I updated the scripts to allow arbitrary tokens, not just Pfam tokens (f089a3a). (See also the similar issue #4.)
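As a rough illustration of the format described above, here is a minimal Python sketch that builds one training line from a tag and a sequence. The helper name `format_tagged_sequence` is hypothetical and is not part of the repository's scripts. The sketch assumes that "1" marks the N -> C direction and "2" the reverse (as I understand the ProGen2 convention), and that a reversed sequence is stored as the reversed string. It does not add any terminating token, since that is not covered here.

```python
import re


def format_tagged_sequence(tag: str, sequence: str, reverse: bool = False) -> str:
    """Build '<|TAG|>' + direction marker + residues for a single sequence.

    Hypothetical helper for illustration only; check the repository's own
    data preparation scripts for the exact format they expect.
    """
    # Reject characters that would break the <|...|> delimiters.
    if not re.fullmatch(r"[^<>|\s]+", tag):
        raise ValueError(f"invalid tag: {tag!r}")

    # Assumption: "1" = N -> C direction, "2" = reversed (C -> N).
    direction = "2" if reverse else "1"
    # Assumption: a reversed sequence is stored as the reversed string.
    residues = sequence[::-1] if reverse else sequence

    # Exactly one prefix token per sequence, as noted above.
    return f"<|{tag}|>{direction}{residues}"


print(format_tagged_sequence("MY_FAMILY", "MKTAYIAK"))
# <|MY_FAMILY|>1MKTAYIAK
print(format_tagged_sequence("MY_FAMILY", "MKTAYIAK", reverse=True))
# <|MY_FAMILY|>2KAIYATKM
```

The tag itself (`MY_FAMILY` here) is arbitrary, so proteins without a Pfam ID can simply be grouped under any label you choose.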
Let me know if I can help in any other way!
Awesome, thank you for the helpful response!
Awesome work! I was wondering about specifics regarding the control tags:
Do they need to be Pfam IDs, or can they be any arbitrary string between two "|" characters at the beginning of the sequence? For example, what if I had a group of proteins that don't have a Pfam ID? Thank you!