tbepler / protein-sequence-embedding-iclr2019

Source code for "Learning protein sequence embeddings using information from structure" - ICLR 2019

Generating Fasta/Text file for different dataset #26

Closed vsomnath closed 3 years ago

vsomnath commented 3 years ago

Hi, thank you for open sourcing this work.

We currently want to utilize your pretrained models for finetuning on a binding affinity dataset. Any advice (or an existing script) on how one can generate fasta files (in the format your models use) for a new dataset would be appreciated.

tbepler commented 3 years ago

Fasta is the "standard" sequence file format and it's pretty simple. Take a look here or here for examples.

The embedding script accepts files in the standard fasta format.
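For a new dataset, a minimal sketch of producing such a file in Python could look like the one below. It assumes the dataset is already loaded as (identifier, sequence) pairs. The function names `format_fasta` and `write_fasta`, the example identifiers, and the output filename are hypothetical and not part of this repository. Each sequence goes on a single line by default, which any FASTA reader should accept; the optional `width` argument wraps long sequences.

```python
def format_fasta(records, width=None):
    """Return FASTA text for an iterable of (identifier, sequence) pairs.

    Hypothetical helper, not part of this repository.
    width=None keeps each sequence on one line; an integer wraps it.
    """
    lines = []
    for identifier, sequence in records:
        # Remove whitespace inside the sequence and normalise to upper case.
        sequence = "".join(sequence.split()).upper()
        if not identifier or not sequence:
            raise ValueError("each record needs a non-empty identifier and sequence")
        # A FASTA record is a header line starting with '>' ...
        lines.append(">" + identifier)
        # ... followed by the sequence on one or more lines.
        if width is None:
            lines.append(sequence)
        else:
            for start in range(0, len(sequence), width):
                lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n" if lines else ""


def write_fasta(records, path, width=None):
    """Write the records to `path` in FASTA format."""
    with open(path, "w") as handle:
        handle.write(format_fasta(records, width))


if __name__ == "__main__":
    # Made-up example records; replace with the sequences from your dataset.
    example = [
        ("protein_1", "MKTAYIAKQR"),
        ("protein_2", "GSHMLEDPVA"),
    ]
    write_fasta(example, "my_dataset.fa")
```

The resulting file can then be passed to the embedding script like any other FASTA file. Labels such as binding affinities are not part of the FASTA format, so they would need to be kept in a separate file keyed by the same identifiers.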