Closed mcale6 closed 12 months ago
The generate_msa() function is going to generate a new query MSA from an aligned MSA sampled from OpenFold, so in this case, yes, it would need the OpenFold dataset.
If you want to generate using a custom MSA, I would suggest the generate_query_oadm_msa_simple() function.
Does your comment mean we can "generate custom MSAs" from a given query sequence using the generate_query_oadm_msa_simple() function? I looked through the code for this function, and to the best of my knowledge it generates a query sequence given a custom MSA, not the other way around.
Additional questions:
(1) Can we generate a custom MSA from query sequence using any existing functions in the code, or will we have to write our own?
(2) If you wanted to generate 1,000 different query sequences given a custom MSA, how would one go about doing that? I'm assuming just call generate_query_oadm_msa_simple() many times, correct? And each time sample 64 different sequences from the custom MSA, correct?
This paper was also awesome, thanks for doing this Evodiff team!
It's certainly possible to generate an MSA given a query sequence, although as a caveat we don't evaluate this in our preprint. There is an example of doing this in generate_msa(start_query=True). Although I have not added the functionality to do this from a custom MSA, it is fairly straightforward.
To do this, all that is needed is to initialize an MSA with all-mask tokens and place the query sequence of interest in the first row: src[i][0][:seq_len] = query_sequences[i]. Then, you can generate over the non-query rows: x_indices = np.arange(0,1).
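The initialization described above can be sketched roughly as follows. This is a minimal illustration, not the evodiff implementation: plain Python lists stand in for the tensors, MASK_ID and PAD_ID are hypothetical token ids (the real ones come from the tokenizer), init_query_msa is a made-up helper name, and padding the columns past the query length is an assumption. Note that np.arange(0,1) evaluates to [0], i.e. the query row only, so the sketch assumes the non-query rows are rows 1 through n_sequences - 1.

```python
# Hypothetical token ids for illustration only; the real ids come from the
# tokenizer used with the model.
MASK_ID = 0
PAD_ID = 1


def init_query_msa(query_tokens, n_sequences, seq_length,
                   mask_id=MASK_ID, pad_id=PAD_ID):
    """Build an n_sequences x seq_length grid: query in row 0, masks elsewhere."""
    seq_len = len(query_tokens)
    if seq_len > seq_length:
        raise ValueError("query is longer than seq_length")
    if n_sequences < 2:
        raise ValueError("need at least one non-query row to generate")

    # Start from an all-mask MSA.
    msa = [[mask_id] * seq_length for _ in range(n_sequences)]

    # Row 0 holds the query sequence of interest.
    msa[0][:seq_len] = list(query_tokens)

    # Assumption: columns past the query length are padding in every row.
    for row in msa:
        row[seq_len:] = [pad_id] * (seq_length - seq_len)

    # Positions left to generate: every non-query row, every query column.
    x_indices = list(range(1, n_sequences))
    y_indices = list(range(seq_len))
    return msa, x_indices, y_indices


msa, x_indices, y_indices = init_query_msa([5, 6, 7], n_sequences=3, seq_length=5)
```

With a real model, the masked positions at (x_indices, y_indices) would then be filled in by the sampling loop, while row 0 is left untouched.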
And for your second question, correct: the best way is to call the function many times. Each call should sample a unique subset of sequences from the MSA.
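A rough sketch of that repeated subsampling, using only the standard library. The helper name sample_msa_subsets and the toy MSA are hypothetical, and the call into generate_query_oadm_msa_simple() is deliberately left out because its signature is not shown in this thread. The subset size of 64 is taken from the question above.

```python
import random


def sample_msa_subsets(msa_sequences, n_draws, subset_size=64, seed=None):
    """Yield n_draws random subsets of an MSA, each drawn without replacement."""
    if subset_size > len(msa_sequences):
        raise ValueError("subset_size exceeds the number of sequences in the MSA")
    rng = random.Random(seed)
    for _ in range(n_draws):
        # random.sample never picks the same row twice within one subset;
        # different draws may still share rows with each other.
        yield rng.sample(msa_sequences, subset_size)


# Toy stand-in for a custom MSA with 200 aligned sequences.
toy_msa = [f"SEQ{i}" for i in range(200)]
subsets = list(sample_msa_subsets(toy_msa, n_draws=1000, subset_size=64, seed=0))
```

Each element of subsets would then be passed to the generation function once, giving one generated query sequence per draw.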
Thank you for using evodiff and the feedback!
When using MSA_OA_DM_MAXSUB() and generate_msa() I get the following error:
Do I need to have the OpenFold dataset (not only the weights but also the MSAs) in the evodiff/data/openfold folder?