Open seanrjohnson opened 1 year ago
I have been able to generate a custom MSA for one query protein with a prebuilt target database by:
mkdir -p tmp
mmseqs createdb [Query].faa [Query].db
mmseqs search [Query].db [Target].db [Query]x[Target].db ./tmp
mmseqs result2msa [Query].db [Target].db [Query]x[Target].db [Query]x[Target].a3m --msa-format-mode 5
colabfold_batch [Query]x[Target].a3m [Query]x[Target]_out
To build the target database, you probably need something like this (untested):
mmseqs createdb [Target].fasta [Target].db
mmseqs createindex [Target].db tmp --remove-tmp-files 1
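The steps above can be chained into a single script. This is only a minimal sketch under the same caveats (the target-database part is still untested): the function names `run_step` and `msa_pipeline`, the `DRY_RUN` variable, and the output file naming are hypothetical, and the mmseqs and colabfold_batch invocations are just the ones listed above. By default it only prints the commands so they can be checked before anything runs.

```shell
#!/usr/bin/env bash
# Sketch: chain the mmseqs/colabfold steps for one query FASTA and one target FASTA.
# Assumption: run_step, msa_pipeline, DRY_RUN and the file naming are illustrative only.
set -euo pipefail

# Print each command; execute it only when DRY_RUN=0 (default is print-only).
run_step() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Usage: msa_pipeline QUERY_FASTA TARGET_FASTA OUT_PREFIX [TMP_DIR]
msa_pipeline() {
  local query_fasta="$1" target_fasta="$2" out_prefix="$3" tmp_dir="${4:-tmp}"

  run_step mkdir -p "$tmp_dir"

  # Build and index the custom target database (untested step from above).
  run_step mmseqs createdb "$target_fasta" "$out_prefix.target.db"
  run_step mmseqs createindex "$out_prefix.target.db" "$tmp_dir" --remove-tmp-files 1

  # Build the query database, search, and convert the hits to an a3m MSA.
  run_step mmseqs createdb "$query_fasta" "$out_prefix.query.db"
  run_step mmseqs search "$out_prefix.query.db" "$out_prefix.target.db" \
    "$out_prefix.result.db" "$tmp_dir"
  run_step mmseqs result2msa "$out_prefix.query.db" "$out_prefix.target.db" \
    "$out_prefix.result.db" "$out_prefix.a3m" --msa-format-mode 5

  # Predict structures from the custom MSA.
  run_step colabfold_batch "$out_prefix.a3m" "${out_prefix}_out"
}
```

To search a family against itself, pass the same FASTA as query and target, e.g. `DRY_RUN=0 msa_pipeline family.fasta family.fasta family`. I have only used the query-side steps with a single query protein, so treat the multi-query case as unverified.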
Hope it helps!
I'm trying to predict structures for a bunch of sequences from the same family. For this reason, I don't need to search against the entire uniprot30 or envdb. I just want to make a reference database from the sequences themselves (only a few thousand) and generate the MSAs from that search.
Can you recommend a way to do this?
With a subset of my sequences of interest in test_queries.fasta (in this case, just two sequences, named 1 and 2), I have tried:
I see the error:
Is there a generic version of the a3m pipeline that I can use with an arbitrary reference database?
I first tried the mmseqs2 build from conda. Then, thinking it might be an issue with the binaries, I downloaded the source and recompiled, but that didn't help.