steineggerlab / foldmason

Multiple Protein Structure Alignment at Scale with FoldMason
https://search.foldseek.com/foldmason
GNU General Public License v3.0

Request: Access to column LDDT values #4

Open hughhigin opened 5 months ago

hughhigin commented 5 months ago

The .html output of msa2lddtreport includes per-column LDDT values for a given alignment, but I haven't found a good way to pull them out of the HTML file or the downloadable .json for analysis. Could this be an optional output file, or be calculable with another tool?

One note: for our larger networks I can now calculate the alignments thanks to the recent improvements, but msa2lddt segfaults, possibly due to a lack of RAM (I am working with just 32 GB). A way to calculate the scores directly, rather than going through the .html report function, would be super helpful!

gamcil commented 5 months ago

This is on the to-do list at the moment. The easiest way to extract the per-column scores would be using grep on the html file, e.g.:

grep -Eo '"scores": \[(.*)\]' foldmason.html

Unfortunately, the current implementation of msa2lddt scales poorly because it compares every possible pair of sequences within the MSA, so the cost grows quadratically with the number of sequences. Improving this (and exporting the underlying data) is also on the to-do list.
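
As a rough sketch of how the grep output could be turned into a table, the helper below extracts the first "scores" array and prints one column index and score per line. The function name `extract_column_lddt` is hypothetical. It assumes the report embeds the data as `"scores": [v1, v2, ...]` on a single line, as the grep pattern above implies. It uses `[^]]*` instead of `.*` so the match stops at the first closing bracket.

```shell
# Hypothetical helper, not part of FoldMason.
# Assumption: the HTML report contains `"scores": [v1, v2, ...]` on one line.
extract_column_lddt() {
    # -o prints only the matched text; [^]]* stops at the first closing bracket
    grep -Eo '"scores": *\[[^]]*\]' "$1" \
        | head -n 1 \
        | sed -E 's/^"scores": *\[//; s/\]$//' \
        | tr ',' '\n' \
        | awk 'BEGIN { print "column\tlDDT" }
               NF    { gsub(/[ \t]/, ""); print ++n "\t" $0 }'
}

# Usage (file name taken from the grep example above):
#   extract_column_lddt foldmason.html > column_lddt.tsv
```

If the report stores the array across several lines, or contains more than one "scores" array, this sketch would need adjusting.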

bananabenana commented 1 month ago

Hi, amazing job on this tool.

For anyone who needs this, here is a one-liner for getting this data. Requirements: running with --report-mode 2 and a local installation of jq:

jq -r '.scores[]' output.json | awk 'BEGIN {print "Residue_position\tlDDT_score"} {print NR "\t" $0}' > lDDT_per_residue.tsv
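
Once the TSV exists, a quick summary can help spot poorly aligned regions. The sketch below is only an illustration: `summarise_lddt` is a hypothetical helper, and the default cutoff of 0.5 is an arbitrary choice, not a FoldMason recommendation. It assumes a two-column, tab-separated file with one header line, as produced by the one-liner above.

```shell
# Hypothetical helper, not part of FoldMason.
# $1: TSV with a header line, then index<TAB>score
# $2: optional cutoff (default 0.5, chosen arbitrarily for illustration)
summarise_lddt() {
    awk -F '\t' -v cutoff="${2:-0.5}" '
        NR == 1 { next }                  # skip the header row
        $2 ~ /^[0-9.eE+-]+$/ {            # ignore null or non-numeric entries
            n++; sum += $2
            if ($2 < cutoff) low++
        }
        END {
            if (n == 0) { print "no numeric scores found"; exit 1 }
            printf "columns\t%d\nmean_lDDT\t%.4f\nbelow_%s\t%d\n", n, sum / n, cutoff, low
        }
    ' "$1"
}

# Usage:
#   summarise_lddt lDDT_per_residue.tsv 0.6
```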

igortru commented 1 month ago

In my opinion, fine-grained analysis of the MSA ensembles during the refinement step could be useful. It would be interesting to see how the total alignment length changes. The next step would be to identify stable regions in the alignment and regions where the alignment changed after optimization, along with their LDDT values before and after refinement.

For example, build an MSA from the profiles at each step. This should not be difficult if the corresponding intermediate results are preserved; it could be a "debug" mode.