steineggerlab / foldmason

Multiple Protein Structure Alignment at Scale with FoldMason
https://search.foldseek.com/foldmason
GNU General Public License v3.0

Request: Access to column LDDT values #4

Open hughhigin opened 4 months ago

hughhigin commented 4 months ago

The .html output of msa2lddtreport includes per-column LDDT values for a given alignment, but I haven't found a good way to pull them out for analysis, either from the .html file itself or from the .json you can download. Could these values be written to an optional output file, or be calculated with another tool?

One note: for our larger networks I am able to calculate the alignments with the recent improvements, but msa2lddt then segfaults, possibly due to a lack of RAM (I am working with just 32 GB). A way to calculate the scores directly, rather than going through the .html report function, would be super helpful!

gamcil commented 4 months ago

This is on the to-do list at the moment. For now, the easiest way to extract the per-column scores is to grep the html file, e.g.:

grep -Eo '"scores": \[(.*)\]' foldmason.html

Unfortunately the current implementation of msa2lddt scales pretty poorly, since it looks at every possible pair of sequences within the MSA: for N sequences that is N(N-1)/2 pairs, so the work grows quadratically with the size of the alignment. Improving this (+ exporting the underlying data) is also on the to-do list for now.
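
As a rough sketch, the grep approach above could be extended to print one score per line, which is easier to load into R or pandas. The helper name extract_scores and the output file name are made up for illustration. It also assumes the report embeds the values as a flat array of the form "scores": [v1, v2, ...] with no nested brackets, which has not been checked against the actual report layout. If the file contains several such arrays, all of them are printed one after another.

```shell
# Hypothetical helper: print per-column LDDT scores, one value per line.
# Assumption: the report contains a flat array written as "scores": [v1, v2, ...]
# with no nested brackets. Adjust the pattern if the real layout differs.
extract_scores() {
    # [^]]* stops at the first closing bracket instead of matching greedily
    grep -Eo '"scores": *\[[^]]*\]' "$1" |
        sed -E 's/^"scores": *\[//; s/\]$//' |      # drop the key and the brackets
        tr ',' '\n' |                                # one value per line
        sed -E 's/^[[:space:]]+//; s/[[:space:]]+$//' # trim surrounding whitespace
}

# Usage: extract_scores foldmason.html > column_lddt.txt
```

This only post-processes an existing report, so it does not help when msa2lddt itself fails on a large alignment.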