hughhigin opened this issue 5 months ago.
This is on our to-do list at the moment. For now, the easiest way to extract the per-column scores is to grep the HTML file, e.g.:
grep -Eo '"scores": \[(.*)\]' foldmason.html
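Here is a slightly more robust variant of the grep above, as a sketch. It assumes the report embeds a single flat `"scores": [...]` array of numbers on one line; this has not been checked against FoldMason's HTML template. It uses `[^]]*` instead of the greedy `.*` so the match cannot run past the first closing bracket, and it writes a column-indexed TSV. The function name `extract_scores` and the toy HTML line are made up for illustration.

```shell
# Sketch: turn the "scores" array embedded in a FoldMason HTML report into a
# two-column TSV (alignment column index, lDDT score).
# Assumption: the array is a flat, comma-separated list of numbers on one line.
# extract_scores is a hypothetical helper name, not part of FoldMason.
extract_scores() {
  grep -Eo '"scores": \[[^]]*\]' "$1" \
    | head -n 1 \
    | sed -E 's/^"scores": \[//; s/]$//' \
    | tr ',' '\n' \
    | awk 'BEGIN { print "Column\tlDDT_score" }
           NF    { gsub(/[ \t]/, ""); print ++i "\t" $0 }'
}

# Toy input (hypothetical values, for illustration only)
printf '<script>var data = {"scores": [0.91, 0.85, 0.77]};</script>\n' > toy_report.html
extract_scores toy_report.html > per_column_lddt.tsv
cat per_column_lddt.tsv
```

If the real report nests arrays inside `"scores"` (e.g. one array per sequence), the `[^]]*` match would stop at the first inner `]` and this sketch would need adjusting.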
Unfortunately, the current implementation of msa2lddt scales poorly because it compares every possible pair of sequences in the MSA, so the number of comparisons grows quadratically with the number of sequences. Improving this (and exporting the underlying data) is also on the to-do list.
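To make the scaling concrete: an all-vs-all comparison over n sequences visits n(n-1)/2 unordered pairs, so ten times more sequences means roughly a hundred times more pairs. The shell arithmetic below only counts pairs. It assumes one comparison per unordered pair (not confirmed for msa2lddt) and ignores the per-pair cost, which also depends on alignment length. The `pairs` helper is hypothetical.

```shell
# Count the unordered sequence pairs an all-vs-all comparison must visit: n(n-1)/2.
# pairs is a hypothetical helper; assumes one comparison per unordered pair.
pairs() { echo $(( $1 * ($1 - 1) / 2 )); }

for n in 100 1000 10000; do
  printf '%s sequences -> %s pairs\n' "$n" "$(pairs "$n")"
done
```

This prints 4950, 499500 and 49995000 pairs respectively, which is why large MSAs become expensive long before the alignment itself does.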
Hi, amazing job on this tool.
For anyone who needs this, here is a one-liner pipeline for getting this data. Requirements: running with --report-mode 2 and a local installation of jq:
jq -r '.scores[]' output.json | awk 'BEGIN {print "Residue_position\tlDDT_score"} {print NR "\t" $0}' > lDDT_per_residue.tsv
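Once the per-residue table from the jq pipeline exists, a small awk follow-up can summarise it, for example by reporting the mean score and the columns below a cutoff. This is a sketch: the function name `summarise_lddt` and the default cutoff of 0.5 are hypothetical choices, not anything FoldMason provides, and the toy table is made up.

```shell
# Sketch: summarise a per-column lDDT table (one header line, then
# "index<TAB>score" rows) and list the columns scoring below a cutoff.
# summarise_lddt and the 0.5 default cutoff are hypothetical, not FoldMason defaults.
summarise_lddt() {
  awk -F '\t' -v cutoff="${2:-0.5}" '
    NR == 1 { next }                      # skip the header line
    { sum += $2; n++ }
    $2 < cutoff { low = low (low ? "," : "") $1 }
    END {
      if (n == 0) { print "no scores found"; exit 1 }
      printf "columns=%d mean=%.3f below_%s=%s\n", n, sum / n, cutoff, (low ? low : "none")
    }' "$1"
}

# Toy table (hypothetical values, for illustration only)
printf 'Residue_position\tlDDT_score\n1\t0.91\n2\t0.42\n3\t0.77\n4\t0.38\n' > toy_lddt.tsv
summarise_lddt toy_lddt.tsv 0.5
```

On the toy table this prints `columns=4 mean=0.620 below_0.5=2,4`; running it on `lDDT_per_residue.tsv` instead gives the same summary for a real alignment.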
In my opinion, a fine-grained analysis of the MSAs produced during the refinement step would be useful. It would be interesting to see how the total length of the alignment changes. The next step would be to identify regions of the alignment that stay stable, regions that change after optimization, and their lDDT scores before and after refinement.
For example, one could build an MSA of the profiles from each step. This should not be difficult if the corresponding intermediate results are preserved; it could be a "debug" mode.
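Assuming the alignment from each refinement step were saved as an aligned FASTA file (FoldMason does not necessarily keep these today, hence the "debug" mode idea), tracking how the alignment length changes could start as simply as the sketch below. The helper `msa_length` and the `step*_aligned.fa` file names are hypothetical, and the toy alignments are made up.

```shell
# Sketch: report the number of alignment columns in each saved MSA.
# Assumes aligned FASTA in which every row has the same length, so the length
# of the first sequence equals the number of columns.
# msa_length and the step*_aligned.fa names are hypothetical.
msa_length() {
  awk '
    /^>/ { if (seen) exit; seen = 1; next }   # stop at the second header
    seen { gsub(/[ \t\r]/, ""); len += length($0) }
    END  { print len + 0 }' "$1"
}

# Toy alignments standing in for two refinement steps (illustration only)
printf '>a\nAC-GT\n>b\nACAGT\n' > step0_aligned.fa
printf '>a\nAC-GT-\n>b\nACAGTA\n' > step1_aligned.fa

for f in step0_aligned.fa step1_aligned.fa; do
  printf '%s\t%s\n' "$f" "$(msa_length "$f")"
done
```

Comparing lDDT before and after refinement would additionally need per-column scores for each step, which is what this issue asks to export.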
The .html output of msa2lddtreport includes per-column lDDT values for a given alignment, but I haven't found a good way to pull them out for analysis, either from the HTML file or from the .json you can download from it. Could these be written to an optional output file, or be computed by another tool?
One note: for our larger networks, I can now compute the alignments thanks to the recent improvements, but msa2lddt segfaults, possibly due to a lack of RAM (I am working with just 32 GB). If there were a way to calculate the scores directly, rather than going through the .html report function, that would be super helpful!