Closed rubenalv closed 1 week ago
Hello,
For the 'query-balls-clashes' command, the input should be produced by 'get-balls-from-atoms-file' with '--annotated' flag. So, the pipeline may look like this:
voronota get-balls-from-atoms-file --annotated < 8WMA.pdb > balls.txt
voronota query-balls-clashes --clash-distance 3.0 < balls.txt > clashes.txt
the 'clashes.txt' will look like this:
c<A>r<190>a<15>R<GLN>A<O> c<A>r<203>a<99>R<LYS>A<N> 2.90538 -0.164624
c<A>r<192>a<29>R<ILE>A<N> c<A>r<201>a<90>R<VAL>A<O> 2.87901 -0.190985
c<A>r<192>a<32>R<ILE>A<O> c<A>r<201>a<87>R<VAL>A<N> 2.88789 -0.18211
c<A>r<194>a<48>R<VAL>A<N> c<A>r<199>a<79>R<ASN>A<O> 2.85436 -0.215637
but it can be made more table-like with the 'expand-descriptors' command, which expands every annotation token into space separated values: chainID resSeq iCode serial altLoc resName name.
cat clashes.txt | voronota expand-descriptors > table.txt
The ' table.txt will look like this:
A 145 . 1 . SER N A 147 . 17 . GLN N 5.43134 2.33134
A 145 . 1 . SER N A 148 . 29 . ALA N 5.95731 2.85731
A 145 . 1 . SER N A 148 . 33 . ALA CB 5.99677 2.74677
A 145 . 1 . SER N A 149 . 40 . GLU CG 5.81173 2.56173
A 145 . 1 . SER N A 149 . 41 . GLU CD 5.81399 2.56399
You could filter it for inter-chain clashes by only selecting rows where values in columns 1 and 8 differ. However, there is a trick, you can specify minimum residue sequence separation to be very large (e.g 99999), so that only inter-chain pairs satisfy it. The the full pipeline can look like this:
cat 8wma.pdb \
| voronota get-balls-from-atoms-file \
--annotated \
--include-heteroatoms \
| voronota query-balls-clashes \
--min-seq-sep 99999 \
--clash-distance 3.0 \
| voronota expand-descriptors \
> results.txt
Then the 'results.txt' for 8WMA will contain:
A 429 . 1824 . LYS NZ C 446 . 2695 . LYS O 2.68049 -0.389506
That is super-useful, many thanks!
Regarding the last two lines: Then the 'results.txt' for 8WMA will contain: A 429 . 1824 . LYS NZ C 446 . 2695 . LYS O 2.68049 -0.389506 From the docs I understood that the last two columns in an annotation are the contact area and the distance between the centers of the two atoms. If it is so, how do I interpret the distance being negative (-0.389)?
I run a few tests:
cat 8WMA.pdb | voronota get-balls-from-atoms-file --annotated --include-heteroatoms | voronota query-balls-clashes --min-seq-sep 99999 --clash-distance 2.6 | voronota expand-descriptors
# none
cat 8WMA.pdb | voronota get-balls-from-atoms-file --annotated --include-heteroatoms | voronota query-balls-clashes --min-seq-sep 99999 --clash-distance 2.68 | voronota expand-descriptors
# none
cat 8WMA.pdb | voronota get-balls-from-atoms-file --annotated --include-heteroatoms | voronota query-balls-clashes --min-seq-sep 99999 --clash-distance 2.6805 | voronota expand-descriptors
A 429 . 1824 . LYS NZ C 446 . 2695 . LYS O 2.68049 -0.389506
It looks like --clash distance
filters on the value of what I thought was the area?
Also, does the clashing function work on the centers of atoms, or does it consider the surface of the proteins? The reason I am asking is that I have tried simple python functions to calculate the distance, and they work as long as the protein volumes do not intersect. I have prediction cases where the proteins intersect and are not detected by the distance function: eg say we have two protein rings of identical radius that is longer than the clash threshold -if the surface of each ring crosses the center of the other one (and the ring planes are perpendicular), which would not make sense in a protein:protein interaction, no clash is detected using just the distance.
Hello,
No, 'query-balls-clashes' is just for distance-based clashes, not for tessellation-derived contact areas!
Running
voronota query-balls-clashes --help
gives
--clash-distance number clash distance threshold in angstroms, default is 3.0
--min-seq-sep number minimum residue sequence separation, default is 2
--init-radius-for-BSH number initial radius for bounding sphere hierarchy
--help flag to print usage help to stdout and exit
stdin <- list of balls (line format: 'annotation x y z r')
stdout -> list of clashes (line format: 'annotation1 annotation2 distance min-distance-between-balls')
So, the output line format is: 'annotation1 annotation2 distance min-distance-between-balls'. The annotation1 annotation2 are expanded after 'voronota expand-descriptors', but the last two values are in any case 'distance' and 'min-distance-between-balls'.
If you want to use tessellation derived contacts, you will need to first construct them (https://www.voronota.com/#computing-annotated-inter-atom-contacts), then you can filter them (https://www.voronota.com/#querying-annotated-inter-atom-contacts). A relatively comfortable way to define a construct+query pipeline is using https://bioinformatics.lt/wtsam/vorocontacts page, where you can define a query and then download the script.
In general, I think, constructing tessellation-based contacts may be an overkill for just identifying extreme unrealistic clashes, so 'query-balls-clashes' may be enough.
If you still want contact areas, but computed much faster than with the OG Voronota, check out Voronota-LT "https://www.voronota.com/expansion_lt/", that also has somehow more sane interface.
For example:
voronota-lt -i 8wma.pdb --compute-only-inter-chain-contacts --write-contacts-to-file contacts.tsv
will produce a table in 'contacts.tsv' that contains Voronoi tessellation-based contacts and looks like this:
ca_header ID1_chain ID1_residue ID1_atom ID2_chain ID2_residue ID2_atom ID1_index ID2_index area arc_legth distance
ca A 327|TRP CE3 C 445|LEU CG 1045 2687 0.00469616 0 5.42041
ca A 327|TRP CE3 C 445|LEU CD2 1045 2689 0.862091 0.561291 5.5653
ca A 327|TRP CZ3 C 445|LEU CG 1047 2687 9.1327 0.645198 4.62657
ca A 327|TRP CZ3 C 445|LEU CD2 1047 2689 2.43678 0.641754 5.03981
ca A 327|TRP CZ3 C 445|LEU CD1 1047 2688 1.0007 0.886454 5.09493
ca A 327|TRP CZ3 C 445|LEU CA 1047 2683 0.356817 0.614441 5.77186
ca A 327|TRP CH2 C 445|LEU CA 1048 2683 0.397446 0.737037 6.05122
ca A 330|ALA O C 444|TRP CZ2 1071 2679 3.49465 0 4.10385
ca A 331|ALA O C 444|TRP CZ2 1076 2679 1.79317 0 4.68959
I've given myself a scold for not reading the documentation well (twice). I am at the protocol development stage, and since I am new to the field even knowing exactly what I need to solve the problems takes time... sifting through docs backfires then! So I really appreciate the time you have taken to answer. The goal is to determine whether two proteins are likely to interact (when e.g. one protein is a mutant, so it is not registered in a database like IntAct), starting from proteome scale, so I am starting with fast and less accurate methods to progressively reduce search space to use slower and more accurate approaches.
So yes, thank you very much for the help and examples! I can close the (no) issue as solved now.
Hello,
I am trying to use voronota to calculate clashes between chains, as a filtering method to distinguish between potential OK and definitely bad protein:protein predicted models. I am getting an "incomplete bracketing" error.
Example code:
Other commands yield output as expected
Am I not using the
query-balls-clashes
command correctly?