rcsb / mmtf

The specification of the MMTF format for biological structures
http://mmtf.rcsb.org/

Add ncsOperatorList field #2

Closed arose closed 8 years ago

arose commented 8 years ago

Add field ncsOperatorList with transformations to construct the full crystallographic asymmetric unit.

Example from 1a37:

[
    [
        1.0, 0.0, 0.0, 0.0,
        0.0, 1.0, 0.0, 0.0,
        0.0, 0.0, 1.0, 0.0,
        0.0, 0.0, 0.0, 1.0
    ],
    [
        -0.997443,  0.000760, -0.071468, 59.52120,
        -0.000162, -0.999965, -0.008376, 80.32820,
        -0.071472, -0.008343,  0.997408, 2.38680,
         0.0,       0.0,       0.0,      1.0
    ]
]
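
A minimal sketch of how a reader might apply such operators to coordinates, assuming each operator is a flat list of 16 numbers in row-major order, as in the 1a37 example above. The helper name `apply_operator` and the sample atom position are made up for illustration and are not part of MMTF or any library.

```python
# Illustrative only: apply_operator is a hypothetical helper, not an MMTF API.
# Assumes each operator is a flat list of 16 floats in row-major order.

def apply_operator(op, xyz):
    """Apply a 4x4 affine transform (flat, row-major) to an (x, y, z) point."""
    x, y, z = xyz
    return tuple(
        op[4 * row] * x + op[4 * row + 1] * y + op[4 * row + 2] * z + op[4 * row + 3]
        for row in range(3)  # the last row (0, 0, 0, 1) is not needed for points
    )


# The two operators quoted for 1a37.
ncs_operator_list = [
    [1.0, 0.0, 0.0, 0.0,
     0.0, 1.0, 0.0, 0.0,
     0.0, 0.0, 1.0, 0.0,
     0.0, 0.0, 0.0, 1.0],
    [-0.997443, 0.000760, -0.071468, 59.52120,
     -0.000162, -0.999965, -0.008376, 80.32820,
     -0.071472, -0.008343, 0.997408, 2.38680,
     0.0, 0.0, 0.0, 1.0],
]

atom = (10.0, 20.0, 30.0)  # made-up coordinates
copies = [apply_operator(op, atom) for op in ncs_operator_list]
print(copies)  # one copy of the atom per operator
```

The first operator is the identity, so the first copy is the atom as written in the file; the second copy is the NCS-related position.
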
BobHanson commented 8 years ago

Could you elaborate on this? How are these used? Jmol ignores these, I think.

arose commented 8 years ago

Example 1auy:

AU, that is the coordinates explicitly given in the file: [image: screenshot 72]

trying to construct unitcell without NCS operations: [image: screenshot 74]

AU expanded by NCS operations: [image: screenshot 73]

Full biological assembly: [image: screenshot 71]

Note that in mmCIF files the NCS operations are also in the "XAU" named _pdbx_struct_assembly_gen field.

arose commented 8 years ago

ping @josemduarte

BobHanson commented 8 years ago

Yes, I remember now. This is taken care of in Jmol as

load .... filter "biomolecule XAU"

Robert M. Hanson Larson-Anderson Professor of Chemistry Chair, Department of Chemistry St. Olaf College Northfield, MN http://www.stolaf.edu/people/hansonr

If nature does not answer first what we want, it is better to take what answer we get.

-- Josiah Willard Gibbs, Lecture XXX, Monday, February 5, 1900

josemduarte commented 8 years ago

An interesting example is 1d2q. It does contain a struct_ncs_oper operator that differs from the identity matrix, but only with the "given" code, which means the operator can be ignored (see the struct_ncs docs in the mmCIF dictionary):

_struct_ncs_oper.id             1 
_struct_ncs_oper.code           given 
_struct_ncs_oper.details        ? 
_struct_ncs_oper.matrix[1][1]   -1.00000 
_struct_ncs_oper.matrix[1][2]   -0.00164 
_struct_ncs_oper.matrix[1][3]   0.00090 
_struct_ncs_oper.matrix[2][1]   -0.00162 
_struct_ncs_oper.matrix[2][2]   0.99988 
_struct_ncs_oper.matrix[2][3]   0.01547 
_struct_ncs_oper.matrix[3][1]   -0.00092 
_struct_ncs_oper.matrix[3][2]   0.01547 
_struct_ncs_oper.matrix[3][3]   -0.99988 
_struct_ncs_oper.vector[1]      32.84122 
_struct_ncs_oper.vector[2]      18.89500 
_struct_ncs_oper.vector[3]      -32.54992 

That operator seems to be a leftover that wasn't removed after the extra chain it describes was added during remediation. Instead of removing it, they decided to assign it a "given" code...
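
As a rough sketch of the point above, a reader could turn a struct_ncs_oper row into a 4x4 matrix and skip rows coded "given". The dict layout and the function names `to_matrix` and `operators_to_apply` are assumptions made for this example; only the numbers come from the 1d2q excerpt.

```python
# Illustrative sketch: the dict layout and both function names are assumptions,
# not part of any mmCIF or MMTF library.

def to_matrix(row):
    """Build a flat, row-major 4x4 matrix from one struct_ncs_oper row."""
    flat = []
    for i in (1, 2, 3):
        flat.extend(float(row[f"matrix[{i}][{j}]"]) for j in (1, 2, 3))
        flat.append(float(row[f"vector[{i}]"]))
    return flat + [0.0, 0.0, 0.0, 1.0]


def operators_to_apply(rows):
    """Drop operators coded "given"; their coordinates are already in the file."""
    return [to_matrix(row) for row in rows if row["code"] != "given"]


# Values copied from the 1d2q excerpt above.
oper_1d2q = {
    "id": "1",
    "code": "given",
    "matrix[1][1]": "-1.00000", "matrix[1][2]": "-0.00164", "matrix[1][3]": "0.00090",
    "matrix[2][1]": "-0.00162", "matrix[2][2]": "0.99988", "matrix[2][3]": "0.01547",
    "matrix[3][1]": "-0.00092", "matrix[3][2]": "0.01547", "matrix[3][3]": "-0.99988",
    "vector[1]": "32.84122", "vector[2]": "18.89500", "vector[3]": "-32.54992",
}

# The only operator is "given", so nothing is left to apply.
print(len(operators_to_apply([oper_1d2q])))
```
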

Some more reading about this "interesting" topic:

BobHanson commented 8 years ago

I would like to suggest that this XAU be added to the "bioAssemblyList" along with the others, and that the transformList include "id" so that "1" "2" "3" "4" "PAU" and "XAU" are all available. Or whatever is necessary to reproduce this sort of construction:

loop_
_pdbx_struct_assembly_gen.assembly_id
_pdbx_struct_assembly_gen.oper_expression
_pdbx_struct_assembly_gen.asym_id_list
1   '(1-60)'              A,B,C
2   1                     A,B,C
3   '(1-5)'               A,B,C
4   '(1,2,6,10,23,24)'    A,B,C
PAU P                     A,B,C
XAU '(X0)(1-10,21-25)'    A,B,C

Really, it seems to me, X0 itself is not necessary. I do not see the benefit of adding ncsOperatorList, since it, like all the other operations (which are also not listed anywhere, right?), is really just a contribution to the biological assembly transformations. So the more useful thing to do is to create a bioAssemblyList item that is the set of transformations (X0)(1-10,21-25), just like all the other operations.
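
To make the oper_expression notation concrete, here is a hedged sketch of how expressions such as '(X0)(1-10,21-25)' could be expanded into combinations of operator IDs. The function names are hypothetical, and the parser only covers the forms quoted in this thread (single labels, comma lists, numeric ranges, and products of parenthesised groups).

```python
import itertools
import re

# Illustrative sketch: expand_group and expand_expression are hypothetical names.


def expand_group(group):
    """Expand one group such as '1-10,21-25' into a list of operator IDs."""
    ids = []
    for part in group.split(","):
        part = part.strip()
        match = re.fullmatch(r"(\d+)-(\d+)", part)
        if match:
            start, end = int(match.group(1)), int(match.group(2))
            ids.extend(str(i) for i in range(start, end + 1))
        else:
            ids.append(part)  # a single label such as '1', 'P' or 'X0'
    return ids


def expand_expression(expression):
    """Return every combination of operator IDs implied by an oper_expression."""
    groups = re.findall(r"\(([^()]*)\)", expression) or [expression]
    return list(itertools.product(*(expand_group(g) for g in groups)))


print(len(expand_expression("(1-60)")))            # 60 single operators
print(expand_expression("(X0)(1-10,21-25)")[:2])   # X0 paired with 1, then with 2
```

Each tuple names the operators whose matrices are multiplied together; in '(X0)(1-10,21-25)' the numbered operator is applied first and X0 second, which matches the description of XAU later in this thread.
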

Bob Hanson

BobHanson commented 8 years ago

ps -- just noting the statement above, "try to depict" the unit cell. One thing to be clear about is that the depiction given there, also reproduced below, IS the crystallized form. That is, the virus capsid protein does not crystallize in anything like its biological form. So "try" is not really the right word there. If one is interested in the structure as solved in the crystal and in how the pieces interact, then that is the right view to have. Anything else is basically a hypothesis relating to the actual biological form.

Q: Are there cases where the NCS operator X0 is important?

[image: Inline image 1]

arose commented 8 years ago

With the "try" in "try to depict" the unit cell, I meant to say that this is not the unit cell, because I did not take the NCS operations into account. The image therefore shows a packing that is way too loose to be a crystal. As far as I understand it, you construct the unit cell by first applying the NCS operations (when available) to the AU (i.e. the coordinates written in the file) and then applying the operations of the given space group to everything constructed in the first step. I may be wrong.
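
The two-step order described here can be sketched as a composition of 4x4 matrices. This is only a toy illustration: the operators below are invented, and it assumes both sets are already expressed in the same Cartesian frame (real space-group operators are defined in fractional coordinates and would need converting first). All function names are hypothetical.

```python
# Toy sketch of "NCS first, then space group". The operators are made up.


def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform(m, xyz):
    """Apply a 4x4 affine matrix to an (x, y, z) point."""
    x, y, z = xyz
    return tuple(m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3] for r in range(3))


def unit_cell_transforms(symmetry_ops, ncs_ops):
    """Compose each symmetry operator with each NCS operator.

    matmul(s, n) applies n first and s second, i.e. NCS before space group.
    """
    return [matmul(s, n) for s in symmetry_ops for n in ncs_ops]


identity = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
shift_x = [[1.0, 0.0, 0.0, 5.0], [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]      # stand-in NCS operator
twofold_z = [[-1.0, 0.0, 0.0, 0.0], [0.0, -1.0, 0.0, 0.0],
             [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]    # stand-in symmetry operator

all_ops = unit_cell_transforms([identity, twofold_z], [identity, shift_x])
copies = [transform(m, (1.0, 2.0, 3.0)) for m in all_ops]
print(copies)
```

The number of copies is the product of the two operator counts, and swapping the order of composition would generally give different positions.
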

arose commented 8 years ago

@BobHanson Good point. We currently list all transformations explicitly for each assembly, but it could indeed be useful to do it similarly to how it is done in mmCIF files. In particular, constructs like 1m4x could be represented more concisely.

josemduarte commented 8 years ago

Note that in cases like 1d2q or 1a37 above, no XAU bio-assembly is given in the files. This shows that the XAU notation was an ad-hoc solution introduced for virus capsid proteins and is not used elsewhere.

As such I would vote against recycling the XAU solution for mmtf. We should simply use the extra field ncsOperatorList and represent the data properly, as extra data needed to generate the full crystal, whilst bioAssemblyList remains a list of one or more interpretations of the possible assemblies present in that crystal (and thus should not include PAU and XAU assemblies, which are not interpretations).

BobHanson commented 8 years ago

Peter, I have a question for you: Is it possible that all the mmCIF files have been normalized so that X0 is always the identity matrix (when it is used in _pdbx_struct_assembly_gen.oper_expression)? I'm not finding any counter-example of that. I'm wondering if that was something that was done as part of the fixing of files some time ago.

Jose, I agree. PAU and XAU are specific to viral capsids. They are just subsets of the overall structure. As I understand it, XAU is simply a subset of the symmetries that happens to define a specific pentagonal face in this case. This is marginally useful. I suggest that it is perfect where it is in the mmCIF file. Jmol handles all these the same way, and it works great. There's nothing "special" about XAU other than that it doesn't have a number associated with its ID. Other than that, it is just one more sort of operation. So for 2TBV we have:

_pdbx_struct_assembly_gen.assembly_id
_pdbx_struct_assembly_gen.oper_expression
_pdbx_struct_assembly_gen.asym_id_list
1   '(1-60)'              A,B,C,D,E,F,G,H,I
2   1                     A,B,C,D,E,F,G,H,I
3   '(1-5)'               A,B,C,D,E,F,G,H,I
4   '(1,2,6,10,23,24)'    A,B,C,D,E,F,G,H,I
PAU P                     A,B,C,D,E,F,G,H,I
XAU '(X0)(1-5)'           A,B,C,D,E,F,G,H,I

In all these cases, all Jmol needs is the list of 62 non-crystallographic operations -- 1-60, P, and X0. There is no need to duplicate all the matrices for each possible configuration of interest, the way it is done right now. "X0", "1", "2", ... "60" are just character labels pointing to _pdbx_struct_oper_list.id. It seems to me keeping that simple reference in the mmtf files would be fine.
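
A small counting sketch of the storage argument, using the 2TBV expressions quoted above written out by hand. The dictionary layout and the helper name `numbered` are assumptions for illustration; this is not how MMTF stores assemblies.

```python
# Illustrative count for 2TBV: transforms listed per assembly versus
# unique operator IDs referenced. The data layout here is hypothetical.


def numbered(ids):
    """Turn integers into the string labels used as operator IDs."""
    return [str(i) for i in ids]


# Each tuple is one transform: the operator IDs that are multiplied together.
assemblies = {
    "1": [(i,) for i in numbered(range(1, 61))],
    "2": [("1",)],
    "3": [(i,) for i in numbered(range(1, 6))],
    "4": [(i,) for i in numbered([1, 2, 6, 10, 23, 24])],
    "PAU": [("P",)],
    "XAU": [("X0", i) for i in numbered(range(1, 6))],
}

explicit = sum(len(transforms) for transforms in assemblies.values())
unique_ids = {
    op_id
    for transforms in assemblies.values()
    for combo in transforms
    for op_id in combo
}
print(explicit, len(unique_ids))  # prints: 78 62
```

Listing every transform explicitly needs 78 matrices for these six assemblies, while referencing operators by ID needs only the 62 distinct ones (1-60, P and X0).
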

Here "XAU" is valuable because it produces one of the pentagonal faces -- created by operations 1-5 and then oriented by X0 (which is the Identity operation, actually). Similarly "4" is nice because it builds one hexagonal face. This would be a real pain to produce without this guide.

Alexander, I realize now that in these cases biomolecular assembly 1 is also the crystallographic assembly. I don't know if that is always true, but it is in these cases. To construct the unit cell, one needs to apply the four crystallographic operations, as such:

(s1 s2 s3 s4)(1-60)

I have never tried that, but I am going to try it now.

Bob

arose commented 8 years ago

Here is what I get when applying the NCS operations in the 1auy example:

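unitcell [image: screenshot 79] https://cloud.githubusercontent.com/assets/272250/14645504/c45753a2-060a-11e6-8ce9-1ba46ca57949.png

supercell (one unitcell in each direction) [image: screenshot 78] https://cloud.githubusercontent.com/assets/272250/14645503/c4575fdc-060a-11e6-88c0-abc9bda24ac7.png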

BobHanson commented 8 years ago

http://chemapps.stolaf.edu/jmol/zip/jmol-14.5.4_2016.04.19.zip contains Jmol.jar, JmolData.jar, and JSmol JavaScript package for testing.

$ load =1auy.mmtf FILTER "biomolecule 1;*.CA,*.P"
$ write image clipboard

[image: Inline image 1]

FileManager opening url http://mmtf.rcsb.org/full/1JGQ
The Resolver thinks MMTF
MMTF version 0.1
MMTF Producer RCSB-PDB Generator---version: ea7b9e5723271b3f7e46f8a7c5e1cbe18bba3ca2
The Path of Messenger RNA Through the Ribosome.
THIS FILE, 1JGQ, CONTAINS THE 30S RIBOSOME SUBUNIT, THREE TRNA, AND MRNA MOLECULES. 50S RIBOSOME SUBUNIT IS IN THE FILE 1GIY
id atoms bonds
1JGQ 8882 5503
1155 ms

$ t =now(); load =1auy.mmtf FILTER "biomolecule 1;*.CA,*.P";print now(t)
$ write image clipboard

[image: 1jgq_mmtf_jmol]

[image: virus_jmol_mmtf] [image: 1auy_mmtf_jmol]

[image: Inline image 2]
TURNIP YELLOW MOSAIC VIRUS
701 ms

$ t =now(); load =1auy.mmtf {2 2 1} FILTER "biomolecule 1;bychain";print now(t)
$ write image clipboard

[image: Inline image 3]
TURNIP YELLOW MOSAIC VIRUS
595 ms

BobHanson commented 8 years ago

Also, this is up and running in JavaScript at http://chemapps.stolaf.edu/jmol/jsmol/mmtf.htm

Tested only with Firefox/Windows 10. There could be issues because I have switched JSmol to asynchronous data file loading, since these are binary files.

BobHanson commented 8 years ago

This is a test to see if Jmol PNGJ formatted files can be drag-dropped into the Jmol applet or a JSmol JavaScript window on a web page. The image was created using write PNGJ "virus.png"

[image: virus]

BobHanson commented 8 years ago

ps... I have to tell you, I cannot tell at all what I am looking at with that 1auy supercell image. I'm pretty sure these are not capsids. They are sparse collections of faces, or some sort of fractional capsids. Capsids look like the following (just showing this as a coarse-grained, one-particle-per-chain view). Anyway, I think it looks nothing like that. Did Jmol make that image?

[image: Inline image 2]

arose commented 8 years ago

Which image are you referring to? The one below "trying to construct unitcell without NCS operations" or the one below "supercell (one unitcell in each direction)"? The former was wrong because it did not take the NCS operations into account. The images were made with NGL (http://arose.github.io/ngl/?load=rcsb://1auy.mmtf).

BobHanson commented 8 years ago

Sorry - I knew the delay would cause confusion. The supercell one. I'm pretty sure I'm looking at the interior of capsids, right?

arose commented 8 years ago

Yes, you're looking at the interior of capsids. This seems to be how the unitcell is defined (or how I apply the necessary transformations as I don't try to make the complexes as whole as possible).

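Here is another perspective, showing that there are full capsids further inside:

[image: screenshot 96] https://cloud.githubusercontent.com/assets/272250/14794419/d1c376ce-0ad8-11e6-9044-8d0a4825d325.png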

BobHanson commented 8 years ago

I must tell you that I can't make much of that. Here is what Jmol produces for the {1 1 1} cell for 1auy. Now, these are capsids. They are created by the command:

load =1auy.mmtf {1 1 1} filter "biomolecule 1;*.ca"

Which applies the full 60 non-crystallographic symmetries to get the capsid, then, after that, applies the space group operations of group 181 to place and orient those capsids in the unit cell.

[image: Inline image 1]

arose commented 8 years ago

What do you mean by "60 non-crystallographic symmetries"? I mean the data in the struct_ncs_oper mmCIF category. Note that this is currently not included in the mmtf format, so I am loading cif files in the NGL examples. I apply the struct_ncs_oper operations and then those of space group P 64 2 2 (which I guess is number 181). After that I shift the unit cell in each direction to create the supercell.
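
The final shifting step can be sketched as generating lattice translations from the cell vectors. This reads "one unit cell in each direction" as one extra cell on either side along each axis, which is an assumption, and the cell vectors below are hypothetical orthogonal ones rather than the real hexagonal cell of 1auy. The function name is made up.

```python
import itertools

# Illustrative sketch: supercell_offsets is a hypothetical helper.


def supercell_offsets(a, b, c, extent=1):
    """Translation vectors for all cells from -extent to +extent along a, b, c."""
    steps = range(-extent, extent + 1)
    return [
        tuple(i * a[k] + j * b[k] + l * c[k] for k in range(3))
        for i, j, l in itertools.product(steps, repeat=3)
    ]


# Hypothetical cell vectors in angstroms, for illustration only.
a, b, c = (100.0, 0.0, 0.0), (0.0, 100.0, 0.0), (0.0, 0.0, 50.0)
offsets = supercell_offsets(a, b, c)
print(len(offsets))  # 3 x 3 x 3 = 27 cells, including the original one
```

Adding each offset to every atom of the completed unit cell would then give the supercell.
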

BobHanson commented 8 years ago

I agree that we need the struct_ncs_oper block and that is one way to do this. Please do add those.

I'm probably getting away with this because in these viral cases the biological unit is also in the crystal. (That's not always the case, right?) My picture is "prettier" because it organizes the visualization along the lines of the biologically relevant unit. So I suspect both of our representations are correct.

What's happening here is that there are three viral particles per unit cell. Each unit cell has 12 copies of the basic unit due to crystallographic operations. So that means that each operation contributes four units to each viral particle. Thus, in order to get 60 units per viral particle you need to add an additional 15 operations. These are your added NCS operations.
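
The counting in the paragraph above can be checked with a few lines of arithmetic. The variable names are made up; the numbers are the ones stated in the comment.

```python
# Arithmetic check of the counting argument, using the numbers stated above.
particles_per_cell = 3
crystallographic_ops = 12
units_per_particle = 60

units_per_cell = particles_per_cell * units_per_particle            # 180
units_from_symmetry = crystallographic_ops // particles_per_cell    # 4 per particle
ncs_ops_needed = units_per_particle // units_from_symmetry          # 15

# 12 crystallographic operations times 15 NCS operations fill the whole cell.
assert crystallographic_ops * ncs_ops_needed == units_per_cell
print(units_per_cell, units_from_symmetry, ncs_ops_needed)  # prints: 180 4 15
```
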

Do you know of some specific cases where the biological assembly is not actually present in the crystal structure?

It is annoying that the PDB file lists the crystallographic operations and the biomolecule operations (mixing crystallographic and noncrystallographic); the mmCIF file lists the noncrystallographic operations, but not the crystallographic ones; and the mmtf (as of now) lists the biomolecule operations only. None of them have the information in a standard CIF file that allows full construction of the actual crystal: a simple list of operators! But I think the PDB file comes closest, in that it does list the Jones-Faithful (x-1/2, y, z) notation for the space group, at least -- acknowledging the fact that one cannot just read a space group name and expect to get the operators right.

BobHanson commented 8 years ago

Here's an image from Jmol in the orientation of the first "trying to construct unitcell without NCS operations" image in this thread. From it you can see (sort of) how those sparse units fill in to give full capsids. It's not a great perspective.

[image: Inline image 1]

Here's one that shows how four basic crystallographic units fit in with the units on a single capsid:

[image: Inline image 2]

So these capsids must be hiding somewhere in your image there.

arose commented 8 years ago

> I agree that we need the struct_ncs_oper block and that is one way to do this. Please do add those.

We will!

> My picture is "prettier" because it organizes the visualization along the lines of the biologically relevant unit.

Yeah, it is. I should do that too.

> Do you know of some specific cases where the biological assembly is not actually present in the crystal structure?

@josemduarte you said no, right? At least it shouldn't.

@BobHanson the images you sent to the GitHub issue thread don't show up.

BobHanson commented 8 years ago

OK, I see. I have to come here and enter my message, not do that from a Gmail reply.

[image: 1auy_111_dual]

[image: 1auy_111_dual2]

For Jmol I have added the NCS operations for when there is no request for a biomolecule but there is a request for a unit cell:

load =1auy.cif {1 1 1}

but not

load =1auy.cif {1 1 1} filter "biomolecule 1;*.CA"

Result is a bit ragged.

[image: 1auy_111_ncs]

Thanks very much for clearing up for me the use of NCS operations.

josemduarte commented 8 years ago

To be clear: in the current definition the biological assembly must be a part of the crystal. The bio assembly annotation is simply an interpretation of what is a plausible assembly in the crystal. Usually that interpretation should come from independent experimental data confirming the oligomeric state in solution.

BobHanson commented 8 years ago

Good to hear. Thank you.

BobHanson commented 8 years ago

Is that enhancement pushed out to http://mmtf.rcsb.org ?

arose commented 8 years ago

No, that will take some time, while we are gathering feedback. We will let you know!

josemduarte commented 8 years ago

This has been done in 0.2 already, hasn't it? Shall we close?

arose commented 8 years ago

Documented in the spec: https://github.com/rcsb/mmtf/blob/master/spec.md#ncsoperatorlist