nglviewer / ngl

WebGL protein viewer
http://nglviewer.org/ngl/
MIT License

PDB shown through mdanalysis but not show_file #791

Open sperezconesa opened 3 years ago

sperezconesa commented 3 years ago

Hello, I'm having a problem visualizing a PDB file: [screenshot]. As you can see, the file is visible through an MDAnalysis universe but not with show_file. I have attached the PDB in case it helps. Let me know what other information you need. Thank you very much! Sergio

sperezconesa commented 3 years ago

nv.show_structure_file does seem to work.

hainm commented 3 years ago

@sperezconesa Can you try with different PDB files (e.g. https://github.com/nglviewer/nglview/tree/master/nglview/datafiles)?

I don't see your attached pdb file.

PS: For nglview issue, please report in nglview repo: https://github.com/nglviewer/nglview/issues

sperezconesa commented 3 years ago

The other files seem to work, so it must be a problem with my PDB file, which is weird since it works with MDA and VMD too.

My bad. I'll attach it. closed.pdb.gz

I didn't know there was a difference between nglview and NGL. What is the difference? Sorry about that.

hainm commented 3 years ago

nglview brings NGL to the Python and Jupyter ecosystem.

sperezconesa commented 3 years ago

Ah good to know. This will make my life easier to search for info.

In addition, when I do visualize it with show_structure_file I only see 2 of the subunits: [screenshot]

hainm commented 3 years ago

nv.show_file works well for me.

[Screen Shot 2020-09-23 at 11 13 25 AM]

hainm commented 3 years ago

By the way, you can use view.gui_style = 'ngl' to bring the GUI to the notebook.

> In addition, when I do visualize it with show_structure_file I only see 2 of the subunits:

Under the hood, nglview uses MDA to convert its Universe to a PDB file, so the output PDB might differ from your original PDB file (MDA might do something extra).

hainm commented 3 years ago

GUI

[Screen Shot 2020-09-23 at 11 24 19 AM]

sperezconesa commented 3 years ago

The broken chain seems to be because of the PROA, PROB, PROC, PROD segname that confuses either MDAnalysis or nglview.

Could it be a version issue? These are the versions in the environment:

ipython                   7.18.1           py38h5ca1d4c_0  
ipython_genutils          0.2.0                    py38_0  
ipywidgets                7.5.1              pyh9f0ad1d_1    conda-forge
mdanalysis                1.0.0            py38h950e882_0    conda-forge
nglview                   2.7.7              pyh5ca1d4c_1    conda-forge
nodejs                    14.11.0              h568c755_0    conda-forge
notebook                  6.1.1                    py38_0  
python                    3.8.5           h1103e12_8_cpython    conda-forge

And the JupyterLab packages:

jupyter_core              4.6.3                    py38_0  
jupyterlab                2.2.6                      py_0  
jupyterlab-git            0.21.1                   pypi_0    pypi
jupyterlab_server         1.2.0                      py_0  

And the nglview extension:

nglview-js-widgets v2.7.7  

I have tried to follow this as closely as possible, since I have had a lot of problems with installation in the past.

fredludlow commented 3 years ago

This reproduces in NGL (so not an nglview issue). You can open your file in the NGL web app: http://nglviewer.org/ngl/ (File menu, open)

If you add a point representation, you can see that all the coords get parsed properly. If you add a line representation, it'll give you some clue as to what's going on.

Looking at the file, specifically the first atom:

ATOM      1  CAY TRP P  26      27.330  32.890  30.570  0.00  0.25       PROA C 

And line 731:

ATOM    730  CAY TRP P  26      48.010  44.010  30.110  0.00  0.25       PROB C 

I'm not totally sure of the internals but I suspect NGL is going back and adding extra atoms to residue TRP 26 from chain P. This confuses the bond assignment code (hence the messed up line repr) and when a cartoon is drawn it'll look for backbone atoms by name (ignoring duplicates).
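A minimal sketch of the collision described above, assuming a viewer identifies an atom roughly by chain, residue name, residue number, insertion code, atom name and altLoc while ignoring the segID in columns 73-76. That key is an assumption drawn from this thread, not NGL's actual code. The hypothetical `atom_line` helper only writes the two quoted records on standard PDB columns so the example runs on its own.

```python
from collections import defaultdict


def atom_line(serial, name, resname, chain, resseq, x, y, z, segid, element):
    """Hypothetical helper: format a minimal ATOM record on the fixed PDB columns."""
    return (
        f"ATOM  {serial:5d} {name:<4} {resname:>3} {chain}{resseq:4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{0.0:6.2f}{0.25:6.2f}      {segid:<4}{element:>2}"
    )


def atom_key(line):
    """Assumed atom identity once the segID (columns 73-76) is ignored."""
    return (
        line[21],             # chainID, column 22
        line[17:20].strip(),  # resName, columns 18-20
        int(line[22:26]),     # resSeq, columns 23-26
        line[26],             # iCode, column 27
        line[12:16].strip(),  # atom name, columns 13-16
        line[16],             # altLoc, column 17
    )


def find_collisions(lines):
    """Map each duplicated key to the (serial, segID) pairs that share it."""
    groups = defaultdict(list)
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            groups[atom_key(line)].append((int(line[6:11]), line[72:76].strip()))
    return {key: atoms for key, atoms in groups.items() if len(atoms) > 1}


# The first atom and the atom on line 731 of the attached file, as quoted above.
lines = [
    atom_line(1, " CAY", "TRP", "P", 26, 27.330, 32.890, 30.570, "PROA", "C"),
    atom_line(730, " CAY", "TRP", "P", 26, 48.010, 44.010, 30.110, "PROB", "C"),
]
print(find_collisions(lines))
# {('P', 'TRP', 26, ' ', 'CAY', ' '): [(1, 'PROA'), (730, 'PROB')]}
```

Run over the whole file, this would list every group of atoms that only the segID tells apart.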

What software did you use to make the file? (If this variation on the PDB spec is a de-facto standard within a particular research community we can see if we can make NGL parse it? - though the repeated combination of atom-name and residue no is likely to cause a lot of trouble!)

fredludlow commented 3 years ago

I've just noticed/remembered your other issue #732, which touches on this too (and would definitely be nice to have if someone wants to write the code!).

The internal way NGL stores structures makes assumptions about not having two atoms with the same chain, resname, resnumber etc., so just adding the SEGID etc. to the atom (or residue) wouldn't necessarily be enough. However, you could maybe use segids as something like the altLoc column? (You can specify alternative locations for atoms when you have disorder in the structure.) Then you'd already have the selection syntax (e.g. '%A'). This could possibly be a special parser mode, or turned on if we detect segnames/segids and no altLoc? (We would need to understand how segnames/segids are used to make sure this could actually work, since altLoc is currently a single character.)
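A rough sketch of the altLoc idea done as a preprocessing step on the file rather than as a parser mode: it writes a user-chosen single character for each segID into column 17 of ATOM/HETATM records whose altLoc is still blank. The `segids_to_altlocs` name, the mapping and the sample records are hypothetical, and whether NGL then renders and selects the segments as intended (e.g. with '%A') is an assumption not checked here.

```python
def atom_line(serial, name, resname, chain, resseq, segid, altloc=" "):
    """Hypothetical helper: minimal ATOM record with zeroed coordinates."""
    return (
        f"ATOM  {serial:5d} {name:<4}{altloc}{resname:>3} {chain}{resseq:4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{0.0:6.2f}{0.0:6.2f}      {segid:<4} C"
    )


def segids_to_altlocs(lines, mapping):
    """Write mapping[segID] into the altLoc column (17) where it is still blank."""
    if any(len(code) != 1 for code in mapping.values()):
        raise ValueError("altLoc is a single character")
    out = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")) and line[16] == " ":
            code = mapping.get(line[72:76].strip())
            if code is not None:
                line = line[:16] + code + line[17:]
        out.append(line)
    return out


lines = [
    atom_line(1, " CAY", "TRP", "P", 26, "PROA"),
    atom_line(730, " CAY", "TRP", "P", 26, "PROB"),
]
for line in segids_to_altlocs(lines, {"PROA": "A", "PROB": "B"}):
    print(line[:26])
# ATOM      1  CAYATRP P  26
# ATOM    730  CAYBTRP P  26
```

Records that already carry an altLoc are left alone so genuine alternate conformations are not overwritten; other programs may still read the new column as alternate locations rather than segments.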

sperezconesa commented 3 years ago

I would really love to contribute to this project and many others, but unfortunately I don't have the time or the expertise at the moment. Perhaps there is no need for a full implementation; a warning would be good enough, I think.

sperezconesa commented 3 years ago

By the way, does the segname problem explain both phenomena? I am not sure if I understood.

fredludlow commented 3 years ago

> By the way, does the segname problem explain both phenomena? I am not sure if I understood.

Sort of: the segnames are currently ignored by NGL. For the structure you're looking at, you could probably fix the display (and give yourself a way to select by segname) just by making the chain names different for each segname (they're currently all 'P' for the protein; these could become A for PROA, B for PROB), or by introducing an altLoc character (e.g. all PROA atoms stay in chain P but are given altLoc A, all PROB get altLoc B, etc.).

(The disadvantage of that is that there are only so many characters available for chains and altLocs, so if you have a huge system with loads of distinct segnames you might run out of chain/altLoc identifiers?)
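A sketch of the first workaround, again as a hypothetical preprocessing step: each distinct segID, in order of first appearance, gets its own one-character chainID in column 22, so PROA to PROD become chains A to D here. The 62-character pool (A-Z, a-z, 0-9) is an assumption about which chain IDs a reader accepts, and the explicit error marks the point where a large system would run out of identifiers as described above.

```python
import string

# Assumed pool of single-character chain IDs; some readers may accept fewer.
CHAIN_IDS = string.ascii_uppercase + string.ascii_lowercase + string.digits


def atom_line(serial, segid, chain="P"):
    """Hypothetical helper: minimal ATOM record (CA of GLY 1) with zeroed coordinates."""
    return (
        f"ATOM  {serial:5d}  CA  GLY {chain}{1:4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{0.0:6.2f}{0.0:6.2f}      {segid:<4} C"
    )


def segids_to_chains(lines):
    """Replace the chainID (column 22) with one character per distinct segID."""
    assigned = {}
    out = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            segid = line[72:76].strip()
            if segid not in assigned:
                if len(assigned) == len(CHAIN_IDS):
                    raise ValueError(f"more than {len(CHAIN_IDS)} segIDs: out of chain IDs")
                assigned[segid] = CHAIN_IDS[len(assigned)]
            line = line[:21] + assigned[segid] + line[22:]
        out.append(line)
    return out, assigned


lines = [atom_line(i + 1, segid) for i, segid in enumerate(["PROA", "PROB", "PROC", "PROD"])]
fixed, assigned = segids_to_chains(lines)
print(assigned)  # {'PROA': 'A', 'PROB': 'B', 'PROC': 'C', 'PROD': 'D'}
```

TER, SEQRES and other records that repeat the chain ID are not rewritten by this sketch.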

sperezconesa commented 3 years ago

The reason for this weird format is that mdanalysis takes a pdb with chain A and segid PROA and writes it out as chain P and segname PROA because it doesn't distinguish between chain and segid. I have notified them of this too.
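To make the information loss concrete, here is a small check assuming, as the file suggests, that the writer filled the one-character chainID from the first character of the segID. The `collapsed_segids` helper is hypothetical; it reports which chain IDs would have to hold more than one segID under a given segID-to-chain mapping.

```python
def collapsed_segids(segids, to_chain=lambda segid: segid[:1]):
    """Group segIDs by the chain ID they map to and keep only shared chain IDs."""
    groups = {}
    for segid in segids:
        groups.setdefault(to_chain(segid), []).append(segid)
    return {chain: members for chain, members in groups.items() if len(members) > 1}


segids = ["PROA", "PROB", "PROC", "PROD"]
print(collapsed_segids(segids))
# {'P': ['PROA', 'PROB', 'PROC', 'PROD']}
print(collapsed_segids(segids, to_chain=lambda segid: segid[-1:]))
# {}
```

Keeping the last character instead would keep these four subunits apart, though that only works when the segIDs differ in their final character.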