scipion-em / scipion-em-continuousflex

Plugin for continuous conformational flexibility analysis containing HEMNMA, StructMap, HEMNMA-3D, TomoFlow, NMMD, and DeepHEMNMA for in vitro and in situ cryo-EM/ET.
GNU General Public License v3.0

Replace all usages of mrcfile and spider_file3 by ImageHandler #107

Closed: MohamadHarastani closed this issue 1 year ago

MohamadHarastani commented 2 years ago

It works as follows:

from pwem.emlib.image import ImageHandler
# Read the file and get its voxel values as a NumPy array
volume_data = ImageHandler().read(volume_filename).getData()
MohamadHarastani commented 2 years ago

Eventually, remove the dependency on mrcfile.

MohamadHarastani commented 2 years ago

Maybe replace reading and writing PDBs with AtomicStructHandler:

from pwem.convert.atom_struct import AtomicStructHandler

mms29 commented 2 years ago

Unfortunately, ImageHandler does not save header information (sampling rate and origin) when writing to a file, as it should, and I need these values in the header for the input of GENESIS. For instance, for the sampling rate, when I open a volume with a sampling rate of 2.0 Å:

from pwem.emlib.image import ImageHandler
file = "/home/guest/Workspace/test.mrc"
volume = ImageHandler().read(file)

I obtain an Image object with:

Sampling rate : X-rate (Angstrom/pixel) = 2 Y-rate (Angstrom/pixel) = 2 Z-rate (Angstrom/pixel) = 2

Then I save it and open it again:

volume.write(file)
volume = ImageHandler().read(file)

Sampling rate : X-rate (Angstrom/pixel) = 1 Y-rate (Angstrom/pixel) = 1 Z-rate (Angstrom/pixel) = 1

The same thing happens for the origin. That is why, for the moment, I need mrcfile to write these two parameters to the header.
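One way to confirm whether a round trip through ImageHandler really loses these values is to read the raw header bytes directly, independently of Scipion. Below is a minimal standard-library sketch, assuming the MRC2014 header layout: the sampling rate is the cell length (words 11-13) divided by the grid sampling (words 8-10), and the origin sits in words 50-52. `mrc_header_info` is a hypothetical helper, not part of Scipion, Xmipp, or mrcfile.

```python
import struct


def mrc_header_info(header: bytes):
    """Return (voxel_size, origin) from the first 1024 bytes of an MRC file.

    Assumes the MRC2014 layout: MX, MY, MZ at byte 28, the cell lengths
    in Angstrom at byte 40, and ORIGIN (x, y, z) at byte 196.
    """
    if len(header) < 1024:
        raise ValueError("an MRC header is 1024 bytes long")
    # MACHST (byte 212): 0x44 means little-endian, 0x11 means big-endian.
    endian = ">" if header[212] == 0x11 else "<"
    mx, my, mz = struct.unpack_from(endian + "3i", header, 28)
    xlen, ylen, zlen = struct.unpack_from(endian + "3f", header, 40)
    origin = struct.unpack_from(endian + "3f", header, 196)
    if 0 in (mx, my, mz):
        raise ValueError("grid sampling MX/MY/MZ must be non-zero")
    # Sampling rate (Angstrom/pixel) = cell length / number of grid intervals.
    voxel_size = (xlen / mx, ylen / my, zlen / mz)
    return voxel_size, origin


# Usage (hypothetical path):
# with open("/home/guest/Workspace/test.mrc", "rb") as f:
#     print(mrc_header_info(f.read(1024)))
```

Running it before and after `volume.write(file)` would show directly whether the cell lengths and origin were rewritten.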

MohamadHarastani commented 2 years ago

I see. Keep mrcfile, of course. I will replace the usages of spiderfile3, since I am handling the sampling rate through Scipion object attributes rather than the header. We can open an issue asking Scipion to allow setting the true sampling rate (at least when a flag is passed), since we have a motive: connecting with other packages.

(In reply to Rémi Vuillemot, Tue, Apr 12, 2022, 9:52 AM.)

mms29 commented 2 years ago

Concerning AtomicStructHandler, I have an issue when opening large PDB files such as the ribosome (3j77), caused by too many distinct chains. The PDB format allows only one character for the chain name; consequently, the maximum number of chains is 36 (0-9 and A-Z), but the ribosome has more than 80 chains.

One solution could be to read this file as CIF; however, at some point I need to convert it to PDB to run GENESIS.

Another solution is to use the "segment ID" field of the PDB format, which VMD uses to define the segments of a structure and which allows 3 or more characters, I believe. This is what I do for the moment with my own PDB handler. However, BioPython is built around chains, and this cannot be changed...
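To illustrate the segment-ID approach: in the fixed-column PDB layout, the chain ID is the single character in column 22, while the legacy segment-ID field used by VMD/CHARMM occupies columns 73-76 (four characters). The sketch below assumes well-formed fixed-column ATOM/HETATM records; `parse_atom_line` and `group_by_segment` are hypothetical helpers, not the ContinuousFlexPDBHandler API.

```python
def parse_atom_line(line: str) -> dict:
    """Split one fixed-column PDB ATOM/HETATM record into fields.

    Columns 73-76 hold the legacy segment identifier (4 characters),
    whereas the chain identifier is a single character in column 22.
    """
    line = line.rstrip("\n").ljust(80)  # pad short lines so slicing is safe
    return {
        "name": line[12:16].strip(),
        "resname": line[17:20].strip(),
        "chain": line[21].strip(),
        "resnum": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "segid": line[72:76].strip(),
    }


def group_by_segment(lines):
    """Group atom records by segment ID, falling back to the chain ID."""
    groups = {}
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            atom = parse_atom_line(line)
            key = atom["segid"] or atom["chain"]
            groups.setdefault(key, []).append(atom)
    return groups
```

Because the grouping key can be up to four characters, two segments that share the same one-letter chain ID still end up in separate groups.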

The best option for now might be to keep the PDB handler I wrote (I compared my I/O parser with BioPython's and they give the same results). I implemented some of the functions used in AtomicStructHandler, for instance the one that aligns two PDBs by rigid-body fitting. It also has several convenient features (for instance, matching two different structures to compute the RMSD). I prepared a clean version of the handler, named ContinuousFlexPDBHandler, which is available in protocols.utilities.pdb_handler.py
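For reference, the two operations mentioned above (rigid-body alignment, and RMSD after matching atoms between two different structures) can be sketched generically with NumPy using the Kabsch algorithm. This is not the ContinuousFlexPDBHandler implementation; the matching key (segment/chain, residue number, atom name) and all helper names are assumptions for illustration.

```python
import numpy as np


def matched_coordinates(atoms_a, atoms_b):
    """Return paired (N, 3) arrays for the atoms present in both structures.

    Each input maps a hypothetical key such as (segment, resnum, atom_name)
    to an (x, y, z) tuple; only keys found in both are kept.
    """
    common = sorted(set(atoms_a) & set(atoms_b))
    a = np.array([atoms_a[k] for k in common], dtype=float)
    b = np.array([atoms_b[k] for k in common], dtype=float)
    return a, b


def kabsch_align(mobile, target):
    """Rigidly superimpose `mobile` onto `target` (Kabsch algorithm)."""
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    p = mobile - mobile_center
    q = target - target_center
    # The SVD of the 3x3 covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return p @ rotation.T + target_center


def rmsd(a, b):
    """Root-mean-square deviation between two paired coordinate arrays."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))
```

Typical use: `a, b = matched_coordinates(atoms_a, atoms_b)`, then `rmsd(kabsch_align(a, b), b)`. Matching by key before fitting is what allows an RMSD between structures that do not contain exactly the same atoms.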

MohamadHarastani commented 2 years ago

Perfect

(In reply to Rémi Vuillemot, Tue, Apr 12, 2022, 10:14 AM.)

mms29 commented 2 years ago


It would be good to open an issue for Scipion, as this is a major problem: all image formats store this information in the header, so all packages assume it is present. It might also be an Xmipp problem, since I noticed the same issue with the program "xmipp_image_header", which does not do what it should (saving the sampling rate, etc., in the header).

MohamadHarastani commented 1 year ago

@mms29 Hi, do you know whether this has been fixed in Scipion/Xmipp? I recently noticed xmipp_image_header being used to set the sampling rate of an MRC file.

mms29 commented 1 year ago

I just tried xmipp_image_header and it looks fixed. I will check whether saving an image from Scipion takes into account the sampling rate and the other header parameters. If so, I'll remove the dependency on mrcfile.

MohamadHarastani commented 1 year ago

I asked Pablo, and he said that scipion-em already depends on mrcfile, so we are fine in this respect.

(In reply to Rémi Vuillemot, Tue, Nov 29, 2022, 11:43 PM.)

mms29 commented 1 year ago

I believe we can continue using mrcfile, especially now that we have our own environment. If so, we can close this issue.