Closed MohamadHarastani closed 1 year ago
Eventually, remove dependency on mrcfile
Maybe replace reading and writing PDBs with the usage of AtomicStructHandler
from pwem.convert.atom_struct import AtomicStructHandler
Unfortunately, ImageHandler does not save header information (sampling rate and origin) when writing to file as it should, and I need these in the header for the input of GENESIS. For instance, for the sampling rate, when I open a volume with a sampling rate of 2.0 Å:
file = "/home/guest/Workspace/test.mrc"
volume = ImageHandler().read(file)
I obtain an Image object with :
Sampling rate : X-rate (Angstrom/pixel) = 2 Y-rate (Angstrom/pixel) = 2 Z-rate (Angstrom/pixel) = 2
Then, I save it and open it again :
volume.write(file)
volume = ImageHandler().read(file)
Sampling rate : X-rate (Angstrom/pixel) = 1 Y-rate (Angstrom/pixel) = 1 Z-rate (Angstrom/pixel) = 1
The same thing happens for the origin. That is why, for the moment, I need mrcfile to write these two parameters in the header.
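For reference, here is a minimal sketch of what patching these two header fields by hand could look like, using only the Python standard library rather than mrcfile. It assumes a little-endian file that follows the MRC2014 main-header layout (MX/MY/MZ at byte 28, cell dimensions at byte 40, origin at byte 196). The function names are hypothetical and are not part of Scipion, Xmipp, or mrcfile.

```python
import struct

HEADER_SIZE = 1024    # fixed-size MRC2014 main header
MXYZ_OFFSET = 28      # MX, MY, MZ: grid sampling along each axis (3 x int32)
CELLA_OFFSET = 40     # cell dimensions in Angstrom (3 x float32)
ORIGIN_OFFSET = 196   # origin in Angstrom (3 x float32)


def set_sampling_and_origin(path, sampling_rate, origin=(0.0, 0.0, 0.0)):
    """Overwrite the cell dimensions and origin in place (little-endian assumed)."""
    with open(path, "r+b") as f:
        header = f.read(HEADER_SIZE)
        if len(header) < HEADER_SIZE:
            raise ValueError("file is too short to contain an MRC header")
        mx, my, mz = struct.unpack_from("<3i", header, MXYZ_OFFSET)
        # The voxel size is not stored directly: it is cell dimension / sampling.
        cella = (mx * sampling_rate, my * sampling_rate, mz * sampling_rate)
        f.seek(CELLA_OFFSET)
        f.write(struct.pack("<3f", *cella))
        f.seek(ORIGIN_OFFSET)
        f.write(struct.pack("<3f", *origin))


def get_sampling_and_origin(path):
    """Return ((sx, sy, sz), (ox, oy, oz)) as read from the header."""
    with open(path, "rb") as f:
        header = f.read(HEADER_SIZE)
    if len(header) < HEADER_SIZE:
        raise ValueError("file is too short to contain an MRC header")
    grid = struct.unpack_from("<3i", header, MXYZ_OFFSET)
    cella = struct.unpack_from("<3f", header, CELLA_OFFSET)
    origin = struct.unpack_from("<3f", header, ORIGIN_OFFSET)
    sampling = tuple(c / m if m else 0.0 for c, m in zip(cella, grid))
    return sampling, origin
```

Calling `get_sampling_and_origin` before and after a write through ImageHandler is one way to check whether the header values survive the round trip.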
I see. Keep mrcfile of course. I will replace the usages of spiderfile3 since I am handling the sampling rate using scipion objects attributes and not using the header. We can open an issue asking Scipion to allow setting the true sampling rate (at least when we pass a flag) since we have a motive: connecting with other packages.
(In reply to Rémi Vuillemot's comment of Tue, Apr 12, 2022, 9:52 AM.)
Concerning AtomicStructHandler, I have an issue when opening large PDB files such as the ribosome (3j77), caused by the large number of distinct chains. The PDB format allows only one character for the chain name, so the maximum number of chains is 36 (0-9 and A-Z); the ribosome, however, has more than 80 chains.
One solution could be to read this file as CIF, however, at one point I need to convert to PDB to run GENESIS.
Another solution is to use the "segment ID" field of the PDB format, which VMD uses to define the segments of a structure and which allows 3 or more characters, I believe. This is what I do for the moment with my own PDB handler. However, BioPython is based on chain IDs, and this cannot be changed...
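To illustrate the segment ID idea, here is a minimal sketch that groups ATOM/HETATM records by segment ID instead of chain ID. It assumes fixed-column PDB lines with the chain ID in column 22 and the segment ID in columns 73-76 (the legacy segID field). The function names are hypothetical; this is not the code of ContinuousFlexPDBHandler, BioPython, or VMD.

```python
from collections import defaultdict


def chain_and_segid(line):
    """Return (chain ID, segment ID) from one fixed-column ATOM/HETATM line."""
    padded = line.rstrip("\n").ljust(80)   # short lines may lack trailing columns
    return padded[21], padded[72:76].strip()


def group_atoms_by_segid(lines):
    """Map each segment ID to the list of atom lines that carry it."""
    groups = defaultdict(list)
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            _, segid = chain_and_segid(line)
            groups[segid].append(line)
    return dict(groups)
```

With up to four characters per segment ID, two segments can share the same one-character chain ID and still be told apart.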
The best option for now might be to keep the PDB handler I wrote (I compared my I/O parser with the BioPython one and they are the same). I implemented some of the functions used in AtomicStructHandler, for instance the one to align two PDBs as rigid bodies. It also has several very convenient features (for instance, matching two different structures to find the RMSD). I prepared a clean version of the handler, named ContinuousFlexPDBHandler, which is available in protocols.utilities.pdb_handler.py
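As an illustration of the "matching two structures to find the RMSD" feature, here is a minimal sketch that matches atoms present in both structures by (atom name, residue name, chain ID, residue number) and computes the RMSD over the matched pairs, without any superposition. The matching key and the function names are assumptions made for this example; the actual ContinuousFlexPDBHandler may match and align differently.

```python
import math


def atom_key(line):
    """Identify an atom by (atom name, residue name, chain ID, residue number)."""
    return (line[12:16].strip(), line[17:20].strip(), line[21], line[22:26].strip())


def atom_coords(line):
    """Read x, y, z (Angstrom) from the fixed columns of an ATOM line."""
    return (float(line[30:38]), float(line[38:46]), float(line[46:54]))


def matched_rmsd(lines_a, lines_b):
    """Return (RMSD, number of matched atoms) over atoms found in both inputs."""
    atoms_a = {atom_key(l): atom_coords(l) for l in lines_a if l.startswith("ATOM")}
    atoms_b = {atom_key(l): atom_coords(l) for l in lines_b if l.startswith("ATOM")}
    common = atoms_a.keys() & atoms_b.keys()
    if not common:
        raise ValueError("the two structures have no atoms in common")
    squared = sum(
        sum((p - q) ** 2 for p, q in zip(atoms_a[k], atoms_b[k])) for k in common
    )
    return math.sqrt(squared / len(common)), len(common)
```

Atoms that exist in only one of the two structures are ignored, so the returned count shows how much of each structure the RMSD covers.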
Perfect
(In reply to Rémi Vuillemot's comment of Tue, Apr 12, 2022, 10:14 AM.)
It would be good to open an issue for Scipion, as this is a major problem (all image formats store this information in the header, so all packages assume it is present). It might also be a problem in Xmipp, since I noticed the same problem with "xmipp_image_header", which does not do the job it should (saving the sampling rate etc. in the header).
@mms29 Hi, do you know whether this is fixed in Scipion/Xmipp? I recently noticed the use of xmipp_image_header to set the sampling rate of an mrc file.
I just tried xmipp_image_header and it looks fixed. I will check whether saving an image from Scipion takes into account the sampling rate and the other header parameters. If so, I'll remove the dependency on mrcfile.
I asked Pablo and he said that scipion-em already has a dependency on mrcfile, so we are okay in this sense.
(In reply to Rémi Vuillemot's comment of Tue, Nov 29, 2022, 11:43 PM.)
I believe we can continue using mrcfile, especially now that we have our own environment. If so, we can close this issue.
It works as follows