Open jonathanking opened 1 year ago
We have a function, alignTwoSequencesWithBiopython, that you could use on your complete sequence and ag.ca.getSequence(). The MSA and/or indices it returns may be helpful for creating what you need.
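As a rough illustration of the alignment idea, here is a minimal sketch that uses Python's standard-library difflib as a stand-in for alignTwoSequencesWithBiopython, whose exact signature and return values are not shown in this thread. The helper name mask_from_alignment is hypothetical, and the observed sequence is typed in by hand where ag.ca.getSequence() would normally supply it.

```python
from difflib import SequenceMatcher


def mask_from_alignment(full_seq, observed_seq):
    """Mark each residue of full_seq '+' if it is matched in observed_seq, else '-'.

    Assumption: difflib's block matching stands in for a real pairwise
    alignment; it is not what ProDy does internally.
    """
    mask = ["-"] * len(full_seq)
    matcher = SequenceMatcher(None, full_seq, observed_seq, autojunk=False)
    for block in matcher.get_matching_blocks():
        # Each block says full_seq[a:a+size] == observed_seq[b:b+size].
        for i in range(block.a, block.a + block.size):
            mask[i] = "+"
    return "".join(mask)


full_seq = "WWWGAPGAPGAPWWW"  # complete chain sequence from the example in this issue
observed_seq = "WWWWWW"       # resolved residues only (GAPGAPGAP is missing)
print(mask_from_alignment(full_seq, observed_seq))  # +++---------+++
```

Any sequence-only approach can place a gap ambiguously when the missing stretch sits next to repeated residues, so the annotations stored in the cif file are likely a more reliable source when they are available.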
Thank you! There is a field in the cif file called _pdbx_unobs_or_zero_occ_residues that records some of this information. Is it possible to access this data via ProDy? I'm not sure if fields like this are parsed when header=True, for example.
No, I don't think it can be parsed with header=True, because the underlying function reproduces PDB header parsing and returns an object with a particular structure. However, CIF is a type of STAR format, so you should be able to use parseSTAR and navigate the hierarchical dictionary object it returns to get there.
If you get stuck, let me know and I'll see if I can help figure it out.
Perhaps I can add or extend an option for passing particular keys.
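To make the structure of that category concrete, here is a standard-library-only sketch that reads a _pdbx_unobs_or_zero_occ_residues loop from cif text and turns it into a mask. It does not use parseSTAR or any other ProDy call. The helper names read_cif_loop and unobserved_mask are hypothetical, the sample rows are made up to match the WWWGAPGAPGAPWWW example (they are not taken from 7E1B), and the parser assumes one whitespace-separated row per line with no quoted or multi-line values. It also assumes label_seq_id is the 1-based position in the full polymer sequence.

```python
def read_cif_loop(cif_text, category):
    """Return the rows of the first loop_ in `category` as a list of dicts."""
    lines = cif_text.splitlines()
    i = 0
    while i < len(lines):
        if lines[i].strip() != "loop_":
            i += 1
            continue
        i += 1
        tags = []
        while i < len(lines) and lines[i].strip().startswith("_"):
            tags.append(lines[i].strip())
            i += 1
        if not tags or not all(t.startswith(category + ".") for t in tags):
            continue  # a different category; keep scanning
        keys = [t.split(".", 1)[1] for t in tags]
        rows = []
        while i < len(lines):
            s = lines[i].strip()
            if not s or s == "#" or s.startswith(("loop_", "_", "data_")):
                break
            rows.append(dict(zip(keys, s.split())))
            i += 1
        return rows
    return []


def unobserved_mask(seq_length, rows, chain_id, model="1"):
    """Build a '+'/'-' mask; '-' marks unobserved or zero-occupancy residues."""
    missing = {
        int(r["label_seq_id"])
        for r in rows
        if r["label_asym_id"] == chain_id
        and r["PDB_model_num"] == model
        and r["polymer_flag"] == "Y"
    }
    return "".join("-" if i in missing else "+" for i in range(1, seq_length + 1))


# Made-up sample data for illustration only.
SAMPLE = """\
loop_
_pdbx_unobs_or_zero_occ_residues.id
_pdbx_unobs_or_zero_occ_residues.PDB_model_num
_pdbx_unobs_or_zero_occ_residues.polymer_flag
_pdbx_unobs_or_zero_occ_residues.occupancy_flag
_pdbx_unobs_or_zero_occ_residues.auth_asym_id
_pdbx_unobs_or_zero_occ_residues.auth_comp_id
_pdbx_unobs_or_zero_occ_residues.auth_seq_id
_pdbx_unobs_or_zero_occ_residues.PDB_ins_code
_pdbx_unobs_or_zero_occ_residues.label_asym_id
_pdbx_unobs_or_zero_occ_residues.label_comp_id
_pdbx_unobs_or_zero_occ_residues.label_seq_id
1 1 Y 1 A GLY 4 ? A GLY 4
2 1 Y 1 A ALA 5 ? A ALA 5
3 1 Y 1 A PRO 6 ? A PRO 6
4 1 Y 1 A GLY 7 ? A GLY 7
5 1 Y 1 A ALA 8 ? A ALA 8
6 1 Y 1 A PRO 9 ? A PRO 9
7 1 Y 1 A GLY 10 ? A GLY 10
8 1 Y 1 A ALA 11 ? A ALA 11
9 1 Y 1 A PRO 12 ? A PRO 12
#
"""

rows = read_cif_loop(SAMPLE, "_pdbx_unobs_or_zero_occ_residues")
print(unobserved_mask(15, rows, "A"))  # +++---------+++
```

The sequence length passed to unobserved_mask would come from the full chain sequence, for example the one recorded in the file's _entity_poly_seq category.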
I've now added both a generic option to parse data with any key and a specific one to get an alignment of unobserved residues. These cannot be used from parsePDB with header=True. They have to be used in parseCIFHeader.
Please check #1705 for more details and let me know if this does what you'd like it to. You can access these changes to test it by checking out the associated branch.
If you don't yet have a github version of prody, you can clone it from this branch directly as follows:
git clone -b cif_header https://github.com/jamesmkrieger/ProDy.git ProDy
If you do have it then you can add my fork as a new remote and then check out the remote branch as follows:
git remote add james https://github.com/jamesmkrieger/ProDy.git
git checkout -b cif_header james/cif_header
Hi @jonathanking,
Have you had a chance to try this?
Thanks for your help. Unfortunately, I have not. I proceeded with another direction for the project I was using this for. I think this would be helpful in the future, though!
Best, Jonathan
Ok, thanks
I need to be able to parse .cif files so that I have access to the complete protein chain sequence along with annotations for which residues are unobserved or 0 occupancy.
For example, for a protein sequence WWWGAPGAPGAPWWW where GAPGAPGAP are experimentally unresolved residues, I want to determine the missing sequence mask from this data, e.g. +++---------+++. Does ProDy have tools to support accessing/constructing this information?
This is what I have so far, where I can parse the .cif file and its header, but I'm unsure how to access missing residue information. Calling .getOccupancies() on the atom group is not what I want either, since zero-occupancy residues seemingly have not been included.
cif file: https://files.rcsb.org/view/7E1B.cif
Thanks for your assistance.