julie-forman-kay-lab / IDPConformerGenerator

Build conformational representations of Intrinsically Disordered Proteins and Regions by a guided sampling of the protein torsion space
https://idpconformergenerator.readthedocs.io/
Apache License 2.0

New 5 chars PDB IDs #250

Open joaomcteixeira opened 1 year ago

joaomcteixeira commented 1 year ago

This affects us:

https://www.wwpdb.org/news/news?year=2023#63ff72ccc031758bf1c30ff7

menoliu commented 1 year ago

I thought the CCD was different from the PDB ID? However, I read that PDB IDs will also be affected in the future: https://www.wwpdb.org/news/news?year=2021#607760112786e73a79c76f9d

Are we thinking of making updates to libpdb? We would need an example of a culled list of 5 char PDB IDs to test out pdbdl and our Structure class. Thoughts?

menoliu commented 1 year ago
PRD_999999 1 1  THR THR N Y 
PRD_999999 1 2  X2AVD VAL N Y 
PRD_999999 1 3  PRO PRO N Y 
PRD_999999 1 4  SAR GLY N Y 
PRD_999999 1 5  MVA VAL N Y 
PRD_999999 1 6  PXZ ?   N Y 
PRD_999999 1 7  THR THR N Y 
PRD_999999 1 8  X2AVD VAL N Y 
PRD_999999 1 9  PRO PRO N Y 
PRD_999999 1 10 SAR GLY N Y 
PRD_999999 1 11 MVA VAL N Y 

Aha, I see: in this example of a 5-character CCD ID, what was DVA is now X2AVD. I was looking at our libstructure and libcif (line 333), and currently we organize our mmCIF columns using _atom_site.XYZ (for models in wwPDB/extended-wwPDB-identifier-examples/tree/main so it should still be okay?

menoliu commented 11 months ago

Yes, I did some testing: we still recognise these residues and process the new mmCIF files with no problem. The only trouble is converting them to their one-letter codes... but then again, I was thinking that for phosphorylation, at least, we could use a lower-case sequence (e.g. phospho-serine could be s instead of S), since phosphorylation is a very common PTM and I need a one-letter code for it to perform sequence electrostatic potential analysis (in my new functions 😉)

Edit: the lower-case idea isn't very good. I'm considering just adding flags where the user specifies which residues are phosphorylated (and other PTMs as we move forward).
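To make the flag idea concrete, here is a minimal sketch, not code from IDPConformerGenerator. It maps residue names of any length to one-letter codes through their parent residue, and keeps phosphorylation as a separate list of 1-based positions instead of encoding it in the letter case. The function name `sequence_with_ptm_flags`, the dictionary names, and the choice of `X` for unknown parents are all assumptions for illustration; the parent mappings for SAR, MVA and X2AVD are taken from the table earlier in this thread, and SEP, TPO and PTR are the CCD codes for phospho-serine, -threonine and -tyrosine.

```python
# Hypothetical helper, not part of IDPConformerGenerator.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

# Modified component -> parent residue. Keys may be 3 or 5 characters long.
MODIFIED_TO_PARENT = {
    "SAR": "GLY", "MVA": "VAL", "X2AVD": "VAL",  # from the table in this thread
    "SEP": "SER", "TPO": "THR", "PTR": "TYR",    # phosphorylated residues
}

PHOSPHO_CODES = {"SEP", "TPO", "PTR"}


def sequence_with_ptm_flags(resnames, phosphorylated=(), unknown="X"):
    """Return (one_letter_sequence, sorted 1-based phosphorylated positions).

    `phosphorylated` holds extra positions flagged by the user; positions
    whose residue name is a known phospho component are added automatically.
    """
    letters = []
    positions = set()
    for index, name in enumerate(resnames, start=1):
        parent = MODIFIED_TO_PARENT.get(name, name)
        letters.append(THREE_TO_ONE.get(parent, unknown))
        if name in PHOSPHO_CODES:
            positions.add(index)
    for index in phosphorylated:
        if not 1 <= index <= len(resnames):
            raise ValueError(f"position {index} is outside the sequence")
        positions.add(index)
    return "".join(letters), sorted(positions)


example = ["THR", "X2AVD", "PRO", "SAR", "MVA", "PXZ", "SEP"]
sequence, phospho = sequence_with_ptm_flags(example, phosphorylated=(1,))
```

Keeping the positions separate leaves the one-letter sequence in plain upper case, so downstream sequence tools are unaffected, and the same pattern could be repeated for other PTMs later.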

formankay commented 11 months ago

Good to hear. Good luck coming up with 1-letter PTM codes. Happy to brainstorm if you want.... Thanks. Julie


joaomcteixeira commented 11 months ago

Yes I did some testing and we still recognise these residues and process these new mmCIF files no problem,

Yes, the parser looks for spaces.
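As an illustration of why splitting on whitespace is unaffected by the longer codes, here is a small sketch (not the actual libcif code) that parses rows like the ones posted earlier in this thread. The field names `entry`, `position`, `component` and `parent` are labels I made up for the example, and real mmCIF parsing also has to handle quoted values, which this ignores.

```python
def parse_row(line):
    """Split one whitespace-delimited row into named fields.

    The token width does not matter: a 3-character and a 5-character
    component ID are both just the fourth token.
    """
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    return {
        "entry": fields[0],
        "position": int(fields[2]),
        "component": fields[3],
        "parent": fields[4],
    }


rows = [
    "PRD_999999 1 1  THR THR N Y ",
    "PRD_999999 1 2  X2AVD VAL N Y ",
    "PRD_999999 1 6  PXZ ?   N Y ",
]
parsed = [parse_row(row) for row in rows]
```

A fixed-column reader (slicing by character offsets, as in the legacy PDB format) would not have this property, which is why only the mmCIF path is safe here.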

For the new pdb_00001abc codes we likely need to update the code below so that IDPCG recognises these new codes in the culled list or in general.

https://github.com/julie-forman-kay-lab/IDPConformerGenerator/blob/2a0e8747880aab815d74f7714d8cef15f0eafe82/src/idpconfgen/libs/libpdb.py#L248-L252
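A rough sketch of what recognising both ID styles could look like, written independently of the linked libpdb lines (it does not reproduce them). It assumes the extended form is `pdb_` followed by 8 alphanumeric characters, as in `pdb_00001abc`, and that a culled-list entry is the ID immediately followed by an optional chain identifier; the second point is an assumption, since we have not yet seen a culled list that uses the new IDs. The names `split_id_chain`, `LEGACY_ID` and `EXTENDED_ID` are hypothetical.

```python
import re

# Legacy 4-character ID: a digit followed by three alphanumerics.
LEGACY_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")
# Extended ID (assumed shape): "pdb_" plus eight alphanumerics.
EXTENDED_ID = re.compile(r"pdb_[A-Za-z0-9]{8}", re.IGNORECASE)


def split_id_chain(entry):
    """Split a culled-list entry into (pdb_id, chain).

    Assumes the chain identifier, if present, directly follows the ID.
    Legacy IDs are returned upper-case, extended IDs lower-case.
    """
    entry = entry.strip()
    extended = EXTENDED_ID.match(entry)
    if extended:
        return extended.group(0).lower(), entry[extended.end():]
    if LEGACY_ID.fullmatch(entry[:4]):
        return entry[:4].upper(), entry[4:]
    raise ValueError(f"not a recognised PDB ID: {entry!r}")


legacy = split_id_chain("1ABCA")
extended = split_id_chain("pdb_00001abcA")
```

The extended pattern is tried first so that an entry starting with `pdb_` is never mistaken for a legacy code with a long chain suffix.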

Another point of possible error is the DSSP calculation. I don't remember how the PDBIDs are handled there.

To-do list: