Open daniel-s-d-larsson opened 1 month ago
If you mean the lack of TER in mmCIF, it's fine, only PDB files have TER records.
The software used on the PDB servers discards and regenerates some information, so a correct annotation can easily be replaced with an incorrect one. To investigate, we'd need an example file that demonstrates the problem.
How can I send you the file if I don't want to post it here? Maybe I could remove the coordinate columns...
You could send it by email (wojdyr@gmail.com). Editing the file requires some work on your side, but would also be fine, perhaps it'd suffice to include only a few residues in each chain.
Please also send it to me, or let Marcin share it with me. Did the PDB staff explain what was wrong?
Ok, I will send you the problematic file. I will also ask the wwPDB staff to describe the problem in detail.
Now I have tried uploading different modified versions of the mmcif file to the deposition and validation servers, including running the files through the PDB extract server, and I cannot figure out exactly what is causing the problem. For the time being, I cannot waste more time on this issue, but everything points at it being the PDB servers reading the poly.type records incorrectly or mapping to the chains incorrectly. My workaround is to delete the section entirely before uploading to a fresh deposition session.
Wouldn't it be easier to just send us the file? (Never mind, Keitaro reproduced it using 7k00)
It seems to be a bug in maxit. I compiled v11.200 and reduced 7k00 to a ~500-line input file (input.mmcif.gz) to reproduce it.
The input file has:
loop_
_entity.id
_entity.type
A polymer
B polymer
loop_
_entity_poly.entity_id
_entity_poly.type
A polyribonucleotide
B polypeptide(L)
Running:
maxit -input input.mmcif -output output.cif -o 8
produces output.cif with:
loop_
_entity_poly.entity_id
_entity_poly.type
_entity_poly.nstd_linkage
_entity_poly.nstd_monomer
_entity_poly.pdbx_seq_one_letter_code
_entity_poly.pdbx_seq_one_letter_code_can
_entity_poly.pdbx_strand_id
_entity_poly.pdbx_target_identifier
1 polyribonucleotide no no AAUUGAAGA AAUUGAAGA A ?
2 polyribonucleotide no no VSMRDMLKAGV VSMRDMLKAGV B ?
#
loop_
_entity_poly_seq.entity_id
_entity_poly_seq.num
_entity_poly_seq.mon_id
_entity_poly_seq.hetero
1 1 A n
1 2 A n
1 3 U n
1 4 U n
1 5 G n
1 6 A n
1 7 A n
1 8 G n
1 9 A n
2 1 VAL n
2 2 SER n
2 3 MET n
2 4 ARG n
2 5 ASP n
2 6 MET n
2 7 LEU n
2 8 LYS n
2 9 ALA n
2 10 GLY n
2 11 VAL n
#
loop_
_entity.id
_entity.type
_entity.src_method
_entity.pdbx_description
_entity.formula_weight
_entity.pdbx_number_of_molecules
_entity.pdbx_ec
_entity.pdbx_mutation
_entity.pdbx_fragment
_entity.details
1 polymer man
;RNA (5'-R(P*AP*AP*UP*UP*GP*AP*AP*GP*A)-3')
;
2903.815 1 ? ? ? ?
2 polymer man VAL-SER-MET-ARG-ASP-MET-LEU-LYS-ALA-GLY-VAL 1208.495 1 ? ? ? ?
#
All looks fine apart from:
1 polyribonucleotide no no AAUUGAAGA AAUUGAAGA A ?
2 polyribonucleotide no no VSMRDMLKAGV VSMRDMLKAGV B ?
If I change the order of lines in the input to:
loop_
_entity_poly.entity_id
_entity_poly.type
B polypeptide(L)
A polyribonucleotide
then in the output I get:
1 "polypeptide(L)" no no AAUUGAAGA AAUUGAAGA A ?
2 "polypeptide(L)" no no VSMRDMLKAGV VSMRDMLKAGV B ?
If _entity_poly.type is absent in the input file, it's correct in the output.
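The symptom shown above (every entity receiving the same `_entity_poly.type`) can be spotted mechanically in a maxit output file. The following is a minimal sketch, not part of maxit or gemmi: the function name `suspicious_poly_types` is hypothetical, it expects the loop's tag list and data rows to be extracted already, and it relies on a crude heuristic (a sequence written only with A, C, G, U, N is treated as RNA), so a short peptide made only of Ala/Cys/Gly/Asn, or a DNA entity, would be misjudged.

```python
import shlex

# One-letter codes treated as RNA by this heuristic (an assumption of the sketch).
RNA_LETTERS = set("ACGUN")

def suspicious_poly_types(tags, rows):
    """Return entity ids whose _entity_poly.type disagrees with the sequence.

    tags: list of tag names from the _entity_poly loop, in column order.
    rows: list of single-line data rows from the same loop.
    """
    id_i = tags.index("_entity_poly.entity_id")
    type_i = tags.index("_entity_poly.type")
    seq_i = tags.index("_entity_poly.pdbx_seq_one_letter_code")
    flagged = []
    for line in rows:
        # shlex handles quoted values such as "polypeptide(L)".
        fields = shlex.split(line)
        looks_like_rna = set(fields[seq_i]) <= RNA_LETTERS
        labelled_rna = fields[type_i] == "polyribonucleotide"
        if looks_like_rna != labelled_rna:
            flagged.append(fields[id_i])
    return flagged
```

Applied to the two output variants quoted above, it should flag entity 2 in the first case and entity 1 after the input lines are swapped.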
Good that you found the culprit. For the time being, I will just delete the poly.type section before I upload files to wwPDB.
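For anyone who wants to script this workaround, here is a rough sketch using only the Python standard library. It assumes the `_entity_poly` data is written as a `loop_` (as in the reduced input above) and removes that whole loop, so any other `_entity_poly` columns in it are dropped too. The function name `drop_entity_poly_loop` is hypothetical, and the line-based parsing does not understand multi-line `;` text fields or the non-loop key/value form, so a proper CIF parser would be safer for real files.

```python
def drop_entity_poly_loop(cif_text):
    """Remove every loop_ whose tags all belong to the _entity_poly category."""
    lines = cif_text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        if lines[i].strip() != "loop_":
            out.append(lines[i])
            i += 1
            continue
        # Collect the tag lines that follow loop_.
        j = i + 1
        tags = []
        while j < len(lines) and lines[j].lstrip().startswith("_"):
            tags.append(lines[j].strip())
            j += 1
        # Data rows run until the next loop_, tag, comment or data block.
        k = j
        while k < len(lines):
            s = lines[k].strip()
            if s == "loop_" or s.startswith(("#", "_", "data_")):
                break
            k += 1
        # The trailing dot keeps _entity_poly_seq loops untouched.
        if not (tags and all(t.startswith("_entity_poly.") for t in tags)):
            out.extend(lines[i:k])
        i = k
    return "\n".join(out) + "\n"
```

Typical use would be to read the refined mmCIF file as text, pass it through the function, and write the result to a new file before uploading.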
Just noticed maxit-v11.300 has been released https://sw-tools.rcsb.org/apps/MAXIT/source.html and it worked properly. Also tested https://validate-rcsb-1.wwpdb.org/, which seemed to still use an older maxit version.
I have a problem: both the wwPDB deposition server and the wwPDB validation server misidentify my polymer chains as polyribonucleotide instead of polypeptide(L) when I upload the refined.mmcif file from refine_spa_norefmac. The poly.type is set correctly in the header, but the wwPDB staff say it is wrong on their side.
Example:
This is a ribosome structure with both ribonucleic acids and many proteins. Strangely, only protein chains up to a specific point are identified as polyribonucleotide, which indicates to me that there is some corruption in the file. Could the lack of TER records cause this problem? In an earlier deposition, which used an older version of refine_spa, the polymer type records were not there.