Closed samirelanduk closed 3 years ago
Hi Sam,
I'm not the expert here, but why do you want to know the order? It is not important to programs consuming mmCIF. The only reason I can see for wanting this order is to compare two mmCIF files visually in a text editor.
If that is what you're looking for, you might want to have a look at the program cif-diff, part of our cif-tools (on GitHub at PDB-REDO/cif-tools). It reads in two mmCIF files, puts the categories of the second in the same order as the first, and then compares the content. This can even be done with vimdiff (if you specify --text).
Sorry for the advertisement.
regards, -maarten
On 29-01-2021 at 20:17, Sam Ireland wrote:
The categories/tables seem to come in a set order in mmCIF files (I am referring specifically to those in the Protein Data Bank here). _entry is first, then _audit_conform, etc. Is the actual order of all 200 or so given anywhere? In all the documentation I can find, they are just given in alphabetical order.
I have tried to work this out from the PDB itself, but it's not straightforward as no file contains all categories. In fact I think it is impossible to do it this way because the order is not consistent between versions. In 1twj for example (dict version 5.281), pdbx_struct_assembly comes before pdbx_nonpoly_scheme, but in 6qha (dict version 5.305) it is the other way around.
If I want to know the order of categories for a specific dict version (the most recent one, say), where can I get that information? Producing mmCIF files that match PDB ones is difficult to do without this information.
-- Maarten L. Hekkelman http://www.hekkelman.com/
I need to produce files that match the public PDB files as closely as possible.
There is an order for many of the categories, but not all of them. We tend to put key categories early and larger ones later, but that is only to aid manual viewing of the file. A CIF parser has to read the entire file. It is not clear why you need a file that matches public PDB files.
Here is the list of categories that are written first in the file. It comes from the cifexch2 program in the dictionary software.
"entry", "audit", "audit_conform", "database", "database_2",
"database_PDB_rev", "database_PDB_rev_record",
"pdbx_database_PDB_obs_spr", "pdbx_database_related",
"pdbx_database_status", "pdbx_database_proc", "audit_contact_author",
"audit_author", "citation", "citation_author", "citation_editor", "cell",
"symmetry", "entity", "entity_keywords", "entity_name_com",
"entity_name_sys", "entity_poly", "entity_poly_seq", "entity_src_gen",
"entity_src_nat", "pdbx_entity_src_syn", "entity_link", "struct_ref",
"struct_ref_seq", "struct_ref_seq_dif", "chem_comp", "pdbx_nmr_exptl",
"pdbx_nmr_exptl_sample_conditions", "pdbx_nmr_sample_details",
"pdbx_nmr_spectrometer", "pdbx_nmr_refine", "pdbx_nmr_details",
"pdbx_nmr_ensemble", "pdbx_nmr_representative", "pdbx_nmr_software",
"exptl", "exptl_crystal", "exptl_crystal_grow",
"exptl_crystal_grow_comp", "diffrn", "diffrn_detector",
"diffrn_radiation", "diffrn_radiation_wavelength", "diffrn_source",
"reflns", "reflns_shell", "computing", "refine", "refine_analyze",
"refine_hist", "refine_ls_restr", "refine_ls_restr_ncs",
"refine_ls_shell", "pdbx_refine", "pdbx_xplor_file", "struct_ncs_oper",
"struct_ncs_dom", "struct_ncs_dom_lim", "struct_ncs_ens",
"struct_ncs_ens_gen", "struct", "struct_keywords", "struct_asym",
"struct_biol", "struct_biol_gen", "struct_biol_view", "struct_conf",
"struct_conf_type", "struct_conn", "struct_conn_type",
"struct_mon_prot_cis", "struct_sheet", "struct_sheet_order",
"struct_sheet_range", "struct_sheet_hbond", "pdbx_struct_sheet_hbond",
"struct_site", "struct_site_gen", "database_PDB_matrix", "atom_sites",
"atom_sites_alt", "atom_sites_footnote", "atom_type", "atom_site",
"atom_site_anisotrop", "database_PDB_caveat", "database_PDB_remark",
"pdbx_poly_seq_scheme", ""
Thanks!
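For anyone wanting to apply the category list quoted above when writing files, here is a minimal Python sketch. It is not taken from cifexch2: the name `order_categories` is hypothetical, `PREFERRED_ORDER` is an abridged subset of the list (same relative order), and placing unlisted categories after the listed ones in their original relative order is an assumption, since the reply only says that not all categories have a fixed position.

```python
# Abridged subset of the category list quoted above, in the same relative order.
# Extend it with the remaining names from the list as needed.
PREFERRED_ORDER = [
    "entry", "audit", "audit_conform", "database", "database_2",
    "citation", "cell", "symmetry", "entity", "struct", "atom_site",
]


def order_categories(categories, preferred=PREFERRED_ORDER):
    """Return categories with the listed ones first, in list order.

    Assumption: categories not in the list go last and keep their
    original relative order (sorted() is stable, so ties are not shuffled).
    """
    rank = {name: i for i, name in enumerate(preferred)}
    return sorted(categories, key=lambda name: rank.get(name, len(preferred)))


if __name__ == "__main__":
    found = ["atom_site", "pdbx_struct_assembly", "cell", "entry", "audit_conform"]
    print(order_categories(found))
```

Category names here are written without the leading underscore, as in the list.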
The categories/tables seem to come in a set order in mmCIF files (I am referring specifically to those in the Protein Data Bank here). _entry is first, then _audit_conform, etc. Is the actual order of all 200 or so given anywhere? In all the documentation I can find, they are just given in alphabetical order.
I have tried to work this out from the PDB itself, but it's not straightforward as no file contains all categories. In fact I think it is impossible to do it this way because the order is not consistent between versions. In 1twj for example (dict version 5.281), pdbx_struct_assembly comes before pdbx_nonpoly_scheme, but in 6qha (dict version 5.305) it is the other way around.
If I want to know the order of categories for a specific dict version (the most recent one, say), where can I get that information? Producing mmCIF files that match PDB ones is difficult to do without this information.
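The comparison described in the question (reading the category order out of individual files and checking whether two files agree) could be sketched roughly as follows. This is a naive line scan under stated assumptions, not a full CIF parser: the function names `category_order` and `conflicting_pairs` are hypothetical, and the only CIF syntax handled beyond item names is the semicolon-delimited multi-line text field.

```python
from itertools import combinations


def category_order(mmcif_text):
    """List categories in order of first appearance in an mmCIF file's text."""
    order, seen = [], set()
    in_text_field = False
    for line in mmcif_text.splitlines():
        if line.startswith(";"):
            # A ';' in column one opens or closes a multi-line text value.
            in_text_field = not in_text_field
            continue
        if in_text_field or not line.startswith("_"):
            continue
        # "_entry.id 1TWJ" -> "entry"
        category = line.split()[0].split(".")[0][1:]
        if category not in seen:
            seen.add(category)
            order.append(category)
    return order


def conflicting_pairs(order_a, order_b):
    """Return pairs (x, y) where x precedes y in order_a but follows it in order_b."""
    pos_b = {name: i for i, name in enumerate(order_b)}
    shared = [name for name in order_a if name in pos_b]
    return [(x, y) for x, y in combinations(shared, 2) if pos_b[x] > pos_b[y]]
```

Running `category_order` on the text of two downloaded entries and passing the results to `conflicting_pairs` would list every pair of shared categories whose relative order differs, which is the kind of disagreement described for 1twj and 6qha.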