bienchen closed this 2 years ago
mmCIF can do UTF8 but it's discouraged
Really, by whom? Both @brindakv and John W in the past have at least strongly hinted that UTF-8 is the most appropriate encoding for mmCIF (and it is mandated for BinaryCIF). Many (perhaps most) PDB-Dev depositions are not plain ASCII either - they are either UTF8 or latin1/iso-8859-1.
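For anyone who wants to check which of these cases a given file falls into, here is a minimal sketch. The function name `guess_encoding` is hypothetical and not part of any mmCIF tool. Because every byte sequence decodes as latin-1, that last answer is only a fallback, not proof of the actual encoding.

```python
def guess_encoding(data: bytes) -> str:
    """Return the first of ascii, utf-8 that decodes cleanly, else latin-1.

    latin-1 accepts any byte sequence, so it only means
    "neither ASCII nor valid UTF-8".
    """
    for encoding in ("ascii", "utf-8"):
        try:
            data.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            continue
    return "latin-1"


if __name__ == "__main__":
    # Read the file as raw bytes in real use, e.g. open(path, "rb").read()
    print(guess_encoding("Žídek A".encode("utf-8")))
```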
the RCSB validation tool can not [handle UTF-8]
@brindakv, can this be fixed? Is it going to be?
I am happy to merge this but there are other citations (e.g. imp, hhpred) that are UTF-8, which have been "working" for some time without issues.
Just noticed this one: https://www.iucr.org/resources/cif/spec/version1.1/semantics#markup So in theory, by the CIF standard it's all ASCII, but there is a markup extension (looks a bit LaTeX-like to me) for all kinds of special letters.
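To illustrate what that markup route could look like, here is a rough sketch that rewrites accented letters as a backslash code followed by the base letter. The helper `to_cif_markup` is hypothetical. The two codes in the table (`\'` for acute, `\<` for caron) are my reading of the linked page and should be checked against it. Anything not in the table raises an error rather than being guessed.

```python
import unicodedata

# Assumed subset of the CIF 1.1 accent codes (verify against the spec page).
CIF_ACCENTS = {
    "\u0301": "\\'",  # combining acute accent
    "\u030c": "\\<",  # combining caron (hacek)
}


def to_cif_markup(text: str) -> str:
    """Rewrite accented letters as <code><base letter>, ASCII only."""
    out = []
    # NFD splits e.g. "Ž" into "Z" plus a combining caron.
    for ch in unicodedata.normalize("NFD", text):
        code = CIF_ACCENTS.get(ch)
        if code is not None and out:
            # The code goes in front of the letter it modifies.
            out[-1] = code + out[-1]
        elif ord(ch) < 128:
            out.append(ch)
        else:
            raise ValueError(f"no markup code known for {ch!r}")
    return "".join(out)


if __name__ == "__main__":
    print(to_cif_markup("Žídek A"))
```

Whether downstream tools actually decode this markup is a separate question. The sketch only shows the encoding direction.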
mmCIF can do UTF8 but it's discouraged... basically lots of tools can deal with UTF8 in the mmCIF universe, but the RCSB validation tool can not. Therefore I changed Žídek A in ihm.citations.alphafold2 to Zidek A, as the name of this author is spelled in other publications.
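The same kind of change could be automated for other citations. This is a minimal sketch using only the Python standard library. The helper `to_ascii` is hypothetical and is not something python-ihm provides. It assumes that dropping diacritics is acceptable, and it silently discards characters that have no ASCII decomposition (for example ø or ß), so the results need a manual check.

```python
import unicodedata


def to_ascii(name: str) -> str:
    """Strip diacritics by decomposing and dropping non-ASCII code points."""
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    print(to_ascii("Žídek A"))  # Zidek A
```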