Closed drzraf closed 6 years ago
@pbiron , for 1.3? (as far as I understand, this repository seems abandonned by HumanMade)
@drzraf I wouldn't call it abandoned, but it's not a focus of ours right now. @swissspidy and @hyperpress also have push permissions though. :)
There are already dozens of PR pending for almost a year and discussion about them didn't even started.
That's the guarantee for a future full of (avoidable) merge-conflicts that no one will want to resolve.
I'm trying my best to create feature-branch all based on master
for my own PR but as pending branches start conflicting themselves without any hint about mergeability future I'd not keep up.
I don't have a strong opinion on this, but it sounds like an extension of the WXR format to me (which doesn't really have a spec, but still) that isn't really needed.
What's the real benefit of this? Just the reduced size?
Are there any exporters that support this format already? If I'd write a custom exporter I would probably not implement this format because it would be too much of a hassle to determine whether I can use the short format or not.
Not only is this "not needed" (as @swissspidy said), but could actually be bad if those writing custom exporters are not aware of some of the finer points of XML processing.
XML attribute values are subject to Attribute Value Normalization (XML Spec, Section 3.3.3). Among other things, the normalization algorithm specifies that all literal white space characters in attribute values be replaced by space characters. To quote the XML Spec:
Note that if the unnormalized attribute value contains a character reference to a white space character other than space (#x20), the normalized value contains the referenced character itself (#xD, #xA or #x9). This contrasts with the case where the unnormalized value contains a white space character (not a reference), which is replaced with a space character (#x20) in the normalized value and also contrasts with the case where the unnormalized value contains an entity reference whose replacement text contains a white space character; being recursively processed, the white space character is replaced with a space character (#x20) in the normalized value.
For example, to a conforming XML processor
<element attr='#x9 #x20 this is a test' />
(where #x9
represents a literal tab character, and #x20
represents a literal space character )
is equivalent to
<element attr='#x20 #x20 this is a test' />
I'd say it's a pretty good bet that many (most?) of those writing custom exporters are unaware of this and that if told they could export meta values as attributes they would do so in cases that would result in data loss/alteration on import.
Element values are not subject to this normalization. Hence, if the data may contain non-space white space characters it is always safer to represent it as element content and not in an attribute value.
And while PHP's XMLReader is not 100% conforming, it is with regard to Attribute Value Normalization.
So, I do not think this should be merged!
p.s. sorry for not responding to this earlier! The ping back in April escaped me.
Sounds like a wontfix then :-)
I'm closing this for now.
<wp:postmeta>
is usually exported this way:(156 bytes)
CDATA are used for the meta_value, so that HTML entities would not be interpreted by the XML parser.
This commit makes the parser aware of a second short format where post's meta are expressed as attributes:
<wp:postmeta meta_key="_wp_page_template" meta_value="some-string.tpl.php" />
(85 bytes)meta_value
length > X [XML does not set any limit on this btw])ToDo: explore possible future use of
XMLReader::SUBST_ENTITIES
to completely avoid cluttering XML with CDATA?