python / cpython

The Python programming language
https://www.python.org

Let ElementTree prolog include comments and processing instructions #68475

Open rhettinger opened 9 years ago

rhettinger commented 9 years ago
BPO 24287
Nosy @rhettinger, @scoder, @vadmium
Superseder
  • bpo-9521: xml.etree.ElementTree skips processing instructions when parsing
Files
  • xml_prolog.diff: Very rough draft patch.

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    ```python
    assignee = None
    closed_at = None
    created_at =
    labels = ['expert-XML', 'type-feature']
    title = 'Let ElementTree prolog include comments and processing instructions'
    updated_at =
    user = 'https://github.com/rhettinger'
    ```

    bugs.python.org fields:

    ```python
    activity =
    actor = 'scoder'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['XML']
    creation =
    creator = 'rhettinger'
    dependencies = []
    files = ['39498']
    hgrepos = []
    issue_num = 24287
    keywords = ['patch']
    message_count = 4.0
    messages = ['244069', '244072', '244085', '340993']
    nosy_count = 4.0
    nosy_names = ['rhettinger', 'scoder', 'eli.bendersky', 'martin.panter']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = '9521'
    type = 'enhancement'
    url = 'https://bugs.python.org/issue24287'
    versions = ['Python 3.6']
    ```

    Currently, ElementTree doesn't support comments and processing instructions in the prolog. That is the typical place to put style-sheets and document type definitions.

    It would be used like this:

        from xml.etree.ElementTree import ElementTree, Element, Comment, ProcessingInstruction
    
        r = Element('current_observation', version='1.0')
        r.text = 'Nothing to see here.  Move along.'
        t = ElementTree(r)
        t.append(ProcessingInstruction('xml-stylesheet', 'href="latest_ob.xsl" type="text/xsl"'))
        t.append(Comment('Published at: http://w1.weather.gov/xml/current_obs/KSJC.xml'))

    That creates output like this:

    <?xml version='1.0' encoding='utf-8'?>
    <?xml-stylesheet href="latest_ob.xsl" type="text/xsl"?>
    <!--Published at: http://w1.weather.gov/xml/current_obs/KSJC.xml-->
    <current_observation version="1.0">
    Nothing to see here.  Move along.
    </current_observation>
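
For comparison, here is a minimal sketch of what happens today when the same document is parsed and re-serialised with the default parser settings. The variable names are illustrative only; the point is that the processing instruction and the comment in front of the root element do not survive the round trip.

```python
import xml.etree.ElementTree as ET

# The document from the example above, with a PI and a comment in the prolog.
source = (
    '<?xml-stylesheet href="latest_ob.xsl" type="text/xsl"?>\n'
    '<!--Published at: http://w1.weather.gov/xml/current_obs/KSJC.xml-->\n'
    '<current_observation version="1.0">'
    'Nothing to see here.  Move along.'
    '</current_observation>'
)

# The default TreeBuilder keeps only the element tree rooted at the
# document element; prolog comments and PIs are not stored anywhere.
root = ET.fromstring(source)

# Serialising the parsed tree therefore emits the root element only.
round_tripped = ET.tostring(root, encoding="unicode")
print(round_tripped)
```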

    vadmium commented 9 years ago

    The ElementTree class imitates or wraps many methods of the Element class. Since Element.append() and remove() already exist and act on children of the element, I think the new ElementTree methods should be named differently. Maybe something like prolog_append() and prolog_remove()? Or prologue_append() depending on your spelling preferences :P

    Also, maybe the new write() calls should add newlines.
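
To illustrate the naming concern, the sketch below uses only the existing API. It shows ElementTree delegating a lookup to its root element, and Element.append() adding a child at the end of the element. The element names are made up for the example.

```python
import xml.etree.ElementTree as ET

root = ET.Element("root")
first = ET.SubElement(root, "first")
tree = ET.ElementTree(root)

# ElementTree.find() is a thin wrapper that searches from the root element.
found = tree.find("first")

# Element.append() already has a fixed meaning: add a child at the end.
last = ET.Element("last")
root.append(last)
child_tags = [child.tag for child in root]

# ElementTree itself currently has no append() method, so a new one with
# different semantics would be easy to confuse with Element.append().
tree_has_append = hasattr(tree, "append")
```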

    scoder commented 9 years ago

    FTR, lxml's Element class has addnext() and addprevious() methods which are commonly used for this purpose. But ET can't adopt those due to its different tree model.

    I second Martin's comment that ET.append() is a misleading name. It suggests adding stuff to the end, whereas things are actually being inserted before the root element here.

    I do agree, however, that this is a helpful feature and that the ElementTree class is a good place to expose it. I propose to provide a "prolog" (that's the spec's spelling) property holding a list that users can fill and modify any way they wish. The serialiser would then validate that all content is proper XML prologue content, and would serialise it in order.
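
A rough sketch of how such a property might look is given below. `PrologTree`, its `prolog` attribute and its `to_string()` method are hypothetical names invented for this illustration; none of them exist in xml.etree.ElementTree. The list is filled freely by the user, and the check for proper prolog content happens only at serialisation time.

```python
import xml.etree.ElementTree as ET


class PrologTree(ET.ElementTree):
    """Hypothetical ElementTree subclass with a user-editable prolog list."""

    def __init__(self, element=None, file=None):
        super().__init__(element, file)
        self.prolog = []  # plain list; nothing is validated on modification

    def to_string(self):
        # Validate lazily: only comments and PIs may precede the root.
        for node in self.prolog:
            tag = getattr(node, "tag", None)
            if tag is not ET.Comment and tag is not ET.ProcessingInstruction:
                raise TypeError(f"not valid prolog content: {node!r}")
        parts = ["<?xml version='1.0' encoding='utf-8'?>\n"]
        # Serialise the prolog items in list order, then the root element.
        parts.extend(ET.tostring(n, encoding="unicode") for n in self.prolog)
        parts.append(ET.tostring(self.getroot(), encoding="unicode"))
        return "".join(parts)


root = ET.Element("current_observation", version="1.0")
root.text = "Nothing to see here.  Move along."
tree = PrologTree(root)

pi = ET.ProcessingInstruction(
    "xml-stylesheet", 'href="latest_ob.xsl" type="text/xsl"')
pi.tail = "\n"
comment = ET.Comment(
    "Published at: http://w1.weather.gov/xml/current_obs/KSJC.xml")
comment.tail = "\n"

tree.prolog.extend([pi, comment])
document = tree.to_string()
print(document)
```

Keeping the prolog as an ordinary list means users can insert, reorder and delete entries with the usual list operations, at the cost of errors surfacing late.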

    My guess is that lxml would eventually use a MutableSequence here that maps changes directly to the underlying tree (and would thus validate them during modification), but ET can be more lenient, just like it allows arbitrary objects in the text and tail properties which only the serialiser rejects.
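
The existing leniency can be seen with a short example using the current API: assigning a non-string to `text` is accepted, and the error only appears when the element is serialised.

```python
import xml.etree.ElementTree as ET

elem = ET.Element("item")
elem.text = 42  # accepted without complaint; no type check on assignment

# The serialiser is the first place that rejects the value.
try:
    ET.tostring(elem)
except TypeError as exc:
    error = exc
else:
    error = None

print(type(error).__name__)
```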

    Note that newlines can easily be generated on the user side by setting the tail of a PI/Comment to "\n". (The serialiser must also validate that the tail text consists only of allowed whitespace.)
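
As a small sketch of both points: the tail of a comment is written out right after it by the existing serialiser, and a whitespace check could follow the XML definition of white space (space, tab, carriage return, line feed). The helper `is_valid_prolog_tail()` is a hypothetical name for this illustration, not an existing function.

```python
import xml.etree.ElementTree as ET

# White space as defined by the XML spec: #x20, #x9, #xD, #xA.
XML_WHITESPACE = " \t\r\n"


def is_valid_prolog_tail(tail):
    """Hypothetical check: a prolog tail may be absent or whitespace only."""
    if tail is None:
        return True
    return isinstance(tail, str) and tail.strip(XML_WHITESPACE) == ""


comment = ET.Comment("Published at: http://w1.weather.gov/xml/current_obs/KSJC.xml")
comment.tail = "\n"  # the user asks for a newline after the comment

# tostring() writes the element followed by its tail.
serialized = ET.tostring(comment, encoding="unicode")
print(repr(serialized))
```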

    For reference:

    http://www.w3.org/TR/REC-xml/#sec-prolog-dtd

    scoder commented 5 years ago

    This is a duplicate of bpo-9521, but it's difficult to say which ticket is better.