python / cpython

The Python programming language
https://www.python.org

xml.etree.ElementTree.write does not support `standalone` option #89499

Open 905fe061-a668-4f2d-9dac-96c8003c302e opened 2 years ago

905fe061-a668-4f2d-9dac-96c8003c302e commented 2 years ago
BPO 45336
Nosy @scoder, @Fidget-Spinner, @akulakov, @ericvergnaud

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['3.7', 'type-bug', 'library']
title = 'xml.etree.ElementTree.write does not support `standalone` option'
updated_at =
user = 'https://bugs.python.org/twowolfs'
```

bugs.python.org fields:
```python
activity =
actor = 'ned.deily'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'twowolfs'
dependencies = []
files = []
hgrepos = []
issue_num = 45336
keywords = []
message_count = 9.0
messages = ['402981', '404251', '404286', '404295', '404297', '404304', '415198', '415199', '415204']
nosy_count = 7.0
nosy_names = ['scoder', 'eli.bendersky', 'docs@python', 'kj', 'andrei.avk', 'twowolfs', 'ericvergnaud']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue45336'
versions = ['Python 3.7']
```

905fe061-a668-4f2d-9dac-96c8003c302e commented 2 years ago

After modifying an XML file, executing the following code produces an error:

```python
import xml.etree.ElementTree as ET

rtexmlFile = 'Fox_CM3550A_SWP1_Rte_ecuc.arxml'
rte_ecu_tree = ET.parse(rtexmlFile)
root = rte_ecu_tree.getroot()

rte_ecu_tree.write(rtexmlFile, encoding="UTF-8", xml_declaration="True", default_namespace="None", method="xml", short_empty_elements="True")
```

ValueError: cannot use non-qualified names with default_namespace option

The documentation for ElementTree.write gives the following signature for this method, but this form does not seem to work.

ElementTree.write also does not take into account a `standalone` pseudo-attribute in the XML declaration. For example, given

`<?xml version="1.0" encoding="UTF-8" standalone="no"?>`

ElementTree.write will not write `standalone` back to the XML file.

Is this a bug in version 3.7?

`write(file, encoding="us-ascii", xml_declaration=None, default_namespace=None, method="xml", *, short_empty_elements=True)`
Writes the element tree to a file, as XML. file is a file name, or a file object opened for writing. encoding is the output encoding (default is US-ASCII). xml_declaration controls if an XML declaration should be added to the file. Use False for never, True for always, None for only if not US-ASCII or UTF-8 or Unicode (default is None). default_namespace sets the default XML namespace (for “xmlns”). method is either "xml", "html" or "text" (default is "xml"). The keyword-only short_empty_elements parameter controls the formatting of elements that contain no content. If True (the default), they are emitted as a single self-closed tag, otherwise they are emitted as a pair of start/end tags.

The output is either a string (str) or binary (bytes). This is controlled by the encoding argument. If encoding is "unicode", the output is a string; otherwise, it’s binary. Note that this may conflict with the type of file if it’s an open file object; make sure you do not try to write a string to a binary stream and vice versa.
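A minimal sketch of the str/bytes rule quoted above, using in-memory streams instead of a real file (the element name `a` is arbitrary). It also shows that the declaration `write()` emits with `xml_declaration=True` contains only `version` and `encoding`; the signature has no parameter for `standalone`.

```python
import io
import xml.etree.ElementTree as ET

tree = ET.ElementTree(ET.Element("a"))

# encoding="unicode": write() emits str, so the target must be a text stream.
text_buf = io.StringIO()
tree.write(text_buf, encoding="unicode")

# Any other encoding: write() emits bytes, so the target must be a binary stream.
bin_buf = io.BytesIO()
tree.write(bin_buf, encoding="UTF-8", xml_declaration=True)

print(repr(text_buf.getvalue()))
print(repr(bin_buf.getvalue()))
```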

akulakov commented 2 years ago

Ed: something looks a bit odd with the call to write() that you used:

```python
rte_ecu_tree.write(rtexmlFile, encoding="UTF-8", xml_declaration="True", default_namespace="None", method="xml", short_empty_elements="True")
```

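Note that the default value for default_namespace is None, but you have it quoted. The string "None" is non-empty and therefore truthy, so write() treats it as a namespace URI, and that is what raises the ValueError for non-qualified names. You have also quoted xml_declaration and short_empty_elements. Try re-running with these argument values unquoted.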
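To illustrate why the quoted value matters, here is a small sketch; the helper `try_write` and the sample document are hypothetical, made up for this example. In CPython's ElementTree, any truthy `default_namespace` is taken as a namespace URI, and unqualified tags are then rejected:

```python
import io
import xml.etree.ElementTree as ET

tree = ET.ElementTree(ET.fromstring("<root><child/></root>"))


def try_write(**kwargs):
    """Hypothetical helper: serialize tree to bytes, or return the ValueError message."""
    buf = io.BytesIO()
    try:
        tree.write(buf, encoding="UTF-8", **kwargs)
    except ValueError as exc:
        return f"ValueError: {exc}"
    return buf.getvalue()


# The string "None" is truthy, so it is used as a namespace URI and the
# unqualified tags <root> and <child> are rejected.
print(try_write(default_namespace="None"))
# The real None disables default-namespace handling.
print(try_write(default_namespace=None))
```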

905fe061-a668-4f2d-9dac-96c8003c302e commented 2 years ago

Hi Andrew

I removed the quotes, but standalone is still not added to the XML declaration. I used the following call:

```python
rte_ecu_tree.write(rtexmlFile, encoding="UTF-8", xml_declaration=True, default_namespace=None, method="xml", short_empty_elements=True)
```

The XML declaration came out as `<?xml version='1.0' encoding='UTF-8'?>`

but it should be `<?xml version="1.0" encoding="UTF-8" standalone="no"?>`

`standalone` was not added to the declaration.
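Until `write()` supports a `standalone` option, one possible workaround is to emit the declaration yourself and pass `xml_declaration=False`. The helper `write_with_standalone` below is hypothetical (not part of the standard library) and assumes a binary output stream and an ASCII-compatible encoding such as UTF-8:

```python
import io
import xml.etree.ElementTree as ET


def write_with_standalone(tree, file, encoding="UTF-8", standalone=None,
                          **write_kwargs):
    """Hypothetical helper: write tree to the binary stream `file`, preceded by
    a hand-built XML declaration that may include standalone="yes"/"no".
    Assumes an ASCII-compatible encoding (e.g. UTF-8)."""
    decl = f'<?xml version="1.0" encoding="{encoding}"'
    if standalone is not None:
        value = "yes" if standalone else "no"
        decl += f' standalone="{value}"'
    decl += "?>\n"
    file.write(decl.encode(encoding))
    # Suppress ElementTree's own declaration, which never carries standalone.
    tree.write(file, encoding=encoding, xml_declaration=False, **write_kwargs)


tree = ET.ElementTree(ET.fromstring("<a><b/></a>"))
buf = io.BytesIO()
write_with_standalone(tree, buf, standalone=False)
print(buf.getvalue().decode("utf-8"))
```

For the file in this report, that would look like `with open(rtexmlFile, "wb") as f: write_with_standalone(rte_ecu_tree, f, standalone=False)`. `ET.parse` does not keep the original declaration, so the caller has to supply the `standalone` value.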

akulakov commented 2 years ago

Ed: it seems ElementTree.write does not support the standalone option, but both minidom (https://docs.python.org/3/library/xml.dom.minidom.html) and lxml (https://lxml.de/) support it. Is either of those suitable for your use case?
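For the minidom route, a sketch assuming Python 3.9 or later: the `standalone` argument of minidom's `Document.toxml()`/`writexml()` was added in 3.9, so it is not available on the 3.7 interpreter in this report. `to_xml_with_standalone` is a hypothetical helper that round-trips an ElementTree element through minidom:

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom


def to_xml_with_standalone(root, standalone, encoding="UTF-8"):
    """Hypothetical helper: serialize an ElementTree element via minidom,
    whose Document.toxml() accepts standalone (Python 3.9+)."""
    doc = minidom.parseString(ET.tostring(root))
    # With an encoding given, toxml() returns bytes.
    return doc.toxml(encoding=encoding, standalone=standalone)


root = ET.fromstring("<a><b/></a>")
print(to_xml_with_standalone(root, standalone=False))
```

Building a DOM keeps the whole document in memory, which may matter for very large files.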

905fe061-a668-4f2d-9dac-96c8003c302e commented 2 years ago

Will ElementTree.write be updated to correct this issue?

akulakov commented 2 years ago

Ed: I can look into adding it, but I'm not sure when. If you can make the case that minidom and lxml are not suitable workarounds, it is more likely that I or someone else will add this option; but note that, as this is a new feature, it will only go into Python 3.11.

3f8ed3fe-163b-49b6-9565-c07b0c68c165 commented 2 years ago

This is not a feature request; it's a bug-fix request, so it should be fixed ASAP.

Why is it a bug? The Namespaces in XML specification says that "the default namespace does not apply to attribute names" (see section 6.3), so an unqualified attribute name in a document that uses a default namespace is a perfectly valid scenario.

Raising a ValueError in add_qname (line 864) is therefore incorrect when the qname being added is the unqualified name of an attribute.
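A small reproduction of this behavior, assuming Python 3.8+ (where `ET.tostring()` accepts `default_namespace`; on 3.7 the same code path is reached through `ElementTree.write()`). The document and the helper `serialize` are made up for illustration: the root element is in a default namespace and carries an unprefixed `id` attribute, which is legal, yet serializing with that default namespace fails:

```python
import xml.etree.ElementTree as ET

# Root element in a default namespace with an unprefixed attribute.
# The attribute "id" is in no namespace, which is allowed.
root = ET.fromstring('<root xmlns="urn:example" id="1"/>')
print(root.tag, root.attrib)  # tag is qualified ({urn:example}root), "id" is not


def serialize(elem, **kwargs):
    """Hypothetical helper: return serialized bytes, or the ValueError message."""
    try:
        return ET.tostring(elem, **kwargs)
    except ValueError as exc:
        return f"ValueError: {exc}"


# add_qname() rejects the unqualified attribute name "id".
print(serialize(root, default_namespace="urn:example"))
# Without default_namespace, ElementTree generates a prefix (ns0) instead.
print(serialize(root))
```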

I'm not sure whether lxml can parse very large documents (>4 GB), but I'll try it.

3f8ed3fe-163b-49b6-9565-c07b0c68c165 commented 2 years ago

lxml's tostring does not support a default_namespace value, so it is not a viable alternative.

3f8ed3fe-163b-49b6-9565-c07b0c68c165 commented 2 years ago

Actually there are 2 distinct issues here:

BourgoisMickael commented 1 year ago

Any news on the standalone attribute? It shouldn't be too hard to add.

Umaritimus123 commented 7 months ago

why not support 'standalone'?