khaeru / sdmx

SDMX information model and client in Python
https://sdmx1.readthedocs.io
Apache License 2.0

Write SDMX-ML (.xml) file with <mes:GenericData> as the root element #196

Open henrihi opened 1 week ago

henrihi commented 1 week ago

Hi,

I am trying to create an SDMX-ML (.xml) file with <mes:GenericData> as the root of the .xml file. However, the sdmx.to_xml() function seems to produce an .xml file with <mes:StructureSpecificData> as the root regardless:

<?xml version='1.0' encoding='utf-8'?>
<mes:StructureSpecificData xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:com="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common" xmlns:md="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/metadata/generic" xmlns:data="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/structurespecific" xmlns:str="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/structure" xmlns:mes="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message" xmlns:gen="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic" xmlns:footer="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message/footer">
<mes:Header>
<mes:Test>false</mes:Test>
</mes:Header>
<mes:DataSet/>
</mes:StructureSpecificData>

Here is my code:

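# Assumed imports (not shown in the original post)
import sdmx
from sdmx.message import DataMessage
from sdmx.model.v21 import DataSet
dataset = DataSet()
test_msg = DataMessage(data=[dataset])
with open("data/generic_data.xml", "wb") as f:
    f.write(sdmx.to_xml(test_msg))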

Is there any way to produce a .xml file with <mes:GenericData> as the root using the sdmx.to_xml() function? Also, is it possible to change the root attributes? I would also like to remove the xmlns:data="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/structurespecific" attribute.

khaeru commented 5 days ago

Hi there! Indeed, this is currently more or less hard-coded. It happens because the root tag is chosen by https://github.com/khaeru/sdmx/blob/e88f8d33cedccb003194666564494640e6fa95d4/sdmx/writer/xml.py#L124-L131 and https://github.com/khaeru/sdmx/blob/e88f8d33cedccb003194666564494640e6fa95d4/sdmx/format/xml/common.py#L74-L79

The latter gives defaults used for both SDMX-ML 2.1 and 3.0.0. I guess it should only be the default for SDMX-ML 3.0.0 (where it is the only option given by the standard), and for SDMX-ML 2.1 the code could instead peek ahead and choose <mes:GenericData> if the contained data sets are not structure-specific.
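
To make the "peek ahead" idea concrete, here is a minimal sketch of the selection logic. It is not the library's code: the two classes are hypothetical stand-ins for the package's data set classes, choose_root_tag() is an invented name, and the handling of an empty message (falling back to <mes:GenericData>) is an assumption.

```python
class DataSet:
    """Hypothetical stand-in for a generic (non-structure-specific) data set."""


class StructureSpecificDataSet(DataSet):
    """Hypothetical stand-in for a structure-specific data set."""


def choose_root_tag(datasets, version="2.1"):
    """Pick the root tag of a data message (illustrative only)."""
    # SDMX-ML 3.0.0: structure-specific is the only option in the standard
    if version.startswith("3"):
        return "mes:StructureSpecificData"

    # SDMX-ML 2.1: peek at the contained data sets
    if datasets and all(
        isinstance(ds, StructureSpecificDataSet) for ds in datasets
    ):
        return "mes:StructureSpecificData"

    # Assumption: anything else, including an empty message, is generic
    return "mes:GenericData"
```

With this rule, a 2.1 message holding a plain DataSet() would get <mes:GenericData>, while any 3.0.0 message keeps <mes:StructureSpecificData>.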

The underlying package lxml writes out all of the XML namespace (xmlns:) attributes that appear on the root tag or any of its children. The set of namespaces and prefixes is determined further up in the same file: https://github.com/khaeru/sdmx/blob/e88f8d33cedccb003194666564494640e6fa95d4/sdmx/writer/xml.py#L22

So this could also be adapted to (a) differ for SDMX-ML 2.1 and 3.0 and (b) only include …/{meta}data/structurespecific or …/{meta}data/generic, as appropriate.

A PR would be welcome. As a workaround you could use lxml directly to manipulate the returned XML or file.
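
One possible shape for that workaround is sketched below. It uses the standard library's xml.etree.ElementTree instead of lxml, only so that the sketch has no extra dependency. The names retag_root() and PREFIXES are made up for illustration, and the namespace URIs are copied from the output shown in the issue. ElementTree declares only the namespaces that are actually used when it serializes, so the unused xmlns:data attribute disappears as a side effect.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the XML output quoted in the issue
BASE = "http://www.sdmx.org/resources/sdmxml/schemas/v2_1"
PREFIXES = {
    "mes": f"{BASE}/message",
    "com": f"{BASE}/common",
    "gen": f"{BASE}/data/generic",
    "footer": f"{BASE}/message/footer",
}


def retag_root(xml_bytes: bytes, local_name: str = "GenericData") -> bytes:
    """Return a copy of the message with its root element renamed.

    Hypothetical helper: it only renames the root tag and does NOT convert
    the data set contents from structure-specific to generic form.
    """
    # Keep the familiar prefixes instead of ElementTree's ns0, ns1, ...
    # (note: register_namespace() modifies a module-level registry)
    for prefix, uri in PREFIXES.items():
        ET.register_namespace(prefix, uri)

    root = ET.fromstring(xml_bytes)
    root.tag = f"{{{PREFIXES['mes']}}}{local_name}"

    # The xml_declaration argument requires Python 3.8 or later
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

Since sdmx.to_xml() returns bytes, the result can be passed straight through, e.g. f.write(retag_root(sdmx.to_xml(test_msg))). This is enough for the empty data set in the example above. For a message that contains observations, renaming the root alone would likely not produce a valid <mes:GenericData> document, because the data set contents would still be in structure-specific form.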