djaglowski opened this issue 1 day ago
Pinging code owners:
pkg/ottl: @TylerHelmuth @kentquirk @bogdandrutu @evan-bradley
See Adding Labels via Comments if you do not have permissions to add labels yourself.
A function that can manipulate the xml string in place seems useful. That feels simpler than doing:
- set(cache["xmlMap"], ParseXML(body))
... # manipulate cache["xmlMap"]
- set(body, MarshalXML(cache["xmlMap"]))
I am not experienced enough with XML to propose what kind of functions we'd need for that. Some OTTL guidelines that may be helpful when brainstorming ideas:
- See ottlfuncs for the existing functions.
- There is a cache field that exists as a place for users to store information between statements. It is currently limited by pdata (see https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/26108).
- Chaining Converters, as in set(body, FunctionC(FunctionB(FunctionA(Parser(body))))), isn't great. The workaround for this today is to use cache to store the information between statements. We're aware of this chaining annoyance, but haven't solved it yet.
- While we use set as the primary solution for updating telemetry, we have other methods, like merge_maps and flatten, that work directly on the target field. Editors end up being a good solution if the transformation in question would result in a bunch of chaining if Converters were used.

Thanks for your thoughts on this @TylerHelmuth. I'm thinking we could mostly rely on Converters here. They would take a target parameter, which would need to be an XML-formatted string. Other parameters would be things like strings containing XPath expressions, names of tags to create, etc. The cache could be useful if someone wants to work on a backup of the original value, but I think they could also just incrementally overwrite the target. It might help to add more detail to the example in the issue description.
Starting from the same XML document (and assuming it is the body):
set(body, DeleteXML(body, "*//@note"))
takes an XPath parameter and deletes any "note" attributes
<Data foo="bar" hello="world">
Some text
<One>With text</One>
<Two>
<Three>3</Three>
<Four>4</Four>
- <Three note="again">3</Three>
+ <Three>3</Three>
</Two>
</Data>
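To make the DeleteXML step concrete, here is a rough Python sketch using the standard library's xml.etree.ElementTree. The helper name delete_xml_attribute is hypothetical, and it simplifies the proposal: ElementTree supports only a limited XPath subset and cannot select attribute nodes such as @note, so the sketch takes an attribute name and strips it from every element instead of evaluating an XPath expression.

```python
import xml.etree.ElementTree as ET


def delete_xml_attribute(xml_str: str, name: str) -> str:
    """Hypothetical stand-in for DeleteXML(body, "*//@<name>").

    Removes the attribute `name` from every element, including the root,
    and returns the re-serialized document.
    """
    root = ET.fromstring(xml_str)
    for el in root.iter():  # iter() visits the root and all descendants
        el.attrib.pop(name, None)  # no-op when the attribute is absent
    return ET.tostring(root, encoding="unicode")


doc = '<Data foo="bar"><Two><Three note="again">3</Three></Two></Data>'
print(delete_xml_attribute(doc, "note"))
# <Data foo="bar"><Two><Three>3</Three></Two></Data>
```

A real Converter would accept an arbitrary XPath, which would need a full XPath implementation rather than ElementTree's subset.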
set(body, ConvertXMLAttributes(body))
converts any remaining attributes into child elements.
- <Data foo="bar" hello="world">
+ <Data>
+ <foo>bar</foo>
+ <hello>world</hello>
Some text
<One>With text</One>
<Two>
<Three>3</Three>
<Four>4</Four>
<Three>3</Three>
</Two>
</Data>
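A similarly hedged Python sketch of the ConvertXMLAttributes step. The helper name is hypothetical; placing the new child elements before the existing children, in attribute order, is an assumption chosen to match the diff above. The one subtle part is that ElementTree stores text that precedes the first child in the parent's .text, so that text has to be moved after the new children to keep "Some text" where the diff shows it.

```python
import xml.etree.ElementTree as ET


def convert_xml_attributes(xml_str: str) -> str:
    """Hypothetical stand-in for ConvertXMLAttributes(body).

    Each attribute becomes a child element (in attribute order) placed before
    the element's existing children; the attributes are then removed.
    """
    root = ET.fromstring(xml_str)
    # Snapshot the elements first, because children are added while looping.
    for el in list(root.iter()):
        if not el.attrib:
            continue
        new_children = []
        for name, value in el.attrib.items():
            child = ET.Element(name)
            child.text = value
            new_children.append(child)
        # el.text is the text *before* the first child. Move it onto the
        # tail of the last new child so it stays after the converted attributes.
        new_children[-1].tail = el.text
        el.text = None
        for i, child in enumerate(new_children):
            el.insert(i, child)
        el.attrib.clear()
    return ET.tostring(root, encoding="unicode")


doc = '<Data foo="bar" hello="world">Some text<One>With text</One></Data>'
print(convert_xml_attributes(doc))
# <Data><foo>bar</foo><hello>world</hello>Some text<One>With text</One></Data>
```

A leaf such as <Three note="again">3</Three> would become <Three><note>again</note>3</Three>, which leaves floating text behind; that is the kind of value the WrapFloatingXMLValues step is meant to handle.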
set(body, WrapFloatingXMLValues(body, "value"))
finds text values that sit at the same level as child elements and wraps each one in a tag with the specified name
<Data>
<foo>bar</foo>
<hello>world</hello>
- Some text
+ <value>Some text</value>
<One>With text</One>
<Two>
<Three>3</Three>
<Four>4</Four>
<Three>3</Three>
</Two>
</Data>
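A sketch of the WrapFloatingXMLValues step, again in Python with a hypothetical helper name. In ElementTree, text that sits beside child elements lives in the parent's .text (before the first child) and in each child's .tail (after that child), so those are the only places this sketch looks. Leaf elements are left alone because their text is an ordinary value. As a simplification, whitespace-only text is dropped, so the output is not pretty-printed.

```python
import xml.etree.ElementTree as ET


def _wrap(text: str, tag: str) -> ET.Element:
    wrapper = ET.Element(tag)
    wrapper.text = text
    return wrapper


def wrap_floating_xml_values(xml_str: str, tag: str = "value") -> str:
    """Hypothetical stand-in for WrapFloatingXMLValues(body, tag)."""
    root = ET.fromstring(xml_str)
    for el in list(root.iter()):  # snapshot: wrappers are added as we go
        children = list(el)
        if not children:
            continue  # leaf text is a normal value, not a floating one
        rebuilt = []
        if el.text and el.text.strip():
            rebuilt.append(_wrap(el.text.strip(), tag))
        for child in children:
            tail = child.tail
            child.tail = None  # whitespace-only tails are dropped too
            rebuilt.append(child)
            if tail and tail.strip():
                rebuilt.append(_wrap(tail.strip(), tag))
        el.text = None
        for child in children:
            el.remove(child)
        el.extend(rebuilt)
    return ET.tostring(root, encoding="unicode")


doc = '<Data><foo>bar</foo>Some text<One>With text</One></Data>'
print(wrap_floating_xml_values(doc))
# <Data><foo>bar</foo><value>Some text</value><One>With text</One></Data>
```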
Then, finally, set(body, ParseSimplifiedXML(body))
converts the simplified (JSON-equivalent) XML string into an attributes map.
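A sketch of what ParseSimplifiedXML could return, with a plain Python dict standing in for the pdata attributes map. The function names are hypothetical, and two behaviors are assumptions not stated in this thread: the sketch rejects input that is still not simplified (attributes, or text beside child elements), and repeated sibling tags, like the two Three elements in this example, are collected into a list.

```python
import xml.etree.ElementTree as ET
from typing import Any


def _to_value(el: ET.Element) -> Any:
    """Convert one element of simplified XML into a string or a dict."""
    if el.attrib:
        raise ValueError(f"<{el.tag}> still has attributes")
    children = list(el)
    if not children:
        return (el.text or "").strip()  # leaf element -> string value
    if (el.text and el.text.strip()) or any(
        c.tail and c.tail.strip() for c in children
    ):
        raise ValueError(f"<{el.tag}> mixes text with child elements")
    result: dict[str, Any] = {}
    for child in children:
        value = _to_value(child)  # never a list, so any list below is ours
        if child.tag not in result:
            result[child.tag] = value
        elif isinstance(result[child.tag], list):
            result[child.tag].append(value)
        else:
            # Assumption: a repeated sibling tag turns into a list of values.
            result[child.tag] = [result[child.tag], value]
    return result


def parse_simplified_xml(xml_str: str) -> dict[str, Any]:
    """Hypothetical stand-in for ParseSimplifiedXML(body)."""
    root = ET.fromstring(xml_str)
    return {root.tag: _to_value(root)}


simplified = """<Data>
<foo>bar</foo>
<hello>world</hello>
<value>Some text</value>
<One>With text</One>
<Two>
<Three>3</Three>
<Four>4</Four>
<Three>3</Three>
</Two>
</Data>"""
print(parse_simplified_xml(simplified))
```

For this document the result is {"Data": {"foo": "bar", "hello": "world", "value": "Some text", "One": "With text", "Two": {"Three": ["3", "3"], "Four": "4"}}}. Whether repeated tags should become a list, or something else, is exactly the kind of decision the issue suggests leaving to the user.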
If I'm not mistaken, the user could also compose these inline, but it's not clear to me that there's much benefit to this. Personally, I would just use separate statements. The inline equivalent would be:
set(body, ParseSimplifiedXML(WrapFloatingXMLValues(ConvertXMLAttributes(DeleteXML(body, "*//@note")), "value")))
Either way, I'm not necessarily proposing the exact Converters in this example, but I think these are pretty close to what we'd need in the short term. Just wanted to articulate better how I imagine the user would incrementally convert their xml into a JSON-equivalent format, and ultimately to a clean attributes map.
Removing needs triage as a code owner has responded approving the idea.
I've opened https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/35301 with the first concrete direct-xml manipulation converter as described above. If this looks good, I'll add a few more in the coming days and start work on the JSON-equivalent XML parser.
Component(s)
pkg/ottl
Is your feature request related to a problem? Please describe.
XML is frequently used in traditional logging frameworks, but within the collector and downstream tools it is often difficult to manipulate.
Before going further, I believe it would be helpful to define a term: "JSON-equivalent". Basically, data is JSON-equivalent if, like a plog.LogRecord's body or attributes, it can be losslessly converted to or from JSON (or YAML, or some other formats).

Notably, XML is not JSON-equivalent, at least not generally. However, it is possible to define a subset of XML which is JSON-equivalent, which we could call "JSON-equivalent XML". (More on this below.)
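One way to make "JSON-equivalent XML" concrete is to check it mechanically. The following is a minimal Python sketch using the standard library's xml.etree.ElementTree; the function name non_json_equivalent_elements is hypothetical. It flags any element that mixes more than one kind of content: attributes, child elements, or non-whitespace text. Attributes plus child elements, and text plus child elements, are the constraints described in the example later in this issue; also treating attributes plus text as non-equivalent is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET


def non_json_equivalent_elements(xml_str: str) -> list[str]:
    """Return the tags of elements that mix more than one kind of content.

    The kinds are: attributes, child elements, and non-whitespace text
    (either before the first child or between/after children).
    """
    root = ET.fromstring(xml_str)
    offenders = []
    for el in root.iter():
        children = list(el)
        has_text = bool(el.text and el.text.strip()) or any(
            c.tail and c.tail.strip() for c in children
        )
        kinds = sum([bool(el.attrib), bool(children), has_text])
        if kinds > 1:
            offenders.append(el.tag)
    return offenders


doc = """<Data foo="bar" hello="world">
Some text
<One>With text</One>
<Two>
<Three>3</Three>
<Four>4</Four>
<Three note="again">3</Three>
</Two>
</Data>"""
print(non_json_equivalent_elements(doc))  # ['Data', 'Three']
```

Data is flagged because it has attributes, child elements, and text all at once; the second Three is flagged because it has both an attribute and a text value.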
We currently have a ParseXML function, but in order to deal with the fact that XML is not generally JSON-equivalent, it produces an encoding of XML. The encoding is necessarily JSON-equivalent, but ultimately it is an overly verbose representation that OTTL is not well suited to manipulate in ways that respect the encoding. That means our current strategy for parsing XML has very limited value, because users find the result difficult to work with in OTTL and in at least some backends.
Describe the solution you'd like
In order to better support XML, I believe we should provide the following:
Example
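Suppose we have the following XML document:
<Data foo="bar" hello="world">
Some text
<One>With text</One>
<Two>
<Three>3</Three>
<Four>4</Four>
<Three note="again">3</Three>
</Two>
</Data>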
In order to make this JSON-equivalent, we can't have both attributes and child elements. We also can't have raw values at the same level as child elements. A JSON-equivalent version might look something like this:
This can then be converted directly into a useful object:
In order to accomplish this migration, we need some functionality:
Notably, there is a reasonable amount of subjectivity here. In the example there are two instances of the Three tag, but they end up in different formats because one of them has an attribute. This may be problematic for the user, and there are likely many similar situations. I believe a general solution will require offering a set of composable functions that let users make their own decisions about how to manipulate the representation into a JSON-equivalent format that meets their needs.
Describe alternatives you've considered
No response
Additional context
No response