note / xml-lens

XML Optics library for Scala
https://note.github.io/xml-lens/
MIT License

Normalization of XML #5

Open note opened 7 years ago

note commented 7 years ago

At some point we will want to produce reasonable output. Beyond pure formatting, it would be nice to, e.g., avoid multiple namespace declarations for the same namespace. Probably all namespace declarations should be moved to the root element.

Such operations should be optional: there may be cases where the user wants to avoid unnecessary transformations and keep the output as similar to the input as possible.

There's an example of such behavior (namely, many namespace declarations for one namespace) in the test replaceOrAddAttr for ResolvedNameMatcher in OpticsBuilderSpec.
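
To make the idea concrete, here is a minimal sketch of hoisting declarations to the root. It uses a hypothetical toy AST (NsDecl, Elem), not the xml-lens one, and it assumes that a prefix bound to two different URIs should make the operation bail out rather than rename anything.

```scala
object HoistSketch {
  // Hypothetical toy AST for illustration only; not the xml-lens AST.
  final case class NsDecl(prefix: String, uri: String)
  final case class Elem(label: String, decls: List[NsDecl], children: List[Elem])

  // Collect every declaration in document order.
  def allDecls(e: Elem): List[NsDecl] =
    e.decls ++ e.children.flatMap(allDecls)

  // Remove all declarations from an element and its descendants.
  def strip(e: Elem): Elem =
    e.copy(decls = Nil, children = e.children.map(strip))

  // Move the distinct declarations to the root.
  // Returns None when one prefix is bound to two different URIs,
  // because hoisting would then change the meaning of the document.
  def hoist(root: Elem): Option[Elem] = {
    val decls = allDecls(root).distinct
    val conflicting = decls.groupBy(_.prefix).exists(_._2.size > 1)
    if (conflicting) None else Some(strip(root).copy(decls = decls))
  }
}
```

Since hoist is a separate function over the tree, leaving the document untouched is simply a matter of not calling it.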

note commented 6 years ago

After some thought, I think normalization will actually be more useful for making sure that declarations of the used namespaces in fact exist in the XML document. Also, since normalization may be a costly operation, it may be a good idea to do it when parsing.

Idea sketch:

Add def parseNormalized: Either[FailType, (XmlDocument, Set[Namespace])]. Besides returning the set of defined namespaces (which is reflected in the return type), it would also move all namespace declarations to the root element. Then, if the user is not interested in adding new namespaces and only works with the already defined ones, they can use the Set[Namespace] returned by that method, e.g. for creating new elements.

We can have a symmetrical print method, e.g. def printNormalized(doc: XmlDocument, namespaces: Set[Namespace]), which would print all namespaces in the root element.
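
A rough sketch of how these two methods could fit together, under several assumptions: the types below are toy stand-ins that only reuse the names from the proposal, FailType is aliased to String, and the actual parser and printer are passed in as functions (the extra parse and print parameters are hypothetical and not part of the proposed signatures). Prefix conflicts are ignored here.

```scala
object NormalizedApiSketch {
  // Toy stand-ins reusing the names from the proposal; not the xml-lens types.
  final case class Namespace(prefix: String, uri: String)
  final case class Element(label: String, decls: List[Namespace], children: List[Element])
  final case class XmlDocument(root: Element)
  type FailType = String

  private def collect(e: Element): Set[Namespace] =
    e.decls.toSet ++ e.children.flatMap(collect)

  private def strip(e: Element): Element =
    e.copy(decls = Nil, children = e.children.map(strip))

  // Put the given namespaces on the root only, in a stable order.
  private def declareAtRoot(doc: XmlDocument, namespaces: Set[Namespace]): XmlDocument = {
    val sorted = namespaces.toList.sortBy(ns => (ns.prefix, ns.uri))
    XmlDocument(strip(doc.root).copy(decls = sorted))
  }

  // Parse, then return the normalized document together with its declared namespaces.
  def parseNormalized(
      input: String,
      parse: String => Either[FailType, XmlDocument]
  ): Either[FailType, (XmlDocument, Set[Namespace])] =
    parse(input).map { doc =>
      val namespaces = collect(doc.root)
      (declareAtRoot(doc, namespaces), namespaces)
    }

  // Symmetrical printing: declare all given namespaces on the root, then print.
  def printNormalized(
      doc: XmlDocument,
      namespaces: Set[Namespace],
      print: XmlDocument => String
  ): String =
    print(declareAtRoot(doc, namespaces))
}
```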

The problem with that idea is that it still relies on the assumption that the user uses only the namespaces returned by parseNormalized. The API itself would not prevent them from, e.g., adding an element with a completely different (and potentially undeclared) namespace.
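
One way to at least detect this after the fact is a runtime check that compares the namespaces used in the tree with the declared set. The sketch below again uses hypothetical toy types and only looks at element namespaces, not attributes.

```scala
object UndeclaredSketch {
  // Toy types for illustration; not the xml-lens AST.
  final case class Namespace(uri: String)
  final case class Element(label: String, namespace: Option[Namespace], children: List[Element])

  // Every namespace referenced by an element anywhere in the tree.
  def usedNamespaces(e: Element): Set[Namespace] =
    e.namespace.toSet ++ e.children.flatMap(usedNamespaces)

  // Namespaces that are used in the tree but missing from the declared set.
  def undeclared(root: Element, declared: Set[Namespace]): Set[Namespace] =
    usedNamespaces(root) -- declared
}
```

A printNormalized-style method could run such a check before printing and fail if the result is non-empty, but that only reports the problem; it does not stop the user from creating it.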

Another idea would be to use path-dependent types to restrict the user to declared namespaces only. The disadvantage may be that the AST itself would probably need to carry that info at the type level. A reasonable solution would be to allow arbitrary namespace usage at the AST and optics level and do the whole normalization thing (more precisely, restricting namespace usage to declared ones) at the DSL level.
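
A minimal sketch of what the path-dependent variant could look like at the DSL level. All names here (Declared, namespace, element) are hypothetical, and the element builder returns a plain string instead of an AST node to keep the example small.

```scala
object PathDependentSketch {
  // One Declared instance per parsed document, holding its declared namespace URIs.
  final class Declared(uris: Set[String]) {
    // Namespace is an inner class, so its type is tied to a specific Declared
    // instance: docA.Namespace and docB.Namespace are different types.
    final class Namespace private[Declared] (val uri: String)

    // The only way to obtain a Namespace is to ask for one that was declared.
    def namespace(uri: String): Option[Namespace] =
      if (uris(uri)) Some(new Namespace(uri)) else None

    // Accepts only namespaces that belong to this Declared instance.
    def element(label: String, ns: Namespace): String =
      s"{${ns.uri}}$label"
  }
}
```

With val docA = new PathDependentSketch.Declared(...) and a second instance docB, passing a docB.Namespace to docA.element is rejected by the compiler, which is the restriction described above. The underlying AST stays untyped with respect to namespaces; only this builder layer enforces the rule.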