lucidsoftware / xtract

A library to make it easy to deserialize XML to user types in scala
Apache License 2.0

How to parse a tree-like XML tags #26

Closed greatbalin closed 3 years ago

greatbalin commented 4 years ago

I need to parse a tree-like hierarchy into a corresponding structure of case classes (boolean expressions, in my case). Here is a small example:

<And>
    <Condition1>A</Condition1>
    <Condition2>B</Condition2>
    <Or>
        <Condition3>C</Condition3>
        <Condition4>D</Condition4>
        <And>
            <Condition5>E</Condition5>
            <Condition6>F</Condition6>
        </And>
    </Or>
    <Or>
        <Condition5>EE</Condition5>
        <Condition6>FF</Condition6>
    </Or>
</And>

I have case classes for And, Or and other ConditionX tags.

So, how do I write an XmlReader for this case?

P.S. I'm using version 2.1.0

tmccombs commented 4 years ago

It's hard to know the best approach without seeing your actual classes.

But assuming you have a sealed trait Condition that all the condition classes extend, you might have something like:

object Condition {
  // Dispatch on the tag label and delegate to the matching reader.
  implicit val xmlReader: XmlReader[Condition] = XmlReader.nodeReader.flatMap { node =>
    node.label match {
      case "And" => And.xmlReader
      case "Or" => Or.xmlReader
      case "Condition1" => Condition1.xmlReader
      // ....
    }
  }
}

object And {
  implicit val xmlReader: XmlReader[And] = (__ \ "And").read(XmlReader.seq[Condition]).map(And(_))
}

I haven't tested this at all, and it could probably be a little more efficient if you pass the node's children to the condition's reader instead of the whole node, but that is the general idea.

greatbalin commented 4 years ago

The problem here is that the And reader doesn't apply the Condition reader to its children, so it ends up with just an empty And.

greatbalin commented 4 years ago

It seems like I managed to make it work.

object TreeCondition extends XmlReader[Condition] {
  private def processNode(node: Node): ParseResult[Condition] = {
    node.label match {
      case "And" =>
        ParseResult.combine(
          node.nonEmptyChildren.map(processNode)
        ).map(And.apply)
      case "Or" =>
        ParseResult.combine(
          node.nonEmptyChildren.map(processNode)
        ).map(Or.apply)
      case "Condition1" =>
        Condition1.reader.read(node)
      case _ =>
        ParseFailure()
    }
  }

  override def read(node: NodeSeq): ParseResult[Condition] = XmlReader.nodeReader.read(node).flatMap(processNode)
}

But I don't like that I have to mention "leaf" tag labels in one place. Such a binding is unhandy.

Another question is how to make the tree-traversing abstract and not dependent on particular ADT.
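
One way to read "abstract and not dependent on a particular ADT" is a traversal that takes the branch constructors and leaf readers as parameters. The following is only a library-independent sketch of that idea, not xtract code: Tag is a hypothetical stand-in for scala.xml.Node, Either stands in for ParseResult, and the names GenericTreeDemo, parse, branches and leaves are made up for illustration. The ADT uses List instead of Seq to keep the sketch short.

```scala
import scala.util.Try

object GenericTreeDemo {
  // Hypothetical stand-in for scala.xml.Node so the sketch needs no XML library.
  final case class Tag(label: String, text: String = "", children: List[Tag] = Nil)

  // Stand-in for ParseResult: Left carries an error message.
  type Result[T] = Either[String, T]

  // Collapse a list of results into one result, reporting the leftmost failure.
  def sequence[T](results: List[Result[T]]): Result[List[T]] =
    results.foldRight[Result[List[T]]](Right(Nil)) { (r, acc) =>
      for { x <- r; xs <- acc } yield x :: xs
    }

  // Generic traversal: it knows nothing about the ADT, only the two label tables.
  def parse[T](
      branches: Map[String, List[T] => T],
      leaves: Map[String, String => Result[T]]
  )(node: Tag): Result[T] =
    branches.get(node.label) match {
      case Some(build) =>
        // Branch tag: parse every child recursively, then build the branch value.
        sequence(node.children.map(child => parse(branches, leaves)(child))).map(build)
      case None =>
        leaves.get(node.label) match {
          case Some(read) => read(node.text)
          case None       => Left(s"unknown tag: ${node.label}")
        }
    }

  // The ADT from the thread, plugged into the generic traversal.
  sealed trait Condition
  final case class And(value: List[Condition]) extends Condition
  final case class Or(value: List[Condition]) extends Condition
  final case class Condition1(value: String) extends Condition
  final case class Condition2(value: Int) extends Condition

  val branches = Map[String, List[Condition] => Condition](
    "And" -> ((cs: List[Condition]) => And(cs)),
    "Or"  -> ((cs: List[Condition]) => Or(cs))
  )

  val leaves = Map[String, String => Either[String, Condition]](
    "Condition1" -> ((s: String) => Right(Condition1(s))),
    "Condition2" -> ((s: String) =>
      Try(s.trim.toInt).toOption.map(Condition2(_)).toRight(s"not an Int: $s"))
  )
}
```

With this shape, supporting a different ADT means supplying different branches and leaves tables; parse itself stays unchanged. The leaf labels still have to be listed somewhere, so this separates the traversal from the ADT but does not remove the label-to-reader binding.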

tmccombs commented 4 years ago

But I don't like that I have to mention "leaf" tag labels in one place. Such a binding is unhandy.

I suppose another option is you could somehow create a collection of XmlReaders for each condition, and have a reader that tries each of them until it succeeds. You could probably use reflection (possibly compile time in a macro) to generate such a collection. Or if you don't mind having the leaf readers in one place you could do something like:

val reader = And.reader or Or.reader or Condition1.reader or ...

Another question is how to make the tree-traversing abstract and not dependent on particular ADT

I'm not really sure what you mean by this. What would that look like?

greatbalin commented 4 years ago

The "or" is an obvious idea here, but there is a problem: it is not possible to define a path for the leaf readers like (__ \ "Condition").read... They must be defined like __.read...

The reason is that a reader with a path reads the content of the matching child tag, not of the current tag.

<And>
  <Condition>1</Condition>
</And>

So, if I define the reader for a "leaf" using the tag label in the path, it will try to find a "leaf" tag inside the "leaf" tag. In other words, it will expect this:

<And>
  <Condition><Condition>1</Condition></Condition>
</And>

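That is why I have to define the "leaf" readers using the second variant, the __.read... construct. But such a reader does not check the label of the current tag, so chaining them with "or" could lead to a tag being read as the wrong condition.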
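
The misreading concern can be illustrated without xtract. The following is a minimal sketch under stated assumptions: a reader is modelled as a hypothetical function from the current tag's label and text to an Option, and the names FirstSuccessDemo, leaf, readers and anyCondition are made up for illustration. Each leaf reader checks the label of the current tag before converting its text, so trying the readers one after another cannot hand a Condition2 tag to the Condition1 reader.

```scala
import scala.util.Try

object FirstSuccessDemo {
  sealed trait Condition
  final case class Condition1(value: String) extends Condition
  final case class Condition2(value: Int) extends Condition

  // Hypothetical reader shape: given the current tag's label and text, succeed or not.
  type Reader = (String, String) => Option[Condition]

  // A leaf reader that only accepts its own label. It inspects the current tag
  // instead of searching for a child with that name.
  def leaf(name: String)(convert: String => Option[Condition]): Reader =
    (label, text) => if (label == name) convert(text) else None

  val readers: List[Reader] = List(
    leaf("Condition1")(s => Some(Condition1(s))),
    leaf("Condition2")(s => Try(s.trim.toInt).toOption.map(Condition2(_)))
  )

  // Try each reader in order and keep the first success.
  val anyCondition: Reader =
    (label, text) =>
      readers.iterator.map(r => r(label, text)).collectFirst { case Some(c) => c }
}
```

Without the label check in leaf, the String-based reader would accept the text of any tag, which is the misreading described above.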

greatbalin commented 4 years ago

This is my whole test example. It works fine, but I would appreciate it if you could suggest a more concise syntax for it. In real life I have more than 50 different "Conditions". Thanks in advance.

import com.lucidchart.open.xtract._

import scala.xml.{Node, NodeSeq, XML}

sealed trait Condition
object Condition {
  implicit val reader: XmlReader[Condition] = TreeCondition
}
case class And(value: Seq[Condition]) extends Condition
case class Or(value: Seq[Condition]) extends Condition
case class Condition1(value: String) extends Condition
object Condition1 {
  implicit val reader: XmlReader[Condition1] = __.read[String].map(Condition1(_))
}
case class Condition2(value: Int) extends Condition
object Condition2 {
  implicit val reader: XmlReader[Condition2] = __.read[Int].map(Condition2(_))
}
case class Condition3(value: Long) extends Condition
object Condition3 {
  implicit val reader: XmlReader[Condition3] = __.read[Long].map(Condition3(_))
}

object TreeCondition extends XmlReader[Condition] {
  private def processNode(node: Node): ParseResult[Condition] = {
    node.label match {
      case "And" => ParseResult.combine(node.nonEmptyChildren.map(processNode)).map(And(_))
      case "Or" => ParseResult.combine(node.nonEmptyChildren.map(processNode)).map(Or(_))
      case "Condition1" => Condition1.reader.read(node)
      case "Condition2" => Condition2.reader.read(node)
      case "Condition3" => Condition3.reader.read(node)
      case _ =>
        ParseFailure()
    }
  }

  override def read(node: NodeSeq): ParseResult[Condition] = XmlReader.nodeReader.read(node).flatMap(processNode)
}

object Main extends App {
  val str =
    """
      |<And>
      |    <Condition1>A</Condition1>
      |    <Condition2>1</Condition2>
      |    <Or>
      |        <Condition1>C</Condition1>
      |        <Condition2>10</Condition2>
      |        <And>
      |            <Condition1>E</Condition1>
      |            <Condition3>500</Condition3>
      |        </And>
      |    </Or>
      |    <Or>
      |        <Condition1>G</Condition1>
      |        <Condition3>255</Condition3>
      |    </Or>
      |    <And>
      |        <Condition2>15</Condition2>
      |        <Condition3>354</Condition3>
      |    </And>
      |</And>
      |""".stripMargin

  val xml = XML.loadString(str)

  val parsed = XmlReader.of[Condition].read(xml)

  val expected =
    And(Seq(
      Condition1("A"),
      Condition2(1),
      Or(Seq(
        Condition1("C"),
        Condition2(10),
        And(Seq(
          Condition1("E"),
          Condition3(500)
        ))
      )),
      Or(Seq(
        Condition1("G"),
        Condition3(255)
      )),
      And(Seq(
        Condition2(15),
        Condition3(354)
      ))
    ))

  println(parsed)
  println(s"Parsed equal to expected ${parsed.getOrElse(null) == expected}")
}

tmccombs commented 4 years ago

I was able to simplify that a little bit, down to:

import com.lucidchart.open.xtract._

import scala.xml.{Node, NodeSeq, XML}
import scala.reflect._

sealed trait Condition
object Condition {
  // it's possible to generate this with a macro, if so desired
  implicit val reader: XmlReader[Condition] = And.reader | Or.reader | Condition1.reader | Condition2.reader | Condition3.reader
}

abstract class ConditionParser[T <: Condition : ClassTag, U: XmlReader] {
  protected val name: String = implicitly[ClassTag[T]].runtimeClass.getSimpleName

  def apply(v: U): T
  implicit val reader: XmlReader[T] = XmlReader.label[U](name).map(apply _)
}

abstract class CompoundConditionParser[T <: Condition : ClassTag] extends ConditionParser[T, Seq[Condition]]()(implicitly, __.children.lazyRead(XmlReader.strictReadSeq(Condition.reader)))

case class And(value: Seq[Condition]) extends Condition
object And extends CompoundConditionParser[And]
case class Or(value: Seq[Condition]) extends Condition
object Or extends CompoundConditionParser[Or]
case class Condition1(value: String) extends Condition
object Condition1 extends ConditionParser[Condition1, String]
case class Condition2(value: Int) extends Condition
object Condition2 extends ConditionParser[Condition2, Int]
case class Condition3(value: Long) extends Condition
object Condition3 extends ConditionParser[Condition3, Long]

It still requires listing all of the conditions, but removes a lot of the boilerplate. It is probably possible to use a macro to generate the list of possible conditions, but unless you have a ton of conditions and they change frequently, it probably isn't worth it.

I suppose another option would be to look at the label, use reflection to try to find a class with that name, and, if successful, look at the parameter type of the constructor to work out how to extract the contents of the tag. That would remove the need for the parser to know all the conditions at compile time, but it adds complexity and probably doesn't perform very well.

tmccombs commented 3 years ago

I'm going to close this since it hasn't been active for a while. If there are remaining questions, I can reopen it.