scinfu / SwiftSoup

SwiftSoup: Pure Swift HTML Parser, with best of DOM, CSS, and jquery (Supports Linux, iOS, Mac, tvOS, watchOS)
https://scinfu.github.io/SwiftSoup/
MIT License

Nested 'A' tags move the closing 'A' tag. #121

Closed: danramteke closed this issue 4 years ago

danramteke commented 5 years ago

Hello. I sometimes receive problematic user input. I've isolated the problem to nested link tags. Without context, it looks like this:

  let html = """
  <a href="/outer.html">Outer link <a href="/inner.html">innerlink</a> outer link</a>
  """

When I parse it and print the document (let document: Document = try SwiftSoup.parse(html); print(document)), I get this:

<html>
 <head></head>
 <body>
  <a href="/outer.html">Outer link </a>
  <a href="/inner.html">innerlink</a> outer link
 </body>
</html>

I would like to be able to get all of the text inside the outer link. However, the </a> tag gets moved forward. How do I get an output that is more like this?

Outer link <a href="/inner.html">innerlink</a> outer link

or, perhaps, like this:

<a href="/outer.html">Outer link innerlink outer link</a>

Thanks so much for this library!

scinfu commented 4 years ago

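This follows the HTML spec's content model for the a element: "Transparent, but there must be no interactive content descendant." A nested a element is interactive content, so the input is invalid HTML, and the parser closes the outer link when it reaches the inner one. https://html.spec.whatwg.org/multipage/text-level-semantics.html#the-a-element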
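
For anyone who still needs the text of such input, here is a minimal sketch of two possible workarounds. It is not an official recommendation from the library. It assumes the input is the snippet from the question, that reading the text of the whole body is acceptable for the flattened case, and that the XML parser (which does not apply the HTML tree-building rules) is tolerable for the nested case. The variable names are illustrative only.

```swift
import SwiftSoup

// The problematic input from the question: an <a> nested inside another <a>.
let html = """
<a href="/outer.html">Outer link <a href="/inner.html">innerlink</a> outer link</a>
"""

do {
    // Option 1 (assumption: you only need the combined text).
    // Keep the default HTML parser and read the text of the whole body,
    // which spans both links and the trailing text node.
    let htmlDocument = try SwiftSoup.parse(html)
    if let body = htmlDocument.body() {
        print(try body.text())
    }

    // Option 2 (assumption: you want the original nesting preserved).
    // The XML parser does not enforce HTML content-model rules,
    // so the inner <a> should stay inside the outer one.
    let xmlDocument = try SwiftSoup.parse(html, "", Parser.xmlParser())
    if let outerLink = try xmlDocument.select("a").first() {
        print(try outerLink.outerHtml()) // markup of the outer link as parsed
        print(try outerLink.text())      // text of the outer link and its descendants
    }
} catch {
    print("Parsing failed: \(error)")
}
```

Note that the XML parser is stricter in other ways (for example, it does not build the html/head/body scaffold or apply HTML-specific fixes), so other malformed user input may parse differently than it does with the default parser.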