iabudiab / HTMLKit

An Objective-C framework for your everyday HTML needs.
MIT License

How to find and highlight desired text using DOMRanges #9

Closed matcodes closed 7 years ago

matcodes commented 7 years ago

Hi, I am implementing a text highlight feature for my UIWebView page. I want to use your library to find all DOM ranges for the desired text and then use the rangy library to highlight them. Could you assist me?

iabudiab commented 7 years ago

@matcodes Hey there, could you please give me more details about what you are trying to achieve? A concrete example maybe?

matcodes commented 7 years ago

@iabudiab Let's say I have an html string <html><body><div><h1>HTMLKit</h1><p id='foo'><h2>Hello</h2>there! Hello world!</p></div></body></html>

The user starts searching for the word hello case-insensitively, and my HTML string needs to be transformed into: <html><body><div><h1>HTMLKit</h1><p id='foo'><h2><span class="highlight">Hello</span></h2>there! <span class="highlight">Hello</span> world!</p></div></body></html>

Since this string is loaded and displayed in a UIWebView, I need to modify the HTML in real time. The rangy library has a method that wraps text in a span if I provide the DOM range as a parameter. So I want to find the ranges with your kit and then evaluate a JavaScript function in the UIWebView. Please let me know if you need more details!

Edit:

In other words, I need to find the start container/offset and end container/offset that contain the target text in order to construct the range. Here is the little code snippet I started with:

    // initialize the array that will hold the ranges that need to be wrapped
    var ranges = [HTMLRange]()

    let document = HTMLDocument(string: "<html><body><div><h1>HTMLKit</h1><p id='foo'><h2>Hello</h2>there! Hello world!</p></div></body></html>")
    for element in (document.body?.nodeIterator())! {
        print((element as? HTMLElement)?.textContent ?? "unknown")
    }

This code snippet prints the following. By the way, I do not know why textContent is nil for some of the nodes:

HTMLKitHellothere! Hello world!
HTMLKitHellothere! Hello world!
HTMLKit
unknown

Hello
unknown
unknown

iabudiab commented 7 years ago

@matcodes Ok, now I understand what you're trying to do. I am not sure if using HTMLKit, or any other library that lives outside the UIWebView for that matter, is the best way to achieve this. More on this later.

Nil Values

First let me explain why you're getting nil-textContent for some nodes.

In your snippet you are using a nodeIterator, which traverses the DOM tree starting at the body element all the way down. The nodeIterator visits all the DOM nodes, including HTMLText and HTMLComment nodes, if not told otherwise.

for element in (document.body?.nodeIterator())! {
     print((element as? HTMLElement)?.textContent ?? "unknown")
}

Thus, when the nodeIterator visits an HTMLText node, as is the case for the text inside the <h1>, the optional cast (element as? HTMLElement) fails and you get a nil value.

You could use the nodeIterator like this to get only HTMLText nodes and their contents:

// showOptions tells the iterator which node types you are interested in
for node in body.nodeIterator(showOptions: .text, filter: nil) {
    guard let textNode = node as? HTMLText else { continue }
    let textContent = textNode.textContent
    print(textContent)
}
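
To see the nil behaviour in isolation, here is a minimal sketch that does not use HTMLKit at all. The classes Node, Text and Element and the function describe are hypothetical stand-ins made up for this illustration; only the cast expression is taken from the question.

```swift
// Hypothetical stand-ins for a DOM hierarchy; these are NOT HTMLKit's classes.
class Node {
    var textContent: String { return "" }
}

final class Text: Node {
    let data: String
    init(_ data: String) { self.data = data }
    override var textContent: String { return data }
}

final class Element: Node {
    let children: [Node]
    init(children: [Node]) { self.children = children }
    // An element's text is the concatenation of its children's text.
    override var textContent: String {
        return children.map { $0.textContent }.joined()
    }
}

// Mirrors the expression from the question: the conditional cast yields nil
// for anything that is not an Element, so the fallback string is used.
func describe(_ node: Node) -> String {
    return (node as? Element)?.textContent ?? "unknown"
}

let heading = Element(children: [Text("HTMLKit")])
print(describe(heading))          // the element's own text
print(describe(Text("HTMLKit")))  // the fallback, because the cast fails
```

The text node does carry text here; it is the cast to the element type that fails, which is why filtering the iterator down to text nodes avoids the fallback.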

Ranges

When you load a web page or an HTML string inside a UIWebView or WKWebView, the web view parses it and builds an internal DOM tree, which is not directly accessible from native code.

The only sane way to interact with it is via JavaScript passed to the web view. This is unfortunately quite limited. You cannot, for example, inject JavaScript that references native objects, say an HTMLRange instance, which is why, even if you construct the ranges with HTMLKit, this wouldn't help you much. I'm not saying that you cannot hack your way through it, but I would not recommend it in any way.

Here is a code snippet that constructs an HTMLRange instance for the first occurrence of a given search string in each text node:

let searchString = "Hello"
var ranges: [HTMLRange] = []
// iterate over all text nodes
for node in body.nodeIterator(showOptions: .text, filter: nil) {
   guard let textNode = node as? HTMLText else { continue }

   // get text content
   let textContent: NSString = textNode.textContent as NSString
   // check if search string is contained in the textContent
   let textRange = textContent.range(of: searchString, options: .caseInsensitive)
   guard textRange.location != NSNotFound else { continue }

   // construct a range instance for the occurrence of the searchString
   let range = HTMLRange(
      document: document,
      startContainer: textNode,
      startOffset: UInt(textRange.location),
      endContainer: textNode,
      endOffset: UInt(textRange.location + textRange.length)
   )
   // append it to the list
   ranges.append(range)
}
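
Note that the loop above picks up only the first match per text node, so the second "Hello" in "there! Hello world!" is found only because the first one lives in a different node. A rough sketch of collecting every match inside one string could look like the following. The helper allRanges(of:in:) is a hypothetical name and not part of HTMLKit; it uses only Foundation and returns UTF-16 based offsets, the same kind of offsets the snippet above passes as startOffset and endOffset.

```swift
import Foundation

// Hypothetical helper (not part of HTMLKit): returns every case-insensitive
// occurrence of `needle` in `text` as UTF-16 based NSRange values.
func allRanges(of needle: String, in text: String) -> [NSRange] {
    let haystack = NSString(string: text)
    var found: [NSRange] = []
    guard !needle.isEmpty else { return found }

    var searchRange = NSRange(location: 0, length: haystack.length)
    while searchRange.length > 0 {
        let match = haystack.range(of: needle, options: .caseInsensitive, range: searchRange)
        guard match.location != NSNotFound else { break }
        found.append(match)
        // Continue searching right after the current match.
        let next = match.location + match.length
        searchRange = NSRange(location: next, length: haystack.length - next)
    }
    return found
}

let matches = allRanges(of: "hello", in: "Hello there! Hello world!")
for match in matches {
    print(match.location, match.length)
}
```

Each returned NSRange could then be turned into its own HTMLRange in the same way as the single match above. How several ranges in one text node behave once the node gets split by wrapping is something I have not verified, so treat that part as untested.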

An alternative to using rangy would be to use HTMLKit to manipulate the HTML and reload it. Now that you have a list of HTMLRange instances you can wrap all ranges' contents with a <span> like this:

// iterate
ranges.forEach { range in
    // create new span
    let span = HTMLElement(tagName: "span", attributes: ["class": "highlight"])
    // surround the range contents with the new element
    range.surroundContents(span)
}

// reload the page in the web view
webView.loadHTMLString(document.innerHTML, baseURL: nil)

The biggest disadvantage of this approach is the reload, which is exactly what you want to avoid.

For a better alternative check out this tutorial. It has a pure JavaScript solution to your problem.

Hope this helps. Do not hesitate to ask any further questions if you have any. Will be glad to help 😉

iabudiab commented 7 years ago

@matcodes I'll close this now. Feel free to reopen if you have any further questions.

matcodes commented 7 years ago

@iabudiab Thanks for your great answer. I already have a working pure JS solution, but the problem is that with a huge HTML document it freezes the UI for a while. Now I am doing some optimization.

I came up with a solution that gives every element in the HTML a unique id. Once I have assigned the ids in JavaScript and loaded that string into the UIWebView, I initialize your HTMLDocument on a background thread and start searching for the keyword with the snippet you provided.

Once I have found the ranges, I want to wrap each range with tags as you did. As the final step, I want to run JavaScript in the UIWebView to modify the innerHTML of the element that contains the keyword range I found.

Can you advise me how to extract the "id" attribute for an HTMLText node?

      for node in body.nodeIterator(showOptions: .text, filter: nil) {
        guard let textNode = node as? HTMLText else { continue }

        // get text content
        let textContent: NSString = textNode.textContent as NSString
        // check if search string is contained in the textContent
        let textRange = textContent.range(of: searchString, options: .caseInsensitive)
        guard textRange.location != NSNotFound else { continue }

        // construct a range instance for the occurrence of the searchString
        let range = HTMLRange(
          document: (self.htmlParser?.document)!,
          startContainer: textNode,
          startOffset: UInt(textRange.location),
          endContainer: textNode,
          endOffset: UInt(textRange.location + textRange.length)
        )
        // append it to the list
        ranges.append(range)
      }

      ranges.forEach { range in
        // HOW TO GET THE ID from html text???
        let id = range.startContainer
        // create new span
        let span = HTMLElement(tagName: "span", attributes: ["class": "highlight"])
        // surround the range contents with the new element
        range.surroundContents(span)

        let innerHtml = range.startContainer.innerHTML

        DispatchQueue.main.sync {
          self.webView?.js("document.getElementById(\(???id???)).innerHTML = \(innerHtml)")
        }
      }

iabudiab commented 7 years ago

@matcodes Hey there, not all HTML DOM nodes have ids, in fact only HTMLElements can have ids, since this is the only node type that possesses attributes. All other nodes have no attributes and thus cannot be assigned an id.

However, you can change the JS to operate on the direct parent of the HTMLText. I have modified your snippet accordingly. I haven't tested it, but I think it should put you on the right path.

for node in body.nodeIterator(showOptions: .text, filter: nil) {
  guard let textNode = node as? HTMLText else { continue }

  // get text content
  let textContent: NSString = textNode.textContent as NSString
  // check if search string is contained in the textContent
  let textRange = textContent.range(of: searchString, options: .caseInsensitive)
  guard textRange.location != NSNotFound else { continue }

  // construct a range instance for the occurrence of the searchString
  let range = HTMLRange(
    document: (self.htmlParser?.document)!,
    startContainer: textNode,
    startOffset: UInt(textRange.location),
    endContainer: textNode,
    endOffset: UInt(textRange.location + textRange.length)
  )
  // append it to the list
  ranges.append(range)
}

ranges.forEach { range in
  // create new span
  let span = HTMLElement(tagName: "span", attributes: ["class": "highlight"])
  // surround the range contents with the new element
  range.surroundContents(span)

  // get the id of the direct parent of the start container
  guard let parentElement = range.startContainer.parent as? HTMLElement else { return }
  let id = parentElement.attributes["id"]

  // get the innerHTML of the direct parent of the start container
  let innerHtml = parentElement.innerHTML

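  // note: both interpolated values still need to be quoted and escaped
  // as JavaScript string literals before this statement is valid JS
  DispatchQueue.main.sync {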
    self.webView?.js("document.getElementById(\(id)).innerHTML = \(innerHtml)")
  }
}
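
One thing to watch in the last step: the id and the innerHTML are interpolated into the JavaScript unquoted, so the resulting statement needs both values turned into proper string literals. A possible sketch, using only Foundation, is to JSON-encode each value. The helpers jsStringLiteral and replaceInnerHTMLScript are hypothetical names made up for this example, not part of HTMLKit or UIKit, and I am assuming the web view's JavaScript engine accepts JSON string escapes.

```swift
import Foundation

// Hypothetical helper: encodes a Swift string as a JavaScript string literal
// by JSON-encoding a one-element array and stripping the surrounding brackets.
func jsStringLiteral(_ value: String) -> String? {
    guard let data = try? JSONSerialization.data(withJSONObject: [value], options: []),
          let json = String(data: data, encoding: .utf8) else { return nil }
    return String(json.dropFirst().dropLast())
}

// Hypothetical helper: builds the statement that replaces an element's innerHTML.
func replaceInnerHTMLScript(elementId: String, html: String) -> String? {
    guard let idLiteral = jsStringLiteral(elementId),
          let htmlLiteral = jsStringLiteral(html) else { return nil }
    return "document.getElementById(\(idLiteral)).innerHTML = \(htmlLiteral);"
}

let script = replaceInnerHTMLScript(
    elementId: "foo",
    html: "there! <span class=\"highlight\">Hello</span> world!"
)
print(script ?? "encoding failed")
```

The resulting string would then be passed to whatever evaluates JavaScript in your web view, for example the js helper used in the snippet above.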

gneil90 commented 7 years ago

Exactly what I needed. Just fixing two typos: the guard needs an else, and the property is parent rather than parentNode:

    ranges.forEach { range in
      guard let parentElement = range.startContainer.parent as? HTMLElement else {
        return;
      }
    }

PS: Add some donation info to your readme, in case somebody wants to buy you a bottle of beer!

iabudiab commented 7 years ago

@gneil90 Happy to hear that 👍 Cheers 🍻

PS: Typo fixed in snippet. Thanks.