jbr / jQuery.highlightRegex

jQuery Plugin that lets you highlight document text with a regular expression.
http://jbr.github.io/jQuery.highlightRegex

Does not ignore HTML tags #11

Closed m-spyratos closed 7 years ago

m-spyratos commented 11 years ago

First of all, thank you for this plugin... I will demonstrate my problem with an example:

<p>This is a text that contains a <span>span</span> tag </p>

If I search for 'a span tag', it won't highlight the corresponding string.

jbr commented 11 years ago

Thanks for filing an issue! I'll take a look at this today.

lincolnthree commented 10 years ago

Hey! Any luck on this one? I'm struggling with this problem too :/ Not really sure where to start in your code.

jbr commented 10 years ago

Hi @lincolnthree and @m-spyratos: I'm currently refreshing myself on this code. What is the desired behavior in the described situation? Would you expect

<p>This is a text that contains
<span class="highlight">a</span>
<span><span class="highlight">span</span></span>
<span class="highlight">tag</span></p>

or would you expect

<p>This is a text that contains
<span class="highlight">a<span>span</span>tag</span>
</p>

lincolnthree commented 10 years ago

Hey! To be honest, I don't think the first one is really that bad, but I think the second one would be cleaner because one highlight span would encapsulate the text. They both have their trade-offs I suppose.

Thanks so much for taking a look at this. I tried to figure out how it worked, but got a bit lost when debugging the super-recursive calls to normalize in the browser. I'm not really a front-end guy :)

lincolnthree commented 10 years ago

I got a bit lost on how the DOM nodes were maintained in order to decide where and when to add the highlight span.

Gustavo commented 10 years ago

@jbr, I sent you an email about this issue (or feature) and now I see this open issue which addresses the same question, so please ignore my email and let's discuss here. My problem is the same. There are texts that I allow users to highlight with the mouse, and sometimes the text excerpt contains a tag. Example problem:

Text excerpt selected by user:

... of the <span class="keyword">GNU</span> General Public ...

After selection it should become:

... <span class="highlight">of the <span class="keyword">GNU</span> General Public</span> ...

claudiomedina commented 9 years ago

Hello, did anyone find a solution for this?

julkue commented 8 years ago

@claudiomedina @Gustavo @lincolnthree @m-spyratos @jbr This is possible with mark.js. It has an option separateWordSearch which allows you to highlight all words inside the search term even if they are split across different tags.

jazanne commented 8 years ago

Would love to see some resolution for this issue if anyone has made any progress!

julkue commented 8 years ago

@jazanne This might not be a solution for this plugin, but as mentioned above, you could give the option separateWordSearch of mark.js a try. If this also doesn't solve your issue, please let us know your exact use case. I'm sure we can find a solution.
I'm looking forward to your feedback.

jazanne commented 8 years ago

@julmot The issue w/mark.js is that you actually can not use separateWordSearch w/RegExps, which is why I am using this plugin. My issue is the same as the top post of this thread...I need to be able to highlight "how you doing" /\bhow\b(\s[^.!>]*)? \bdoing\b/gi in an html structure that looks like this: <li><span>well</span> how you <span>doing</span></li>.
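To make the failure concrete, here is a minimal sketch that models the text nodes of that `<li>` as plain strings (an assumption for illustration; no DOM or plugin code is involved). The pattern matches the concatenated text but none of the individual text nodes, which is consistent with a per-text-node search finding nothing to highlight.

```javascript
// Text nodes of <li><span>well</span> how you <span>doing</span></li>,
// modelled as plain strings (hypothetical stand-in for real DOM text nodes).
const liTextNodes = ["well", " how you ", "doing"];

// Same pattern as above, but without the g flag so that test() keeps no state.
const phrasePattern = /\bhow\b(\s[^.!>]*)? \bdoing\b/i;

// Searching each text node on its own finds nothing.
const perNodeHits = liTextNodes.filter((text) => phrasePattern.test(text));

// Searching the concatenated text finds the phrase.
const joinedText = liTextNodes.join("");
const phraseMatch = phrasePattern.exec(joinedText);

console.log(perNodeHits.length);
console.log(phraseMatch && phraseMatch[0]);
```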

julkue commented 8 years ago

@jazanne A few questions:

  1. Why exactly do you really need a custom regular expression?
  2. Wouldn't it be possible in your case to simply search for each word, like in this example?
  3. If you really need a custom RegExp and can't search for each word separately, which of the two suggestions above do you expect?

jazanne commented 8 years ago

@julmot 1&2. I am not looking for these words individually, I am looking for them in sequence, so I can't use an or. Also, I need to be able to find any variation of words between how and doing, for example: how you doing, how are you doing, how is she doing, etc.

  3. Either of the two options above works for me, as long as all the words end up being highlighted. The first one may be better to avoid potentially overlapping elements like this one here

julkue commented 8 years ago

@jazanne Thank you for the input. I will think about this and let you know the result soon. @jbr Did you have a result from your evaluation?

julkue commented 8 years ago

After thinking about this I came to the conclusion that a good solution for this might be impossible or at least very complex to implement.

Options of above named suggestions

Take this HTML DOM as an example:

<div>
    How
    <span>
        are you
    </span>
    doing?
</div>

If you are searching for How are you doing, option 2 of the suggestions above may be possible, but when searching for e.g. How you doing? (with a RegExp), it would not be possible, as the highlight wrapper would also include "are". In this example:

<div>
    <span class="highlight">
        How
        <span>
            are you
        </span>
        doing?
    </span>
</div>

So, let's focus on option 1.

Methods

First I'd like to say that any solution that manipulates a specified RegExp may end up in unexpected behavior, so I will not consider one (which is why the option separateWordSearch of mark.js isn't available for the method markRegExp()). What remains, in my view, is to search inside the whole text of the provided context (this also includes the text of nested elements; in jQuery e.g. $(".context").text()) and then map matches in this text to actual DOM elements. We can't just read the innerHTML value and manipulate it, since writing the changes back would lose all event handlers on the contained elements. It would also put a heavy load on the browser, as it would need to redraw the DOM instead of just inserting new highlight elements.

Unfortunately I don't see a way to realize this. Any thoughts?

lincolnthree commented 8 years ago

I think you might have better success by traversing the dom, extracting all text nodes, then applying the regex to that text - but you'd need some way to map textual matches back to the node that they belong to. Just a thought. It would take some implementation, but it could be very powerful.

julkue commented 8 years ago

@lincolnthree Thanks, this is what I meant with

search inside the whole text of the provided context

Do you have any idea how to get started with the mapping point?

lincolnthree commented 8 years ago

And I just read the last paragraph of your note. Yes. Exactly. You could do this with a secondary data-structure that maps coordinates in the text string to each node. A simple sorted data structure would do this: (tweaks might be required, but just a conceptual example.)

Sorted node end positions mapped to nodes:

{ 5 -> node1, 42 -> node2, 50 -> node3 }

Then you could apply the regex, get the matching groups, iterate over the matching groups and that would let you query the map for your nodes.
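A rough sketch of that lookup, assuming the end positions are exclusive offsets into the assembled string and kept in document order. The function names and the placeholder node values are made up for illustration.

```javascript
// Hypothetical mapper: exclusive end offset of each text node in the
// assembled string, in document order. The node names are placeholders.
const endMapper = [
  { end: 5, node: "node1" },
  { end: 42, node: "node2" },
  { end: 50, node: "node3" },
];

// Binary search for the first entry whose end offset is greater than
// `offset`, i.e. the text node that contains the character at `offset`.
function nodeIndexAt(mapper, offset) {
  let lo = 0;
  let hi = mapper.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (mapper[mid].end > offset) hi = mid;
    else lo = mid + 1;
  }
  return lo < mapper.length ? lo : -1;
}

// All nodes touched by a match covering [start, end) in the assembled string.
function nodesForRange(mapper, start, end) {
  const first = nodeIndexAt(mapper, start);
  const last = nodeIndexAt(mapper, end - 1);
  if (first === -1 || last === -1) return [];
  return mapper.slice(first, last + 1).map((entry) => entry.node);
}

console.log(nodesForRange(endMapper, 3, 45));
```

Because the end positions are sorted, each lookup is logarithmic in the number of text nodes, which may matter on large documents.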

julkue commented 8 years ago

@lincolnthree Good catch. I've got the idea.

  1. Iterate over all DOM elements
  2. Store the text node value in an assembled string from all text node values
  3. Store the end position of the text node value in a mapper object
  4. Search for the RegExp inside the assembled string
  5. Map the matches with the DOM elements with the mapper object
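The five steps above could be sketched roughly as follows. This is not code from either library: the tree of plain objects is a hypothetical stand-in for the DOM (a node is either `{ text }` or `{ tag, children }`), and `buildMapper` and `findSlices` are made-up names.

```javascript
// Hypothetical stand-in for a DOM tree.
const sampleTree = {
  tag: "li",
  children: [
    { tag: "span", children: [{ text: "well" }] },
    { text: " how you " },
    { tag: "span", children: [{ text: "doing" }] },
  ],
};

// Steps 1-3: walk the tree, append each text node's value to one string and
// record where that value starts and ends in it.
function buildMapper(root) {
  let assembled = "";
  const entries = [];
  (function walk(node) {
    if (typeof node.text === "string") {
      const start = assembled.length;
      assembled += node.text;
      entries.push({ node, start, end: assembled.length });
    } else {
      (node.children || []).forEach(walk);
    }
  })(root);
  return { assembled, entries };
}

// Steps 4-5: search the assembled string and translate each match into
// per-node slices (node-local offsets) that a highlighter could wrap.
function findSlices(root, regex) {
  const { assembled, entries } = buildMapper(root);
  const flags = regex.flags.includes("g") ? regex.flags : regex.flags + "g";
  const globalRegex = new RegExp(regex.source, flags);
  const results = [];
  let match;
  while ((match = globalRegex.exec(assembled)) !== null) {
    if (match[0] === "") {
      globalRegex.lastIndex++; // avoid looping forever on empty matches
      continue;
    }
    const from = match.index;
    const to = from + match[0].length;
    const slices = entries
      .filter((e) => e.start < to && e.end > from)
      .map((e) => ({
        node: e.node,
        from: Math.max(from, e.start) - e.start,
        to: Math.min(to, e.end) - e.start,
      }));
    results.push({ text: match[0], slices });
  }
  return results;
}

const found = findSlices(sampleTree, /\bhow\b(\s[^.!>]*)? \bdoing\b/gi);
console.log(found[0].slices.map((s) => s.node.text.slice(s.from, s.to)));
```

Each slice stays inside a single text node, so wrapping the slices individually corresponds to the first of the two markup suggestions (one highlight element per text node) and avoids overlapping elements.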

I see at least one issue with that. When having a DOM structure that looks like this:

<h1>Lorem ipsum</h1>
<p>dolor sit amet</p>

and searching for "Lorem ipsum dolor", then of course all these words would be highlighted. However, currently, they won't. And as they are objectively considered completely different things (headline and content), this may confuse users.

lincolnthree commented 8 years ago

Match groups contain indices: http://stackoverflow.com/questions/2295657/return-positions-of-a-regex-match-in-javascript
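For reference, a small sketch of reading match positions in plain JavaScript: each match object carries the start offset of the whole match in `index`, and the end offset follows from the match length. `matchRanges` is a made-up helper name.

```javascript
// Collect [start, end) offsets for every match of a regex.
// Note: String.prototype.matchAll requires the g flag and throws without it.
function matchRanges(regex, text) {
  const ranges = [];
  for (const match of text.matchAll(regex)) {
    ranges.push({
      start: match.index,
      end: match.index + match[0].length,
      text: match[0],
    });
  }
  return ranges;
}

const loremRanges = matchRanges(/ipsum \w+/g, "Lorem ipsum dolor sit amet");
console.log(loremRanges);
```

Offsets of individual capture groups are not part of this result by default; engines that support the `d` (hasIndices) flag expose them via `match.indices`.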

Also, I think the behavior you mentioned below is fine, but perhaps add a setting to "enable/disable matching over content boundary tags" such as <h1,2,3,x>, etc...

Thoughts?
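One way such a setting might work, as a sketch only: split the text into segments at block boundaries and search each segment on its own. The `acrossBlocks` flag, the `block` ids and the function names are all hypothetical and not an option of either library.

```javascript
// Hypothetical flattened text nodes, each tagged with the block-level element
// it belongs to (the `block` ids are made up for this sketch).
const flatNodes = [
  { text: "Lorem ipsum", block: "h1#0" },
  { text: "dolor ", block: "p#1" },
  { text: "sit amet", block: "p#1" },
];

// Group consecutive text nodes that share a block into one searchable segment.
// With acrossBlocks = true everything ends up in a single segment.
function buildSegments(nodes, acrossBlocks) {
  const segments = [];
  for (const node of nodes) {
    const last = segments[segments.length - 1];
    if (last && (acrossBlocks || last.block === node.block)) {
      last.text += node.text;
    } else {
      segments.push({ block: node.block, text: node.text });
    }
  }
  return segments;
}

// A match can only be found inside one segment, so it never crosses a block
// boundary unless acrossBlocks is set. Pass a regex without the g flag, since
// test() on a global regex keeps state between calls.
function segmentsWithMatch(nodes, regex, acrossBlocks) {
  return buildSegments(nodes, acrossBlocks).filter((segment) =>
    regex.test(segment.text)
  );
}

console.log(segmentsWithMatch(flatNodes, /ipsum\s*dolor/, false).length);
console.log(segmentsWithMatch(flatNodes, /ipsum\s*dolor/, true).length);
```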

On Thu, May 19, 2016 at 11:19 AM, Julian Motz wrote:

At the last point, I am still somewhat uncertain as the matching groups don't contain start and end positions. How would you map the matching groups with the mapper object?

julkue commented 8 years ago

@lincolnthree Are you talking about setting a filter array with exclusion selectors? If so, this is implemented as a "filter" option in mark.js.

However, this would not solve my concern. What the user may expect is that there is still highlighting within headlines, sections etc., but not across these tags. But allowing an option for that would mean calling two completely separate plugins: one that highlights matches directly inside the DOM and one that walks it and creates a mapper object first.

julkue commented 8 years ago

Alright, I've thought about this and I'm willing to implement it as an option in mark.js. It requires a large refactoring, as the internal structure will be very different. However, in this issue four people have asked for that feature in almost three years. Please don't get me wrong, I really love helping people solve their issues. But I think it is fair to set a certain level of demand as a requirement, let's say about ten people. Otherwise there are other features I should implement instead that may help more people. So I'm waiting for someone to open an issue regarding this matter and around ten people commenting below.
By the way; what would be the correct option name in your opinion?

julkue commented 8 years ago

Hey guys,

Good news! :tada:

I've just released v8.0.0 of mark.js including an option acrossElements. This option makes it possible to find matches across multiple HTML elements – even across nested iframes.

This feature caused the biggest internal refactoring in this project (besides the iframe support itself) and took about 6 weeks of work. I'd really appreciate it if you gave this a try.

I finally decided to implement this as more and more people requested it: in this issue, in https://github.com/julmot/mark.js/issues/46 and on StackOverflow.

:octocat:

lincolnthree commented 8 years ago

Dude. This is awesome! I am going to check it out ASAP! Great job! I'm sure this was no easy task!

jbr commented 7 years ago

@jmngpt which patch are you referring to? I mostly see discussion of an unrelated library in this thread

julkue commented 7 years ago

Unrelated?

jbr commented 7 years ago

@julmot Yeah, a separate codebase that serves a similar goal. Please keep discussions of that repo in the appropriate place. I'd prefer that issues on this ancient repo remain focused on the relevant code.

julkue commented 7 years ago

I agree with you, but there's no reason to reinvent the wheel. And from my point of view it is related, since it solves the requested behaviour.

jbr commented 7 years ago

A single comment like "Here's an issue on another repo that addresses this" is the appropriate way to handle this. Further discussion of the details will not take place in this repo. Closing thread.

jbr commented 7 years ago

For future contributors, I would welcome a PR that addresses the issue outlined by OP