Text in certain elements isn't replaced

vessillo / foxreplace

Automatically exported from code.google.com/p/foxreplace


Text in certain elements isn't replaced #62

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago

FoxReplace assumes that webpage authors adhere more strictly to the HTML 
specification than is in fact the case. In particular, there are some HTML 
elements which are not technically supposed to contain text, but which many 
people do use in that way. For instance, ul and ol are only supposed to have 
text in their children, but many people put headings within the list before any 
list items. FoxReplace will not handle this correctly. FoxReplace also doesn't 
support text replacement within non-standard tags such as blink.

To reproduce:
1. Make sure that FoxReplace is set to only replace text, not HTML. 
2. Go to http://www.jwz.org/doc/java.html (used as an example).
3. Try to replace "blur" with "test".
4. Try to replace "--" with "test".

The page will still contain both "blur" and "--", and will not contain "test".

Original issue reported on code.google.com by Zze...@gmail.com on 17 Apr 2012 at 3:07

GoogleCodeExporter commented 9 years ago

[deleted comment]

GoogleCodeExporter commented 9 years ago

OK, these cases weren't expected. In fact I had never seen <nobr> before. I 
will add support for more tags.

Original comment by marc.r...@gmail.com on 17 Apr 2012 at 5:38

Changed title: Text in certain elements isn't replaced
Changed state: Accepted

GoogleCodeExporter commented 9 years ago

Actually, I was looking at the source just now, and I realized that there's a 
simpler and more robust way to implement this. If the XPath were something 
along the lines of "//body//text()[not(parent::script)]" or 
"//body//text()[not(ancestor::script)]", that would allow you to do it by 
exclusion rather than inclusion; in your comment at this section you mention 
that checking the parent's name causes problems because of case-sensitivity, 
but checking it this way, by looking at the node type rather than its name, 
averts that problem. It would also allow you to get most of the nodes in a 
single operation. Of course, this does have the issue that you'd have to put 
all the exclusion rules into one conditional.
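
To illustrate the exclusion idea outside of XPath, here is a minimal Python sketch. It is not FoxReplace's code: it swaps in the standard library's `html.parser` for the XPath query, and the `EXCLUDED` set, the `TextReplacer` class and the `replace_text` helper are all hypothetical names chosen for this example. Text is replaced in every element (including `ul` and non-standard tags such as `blink`) unless it sits inside an excluded element.

```python
from html.parser import HTMLParser

# Assumed exclusion list; the real extension may exclude other elements.
EXCLUDED = {"script", "style"}


class TextReplacer(HTMLParser):
    """Rebuilds the markup, replacing text everywhere except in EXCLUDED elements."""

    def __init__(self, old, new):
        # Keep character references as written so they can be echoed back.
        super().__init__(convert_charrefs=False)
        self.old, self.new = old, new
        self.out = []
        self.excluded_depth = 0  # > 0 while inside an excluded element

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so <SCRIPT> matches "script".
        if tag in EXCLUDED:
            self.excluded_depth += 1
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in EXCLUDED and self.excluded_depth:
            self.excluded_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Exclusion rule: replace unless an excluded ancestor is open.
        if self.excluded_depth == 0:
            data = data.replace(self.old, self.new)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def replace_text(html, old, new):
    parser = TextReplacer(old, new)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


sample = "<UL>blur<li>a</li><BLINK>blur</BLINK><SCRIPT>var blur = 1;</SCRIPT></UL>"
result = replace_text(sample, "blur", "test")
```

In this sketch the text directly inside `ul` and inside `blink` is replaced, while the script body is left alone. Because the rule is "everything except", no per-tag support has to be added for unusual or non-standard elements.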

Original comment by Zze...@gmail.com on 17 Apr 2012 at 7:02

GoogleCodeExporter commented 9 years ago

Thanks for the idea, I will try it ;)

Original comment by marc.r...@gmail.com on 20 Apr 2012 at 9:34

GoogleCodeExporter commented 9 years ago

As a side note: if you ever do need to do case-insensitive XPath comparisons, 
you can use the translate function. As the spec ( 
http://www.w3.org/TR/xpath/#function-translate ) points out, there are 
languages for which this won't work, but for most European languages it should 
be fine.
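
As a rough illustration of the `translate` approach, the Python sketch below emulates the XPath 1.0 `translate` function and builds one possible case-insensitive variant of the exclusion query. The helper names (`xpath_translate`, `text_nodes_excluding`) and the ASCII-only alphabets are assumptions made for this example, not anything taken from FoxReplace.

```python
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"


def xpath_translate(s, src, dst):
    """Emulate XPath 1.0 translate(): src[i] becomes dst[i]; characters of
    src with no counterpart in dst are removed from the result."""
    table = {}
    for i, ch in enumerate(src):
        if ord(ch) in table:
            continue  # the first occurrence in src decides the mapping
        table[ord(ch)] = dst[i] if i < len(dst) else None
    return s.translate(table)


def text_nodes_excluding(tag):
    """Build an XPath 1.0 expression selecting text nodes under body that
    have no ancestor named `tag`, compared case-insensitively (ASCII only)."""
    return (
        "//body//text()[not(ancestor::*["
        f"translate(name(), '{UPPER}', '{LOWER}') = '{tag.lower()}'])]"
    )


print(xpath_translate("SCRIPT", UPPER, LOWER))  # script
print(xpath_translate("ÉCOLE", UPPER, LOWER))   # École (É is outside the table)
print(text_nodes_excluding("script"))
```

The second call shows the limitation mentioned in the spec: any letter missing from the two alphabets passes through unchanged, so the comparison is only case-insensitive for the characters listed.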

Original comment by Zze...@gmail.com on 20 Apr 2012 at 10:01

GoogleCodeExporter commented 9 years ago

Fixed in r166 using a small variant of your suggestion. Thanks Zzedar!

Original comment by marc.r...@gmail.com on 2 Dec 2012 at 5:08

Changed state: Fixed