vessillo / foxreplace

Automatically exported from code.google.com/p/foxreplace

It doesn't replace words in brackets #149

Closed. GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Go to a Wikipedia article, for example.
2. Write a substitution that removes every "[edit]".
3. Alternatively, write a regular expression that removes any text in brackets.

What is the expected output? What do you see instead?

I expect them to be removed, but they are still there, although other replacements are applied.

What version of the product are you using? On what operating system?

Firefox 33 on Windows 7, with the latest version of the add-on.

Please provide any additional information below.

Original issue reported on code.google.com by pouram...@gmail.com on 22 Oct 2014 at 1:47

GoogleCodeExporter commented 9 years ago
The [edit] links in Wikipedia are a special case. There are actually three
strings: "[", "edit", and "]", because only the word "edit" is a link, not the
brackets. So when you search for the string "[edit]" as a single piece, it
isn't found. The only way to solve this is to enable HTML in input & output
and then write the appropriate substitution.
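A minimal Python sketch of the point above, assuming (as the reply describes) that with HTML disabled each text string between tags is searched on its own. The markup is the simplified "[edit]" HTML quoted later in this thread, with link attributes omitted; `TextNodeCollector` and `text_nodes` are hypothetical helpers, not part of FoxReplace.

```python
from html.parser import HTMLParser

# Simplified markup of a Wikipedia "[edit]" link (link attributes omitted).
EDIT_HTML = (
    '<span class="mw-editsection">'
    '<span class="mw-editsection-bracket">[</span>'
    '<a>edit</a>'
    '<span class="mw-editsection-bracket">]</span>'
    '</span>'
)


class TextNodeCollector(HTMLParser):
    """Collects each run of character data between tags as a separate text node."""

    def __init__(self):
        super().__init__()
        self.nodes = []

    def handle_data(self, data):
        self.nodes.append(data)


def text_nodes(html):
    collector = TextNodeCollector()
    collector.feed(html)
    collector.close()
    return collector.nodes


nodes = text_nodes(EDIT_HTML)
print(nodes)                              # ['[', 'edit', ']']
print(any("[edit]" in n for n in nodes))  # False: no single node holds "[edit]"
print("[edit]" in "".join(nodes))         # True: only the joined text does
```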

So the brackets themselves are not the problem: if you try to replace "[edit]"
on this page, it works.

Original comment by marc.r...@gmail.com on 22 Oct 2014 at 3:55

GoogleCodeExporter commented 9 years ago
Then it seems I need a regular expression to remove those items. In fact, I
need to remove [edit], [hide] and any reference like [1] or [2]..., but
whatever I try with regular expressions, I don't succeed. I used \[\d+\] to
remove references, with no result.

Please guide me.
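A short Python sketch of why `\[\d+\]` finds nothing here. The pattern itself is correct for plain text; the HTML below is an assumed shape for a Wikipedia footnote marker, based on the pattern that eventually worked later in this thread, where each bracket sits in its own `<span>`. Python's `re` stands in for the JavaScript regular expressions the add-on presumably runs; these particular constructs behave the same in both.

```python
import re

REFERENCE = re.compile(r"\[\d+\]")

# Visible text of a footnote marker: the pattern matches.
print(REFERENCE.findall("as reported[12] by"))  # ['[12]']

# Assumed HTML of the same marker, with each bracket in its own <span>:
# the brackets are no longer next to the digits, so nothing matches.
html = "as reported<span>[</span>12<span>]</span> by"
print(REFERENCE.findall(html))                  # []
```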

Original comment by pouram...@gmail.com on 22 Oct 2014 at 9:04

GoogleCodeExporter commented 9 years ago
Moreover, as you noted, such a modification is very hard. I wish the add-on
had a feature that treated the visible HTML as plain text, so the problem you
described wouldn't occur.

Original comment by pouram...@gmail.com on 22 Oct 2014 at 9:13

GoogleCodeExporter commented 9 years ago
It's technically possible to find the text while ignoring the HTML, but the
problem is in the substitution semantics: if you want to replace "[edit]",
where some of the letters are in a link, with another text like "asdf", then
the new text would have to be either entirely outside the link, entirely
inside it, or partly in each, and the right choice varies case by case, so
it's impossible to generalize.
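To make the ambiguity concrete, here is a hypothetical Python sketch: it joins the text nodes, finds "[edit]" in the joined text, and reports which nodes the match touches. The node list (each entry is the text plus whether it is inside a link) and the `nodes_spanned` helper are illustrative assumptions, not FoxReplace internals.

```python
# Text nodes behind a heading followed by "[edit]"; only "edit" is linked.
nodes = [("Section title ", False), ("[", False), ("edit", True), ("]", False)]


def nodes_spanned(nodes, needle):
    """Return the indices of the nodes that the first match of `needle`
    in the joined text overlaps."""
    text = "".join(t for t, _ in nodes)
    start = text.find(needle)
    if start == -1:
        return []
    end = start + len(needle)
    spanned, offset = [], 0
    for i, (t, _) in enumerate(nodes):
        node_start, node_end = offset, offset + len(t)
        if node_start < end and start < node_end:  # ranges overlap
            spanned.append(i)
        offset = node_end
    return spanned


spanned = nodes_spanned(nodes, "[edit]")
print(spanned)                         # [1, 2, 3]
print([nodes[i][1] for i in spanned])  # [False, True, False]
```

The match covers one linked node and two unlinked ones, so a replacement like "asdf" has no single obvious place to go.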

As for your problem, you could try \[.*\] with HTML enabled in both the input
and the output.
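One caveat worth knowing: `*` is greedy, so on a line with several bracketed items `\[.*\]` removes everything from the first "[" to the last "]", including the text in between. A minimal Python sketch on a made-up line of HTML; the lazy form `\[.*?\]` is not suggested in this thread and is shown only for comparison. It stops at the nearest "]", and JavaScript regular expressions support the same lazy quantifier.

```python
import re

# One made-up line of page HTML containing two bracketed items.
line = 'Intro<span>[</span>1<span>]</span> text <span>[</span><a>edit</a><span>]</span> end'

greedy = re.sub(r"\[.*\]", "", line)   # first "[" to last "]"
lazy = re.sub(r"\[.*?\]", "", line)    # each "[" to its nearest "]"

print(greedy)  # Intro<span></span> end
print(lazy)    # Intro<span></span> text <span></span> end
```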

Original comment by marc.r...@gmail.com on 23 Oct 2014 at 8:11

GoogleCodeExporter commented 9 years ago
Thank you, the suggested \[.*\] worked, but it's too general, and whatever I
try for references doesn't work. Could you please give a regular expression to
replace anything like

[number], for example [1] or [11]
and
[*edit*], to cover the edit link

Original comment by pouram...@gmail.com on 23 Oct 2014 at 9:46

GoogleCodeExporter commented 9 years ago
Moreover, it shouldn't be a hard task, regardless of which portion is text or
link. I think innerText gives you the text without its HTML, so you could do
the replacement on that text (treating everything visible as text would be
very meaningful).
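A rough Python sketch of the innerText idea, assuming a simple "keep the character data, drop every tag" flattening (real `innerText` also takes CSS and layout into account, which this ignores). The regex works on the flattened text, but the result is a bare string: the `<a>` around "this section" is lost, which is the write-back problem described in the earlier reply. The HTML and the `TextOnly`/`flatten` helpers are hypothetical.

```python
import re
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Rough stand-in for innerText: keeps character data, drops every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def flatten(html):
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


html = 'See <a href="#x">this section</a><span>[</span>3<span>]</span>.'
flat = flatten(html)
print(flat)                          # See this section[3].
print(re.sub(r"\[\d+\]", "", flat))  # See this section.  (the link markup is gone)
```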

Original comment by pouram...@gmail.com on 23 Oct 2014 at 9:49

GoogleCodeExporter commented 9 years ago
If you need a more specific RegExp, you can include more text in it. For
example, the HTML of an "[edit]" text (excluding link details) is:

<span class="mw-editsection"><span class="mw-editsection-bracket">[</span><a>edit</a><span class="mw-editsection-bracket">]</span></span>

You can include as much of it as you need to match exactly what you want, and
the same technique can be used for the references.
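A minimal Python sketch of that technique, using the "[edit]" markup quoted above with a made-up heading word in front. Anchoring on the `mw-editsection-bracket` spans restricts the match to edit links; the lazy `.*?` between them is an editorial choice so that two edit links on the same line are removed separately rather than together with everything in between.

```python
import re

# The "[edit]" markup quoted above, preceded by a made-up heading word.
html = (
    'History<span class="mw-editsection">'
    '<span class="mw-editsection-bracket">[</span><a>edit</a>'
    '<span class="mw-editsection-bracket">]</span></span>'
)

# Only "[" and "]" need escaping; the tags are matched literally.
EDIT_SECTION = re.compile(
    r'<span class="mw-editsection-bracket">\[</span>'
    r'.*?'
    r'<span class="mw-editsection-bracket">\]</span>'
)

print(EDIT_SECTION.sub("", html))
# History<span class="mw-editsection"></span>
```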

Original comment by marc.r...@gmail.com on 23 Oct 2014 at 4:13

GoogleCodeExporter commented 9 years ago
Thank you, it worked with

<span class="mw-editsection-bracket">\[</span>(.*)<span class="mw-editsection-bracket">\]</span>

and for references with

<span>\[</span>(\d+)<span>\]</span>

I don't know whether the pattern should be placed in parentheses or not, but
by trial and error that is what worked.
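On the parentheses question: they only create a capture group, so they do not change what is matched, and the substitution removes the same text either way. A group matters only when the replacement refers back to it. A Python sketch with made-up reference markup in the shape of the pattern above; note that Python writes a group reference in the replacement as `\1`, while JavaScript's `String.prototype.replace` uses `$1` (whether FoxReplace accepts either form is not confirmed here).

```python
import re

html = 'Fact<span>[</span>12<span>]</span> and more<span>[</span>3<span>]</span>.'

with_group = re.sub(r"<span>\[</span>(\d+)<span>\]</span>", "", html)
without_group = re.sub(r"<span>\[</span>\d+<span>\]</span>", "", html)

print(with_group)                   # Fact and more.
print(with_group == without_group)  # True

# The group is useful only when the replacement reuses it,
# e.g. keeping the number without its brackets:
print(re.sub(r"<span>\[</span>(\d+)<span>\]</span>", r"(\1)", html))
# Fact(12) and more(3).
```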

Original comment by pouram...@gmail.com on 27 Oct 2014 at 7:25