What steps will reproduce the problem?
1. Perform a harvester query for a known organisation.
2. Modify the email regex to:
(' ' + '[a-zA-Z0-9.-_]*' + '.' + '[a-zA-Z0-9.-_]*' + '@' + '[a-zA-Z0-9.-]*' +
self.word)
Some results now appear as "... TEST@domain.co.uk".
3. These results are parsed incorrectly because they are built from the
truncated Google result snippets rather than from the pages themselves.
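The behaviour in steps 2 and 3 can be reproduced offline with a rough sketch. The pattern is the one quoted in step 2; `word`, `snippet` and `page` are made-up stand-ins (assumptions) for `self.word`, a truncated Google result and the text of the real page.

```python
import re

# Stand-in for self.word (assumption, not taken from the tool's source).
word = "domain.co.uk"

# Pattern concatenated exactly as quoted in step 2.
pattern = (' ' + '[a-zA-Z0-9.-_]*' + '.' + '[a-zA-Z0-9.-_]*' + '@' + '[a-zA-Z0-9.-]*' +
           word)

# Hypothetical inputs: a truncated search snippet and the full page text.
snippet = "Contact ... TEST@domain.co.uk for details"
page = "Contact Test.TEST@domain.co.uk for details"

snippet_hits = re.findall(pattern, snippet)
page_hits = re.findall(pattern, page)

print(snippet_hits)  # the ellipsis from the snippet ends up inside the match
print(page_hits)     # the page text yields the full address
```

Against the snippet, the ellipsis is matched as part of the address; against the page text, the full local part is recovered.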
What is the expected output? What do you see instead?
Expected: "Test.TEST@domain.co.uk" - as viewed on the webpage.
Actual: "... TEST@domain.co.uk" - from the truncated Google result.
What version of the product are you using? On what operating system?
2.2a - Mac OS X
Please provide any additional information below.
I cannot see a fix for this unless a future release provides a command-line
switch, e.g. -IF (investigate further), that fetches the page (curl/grep) and
looks up the corresponding full result.
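One possible shape for the "investigate further" step is sketched below. It is only an illustration: `resolve_truncated` is a hypothetical helper, not part of the tool, and the page text is passed in as a string so that fetching the page (for example with curl) stays outside the sketch.

```python
import re


def resolve_truncated(hit, page_text):
    """Return full addresses in page_text that end with the visible tail of a truncated hit.

    Hypothetical helper for illustration only.
    """
    # Drop the ellipsis left by the truncated snippet, keeping e.g. "TEST@domain.co.uk".
    tail = hit.replace("...", "").strip()
    # Allow any additional local-part characters before the literal tail.
    pattern = re.compile(r"[A-Za-z0-9._-]*" + re.escape(tail))
    return sorted(set(pattern.findall(page_text)))


# Hypothetical page content standing in for the fetched page.
page_text = "Sales: Test.TEST@domain.co.uk, Support: help@domain.co.uk"
resolved = resolve_truncated("... TEST@domain.co.uk", page_text)
print(resolved)
```

If the list comes back empty, the truncated hit could be reported as unresolved rather than as an address.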
Original issue reported on code.google.com by Fletcher...@gmail.com on 7 Sep 2014 at 12:22