laramies / theHarvester

E-mails, subdomains and names Harvester - OSINT
http://www.edge-security.com/

Skipping emails when word-break <wbr> #32

Closed JohnLife96 closed 5 years ago

JohnLife96 commented 8 years ago

Results seem to be incomplete or missing when the raw HTML, before parsing, has tags right after the '@'.

The displayed result is "@domainname.com", with no name before the '@'.

I've tried cleaning up the results, but without success so far. Has anyone found a fix for this?

laramies commented 8 years ago

Hi John, can you provide some example strings that fail? I will then review the parser. Cheers

JohnLife96 commented 8 years ago

An example would be this:

<em>
   name
</em>
@
<wbr></wbr>
<em>
   domainname.com
</em>

So I'm guessing the <em> tags get cleaned up nicely but the <wbr> does not.

JohnLife96 commented 8 years ago

Another example:

<em>
   name
</em>
@
<em>
   domainname
</em>
.
<wbr></wbr>
<em>
com
</em>

Same issue, and it really seems to be the <wbr></wbr> tags messing it up.

I actually think because of these tags the email doesn't get added to the array at all.

Also, I tried adding the <wbr></wbr> tags to the substitutions in myparser, but that doesn't seem to help. Just in case you were wondering.
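
To make the two examples above concrete, here is a minimal Python sketch of one possible cleanup. It is not theHarvester's actual myparser code: the names `extract_emails`, `TAG_RE`, `EMAIL_RE` and `SAMPLE` are hypothetical, and the email pattern is deliberately simplified. It assumes the line breaks and indentation around the tags are part of the raw HTML, which might explain why substituting the <wbr> tags alone was not enough: the whitespace left behind would still separate the name from the '@'.

```python
import re

# Hypothetical helper names; this is NOT theHarvester's myparser code.
# Matches highlighting / word-break tags plus any whitespace around them,
# so the text fragments on either side are joined back together.
TAG_RE = re.compile(r"\s*</?(?:em|strong|b|wbr)\s*/?>\s*", re.IGNORECASE)

# Deliberately simplified email pattern, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# First example string from this thread.
SAMPLE = (
    "<em>\n   name\n</em>\n@\n<wbr></wbr>\n"
    "<em>\n   domainname.com\n</em>"
)


def extract_emails(html):
    """Strip the inline tags (and adjacent whitespace), then find emails."""
    cleaned = TAG_RE.sub("", html)
    # De-duplicate and sort for stable output.
    return sorted(set(EMAIL_RE.findall(cleaned)))


if __name__ == "__main__":
    print(extract_emails(SAMPLE))
```

One trade-off of this sketch: because it also drops whitespace next to a tag, text such as "contact <em>name</em>@domainname.com" would be merged into "contactname@domainname.com". A real fix would probably need to be more selective about which whitespace it removes.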

JohnLife96 commented 8 years ago

I found a thread about it here: http://stackoverflow.com/questions/31276230/removing-wbr-tags-and-grabbing-the-info-between/31276397#31276397

Let's see if it works.

laramies commented 8 years ago

I made a change in the parser. Can you please check it with the latest version?

NotoriousRebel commented 5 years ago

As this has been resolved, closing!