s0md3v / Photon

Incredibly fast crawler designed for OSINT.
GNU General Public License v3.0
10.96k stars 1.49k forks

Moved exclude check to step 2 #42

Closed by connorskees 6 years ago

connorskees commented 6 years ago

Not to spam you with pull requests, but is this what you were thinking?

s0md3v commented 6 years ago

Great! Can you confirm that it's working as intended?

connorskees commented 6 years ago

Yes, everything works exactly as intended. --exclude . and --exclude http. return 0 URLs, while --exclude \d and --exclude "" return the same results as running without the exclude flag.

s0md3v commented 6 years ago

It's not working on my end.

connorskees commented 6 years ago

With no flag: (screenshot)

Flag of .*: (screenshot)

Flag of "thisdoesnotexist": (screenshot)

Flag of .*teach.* (note that only one link is removed): (screenshot)

s0md3v commented 6 years ago

Doesn't work :')

screenshot_2018-08-03_02-46-57

connorskees commented 6 years ago

Hmm, do you want to exclude keywords or regexes? If it's a regex, change the pattern to .*?questions
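
A minimal sketch of why the leading .*? can matter. It assumes the exclude check is anchored at the start of the URL, the way Python's re.match is; the thread does not show which call Photon actually uses, and the URL below is made up for illustration.

```python
import re

url = "https://example.com/questions/42"  # hypothetical URL, not from the thread

# re.match only succeeds if the pattern matches starting at position 0,
# so a bare keyword in the middle of the URL is not found.
anchored_plain = re.match(r"questions", url) is not None

# A lazy .*? prefix lets the anchored match skip ahead to the keyword.
anchored_lazy = re.match(r".*?questions", url) is not None

# re.search scans the whole string, so the bare keyword is enough.
searched_plain = re.search(r"questions", url) is not None

print(anchored_plain, anchored_lazy, searched_plain)  # False True True
```

If the check were based on re.search instead, a plain keyword and a regex would behave the same way for this case.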

connorskees commented 6 years ago

Sorry, my bad. I added it back in with "Update photon.py" (4c60e1b).

s0md3v commented 6 years ago

It still doesn't work :)

screenshot_2018-08-03_12-55-09

connorskees commented 6 years ago

I think the issue is that although it isn't crawling the links, it is still adding them to the links file. Is it OK to just use remove_regex() on the list before it is exported?
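
A rough sketch of the idea, filtering the collected links once more right before export. The helper name drop_matching, the sample URLs, and the use of re.search are all assumptions for illustration; this is not the actual remove_regex() from photon.py, whose signature is not shown in this thread.

```python
import re


def drop_matching(urls, pattern):
    """Return the URLs that do NOT match `pattern` (hypothetical helper)."""
    compiled = re.compile(pattern)
    # Assumption: a URL is excluded if the pattern matches anywhere in it.
    return [url for url in urls if not compiled.search(url)]


# Made-up links standing in for what the crawler collected.
links = [
    "https://example.com/about",
    "https://example.com/teachers",
    "https://example.com/contact",
]

# Apply the exclude pattern to the final list, not only during crawling,
# so excluded links never reach the exported file.
to_export = drop_matching(links, r".*teach.*")
print(to_export)
```

Filtering at export time covers links that were discovered on a page but never crawled, which is the case described above.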

connorskees commented 6 years ago

Now, testing this against the links file, links matching the regex are no longer added.