jaeles-project / gospider

Gospider - Fast web spider written in Go
MIT License

Duplicate URLs #23

Open jaikishantulswani opened 4 years ago

jaikishantulswani commented 4 years ago

@j3ssie is there any way to avoid duplicate URLs? On some domains the crawl never ends and keeps repeating the same requests, which adds hours of crawl time:

[url] - [code-200] - https://example.com/
[url] - [code-200] - https://example.com/
[url] - [code-200] - https://example.com/
[url] - [code-200] - https://example.com/
[url] - [code-200] - https://example.com/
[url] - [code-200] - https://example.com/
[url] - [code-200] - https://example.com/

A switch to filter/skip URLs that return a particular status code would also be useful.

StasonJatham commented 3 years ago

Hey buddy,

I figured I would post this in here too: https://github.com/jaeles-project/gospider/issues/21#issuecomment-953916547

They actually try to get rid of duplicates with their own "stringset" implementation. The funny thing is that they don't need that code at all, because colly already handles deduplication for them. The issue seems to be that the check only happens within each element type: a URL found in a form is compared against URLs found in other forms, but not against URLs found in a tags, so the same URL can slip through. Long story short: you can modify the crawler to use colly's built-in filter instead, and then it works.

I can't really share my code because I use gospider as a library rather than a command-line tool, so I took out cobra and start it from a config file.
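
To illustrate the idea, though, here is a minimal sketch (not my actual code, and example.com is just a placeholder) of a crawler that leans on colly's built-in deduplication instead of a separate stringset:

package main

import (
	"fmt"

	"github.com/gocolly/colly/v2"
)

func main() {
	// colly deduplicates by default: AllowURLRevisit is false on a
	// fresh Collector, so no separate stringset is needed.
	c := colly.NewCollector()

	// One callback for links found in both a tags and forms, so the
	// same URL is deduplicated no matter which element it came from.
	c.OnHTML("a[href], form[action]", func(e *colly.HTMLElement) {
		link := e.Attr("href")
		if link == "" {
			link = e.Attr("action")
		}
		// Visit() returns colly.ErrAlreadyVisited for a URL that was
		// already requested, so duplicates die here.
		_ = e.Request.Visit(link)
	})

	c.OnRequest(func(r *colly.Request) {
		fmt.Println("visiting:", r.URL)
	})

	_ = c.Visit("https://example.com/")
}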

I don't really care about the status code myself, but implementing that filter is easy:

// colly exposes the status code on the response object
response.StatusCode

You can then add an if statement before .Visit() is run and check for that status code.
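
For example, continuing with the collector c from the sketch above (skipCodes is hypothetical, not an existing gospider option, and fmt is assumed imported):

// Hypothetical set of status codes to skip; adjust to taste.
skipCodes := map[int]bool{403: true, 404: true}

c.OnResponse(func(r *colly.Response) {
	// Skip reporting responses with a filtered status code; the same
	// check could sit in front of e.Request.Visit() in an OnHTML
	// callback to stop following links from filtered pages.
	if skipCodes[r.StatusCode] {
		return
	}
	fmt.Printf("[url] - [code-%d] - %s\n", r.StatusCode, r.Request.URL)
})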