vladko312 / SSTImap

Automatic SSTI detection tool with interactive interface
GNU General Public License v3.0

Add crawler and form finder #8

Closed: fantesykikachu closed this 1 year ago

fantesykikachu commented 1 year ago

Sorry for taking so long with this; the dependency hell was real with this one, and I started a new job in the last couple of months, so I have been busy with that.

So you are aware, I added a small fix: you had added a default value of GET to the method argument in the CLI parser, but it was breaking method detection, so I removed it. The code I added to the main function is kind of messy and I would like to clean it up, but that will likely require reorganizing the Channel class, so it can be done later.

For new dependencies: sqlmap's crawler had the option of using BeautifulSoup, but the hardcoded fallbacks seem to work fine without it, so I have disabled (commented out) the use of BeautifulSoup. However, for the form finder sqlmap used a library called ClientForm, which, like all of sqlmap's dependencies, is bundled, because ClientForm has never been and will never be ported to Python 3 (the developers consider the project obsolete). The ClientForm developers recommend their newer library, mechanize, but using mechanize the intended way is not an option for us, so I went with calling its internal functions. However, those functions take as an argument a large nested parse structure created by a different library called html5lib, so I had to add and call that too (html5lib is already a dependency of mechanize).
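For context, a minimal sketch of what the html5lib side of this looks like, independent of mechanize's internals (the HTML snippet and the DOM walking are made up for illustration, not SSTImap's actual code):

```python
import html5lib

html = '<form action="/login" method="post"><input name="user"></form>'
# html5lib.parse() builds the nested tree structure; treebuilder="dom"
# yields a standard xml.dom.minidom Document.
document = html5lib.parse(html, treebuilder="dom")

for form in document.getElementsByTagName("form"):
    action = form.getAttribute("action")
    method = form.getAttribute("method") or "get"
    inputs = [i.getAttribute("name") for i in form.getElementsByTagName("input")]
    print(method.upper(), action, inputs)
```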

Also, sqlmap is heavily threaded but SSTImap is not, so I have removed the use of threading but left the structure in place, so it can be added at a later date if somebody wants to.

vladko312 commented 1 year ago

Thank you for your contribution!

I am currently finishing work on version 1.1, which changes some things, so I will merge this into 1.1.

Also, could you please separate the crawler logic from the predetermined mode and add a crawler option into the interactive mode?

fantesykikachu commented 1 year ago

I can separate the crawler logic if you want, but I think it will look worse. And I can't write the interactive mode, because I can't figure out how any of those functions are actually called.

vladko312 commented 1 year ago

Functions are called through the cmd module. Just creating a function do_crawler(self, line) would be enough to create an interactive command crawler.
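A minimal sketch of the cmd dispatch pattern being described (the class and prompt names are hypothetical, not SSTImap's actual ones):

```python
import cmd

class InteractiveShell(cmd.Cmd):
    # cmd.Cmd maps the input "crawler 2" to self.do_crawler("2") automatically
    prompt = 'SSTImap> '

    def do_crawler(self, line):
        """crawler <depth>  Command created just by defining do_crawler."""
        print(f"crawler called with argument: {line!r}")

    def do_exit(self, line):
        """Exit the shell."""
        return True  # a truthy return value stops cmdloop()

if __name__ == '__main__':
    InteractiveShell().cmdloop()
```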

fantesykikachu commented 1 year ago

Interactive mode support is in. Again, it bloats the do_run function, although it should get better after restructuring the Channel class so it can be reused rather than creating a new object for each URL/form.

vladko312 commented 1 year ago

There are still a couple of things that should be done:

There is currently no way to disable forms in interactive mode. Maybe the interactive command should only toggle the setting, while form finding would happen in the run command? Same with crawl.

This would also allow the user to set those options with command-line arguments, and would fix issues related to changing the URL.

fantesykikachu commented 1 year ago

When you say there is no way to disable forms in interactive mode, do you mean that once you set forms there is no way to switch back to a single URL? If so, then instead of relying on checking args, I can check the type, so that calling url again resets it. Otherwise, I could have do_url just unset the args.

fantesykikachu commented 1 year ago

So I went with just unsetting crawl and forms in the do_url function. I also included a warning that if you are going to run both crawl and forms, you should run crawl before forms. Right now the intended order of commands to do both crawl and forms would be: `url <URL>`, then `crawl <depth>`, then `forms`, then `run`.

If at any point you no longer want to do crawl and/or forms, just call `url` again, and you can then call `crawl` or `forms` if you still want one or the other. A sketch of the reset is below. Also, calling `crawl 0` is not a viable solution, so I have added a warning and disabled it.
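A minimal sketch of that reset behaviour (the attribute names are hypothetical, not SSTImap's actual ones):

```python
def do_url(self, line):
    """url <URL>  Set the target URL and reset crawl/forms settings."""
    self.url = line.strip()
    # Hypothetical reset: switching targets drops any previous crawl depth
    # and form-finding flag, returning to a single-URL scan.
    self.crawl_depth = 0
    self.find_forms = False
```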

vladko312 commented 1 year ago

This is not the best way to do it. Interactive mode was designed to keep settings between runs. Having to run multiple commands in a specific order is a bad idea, as is having to re-run them every time.

The best solution would be to just use do_crawl and do_forms as a way to change the settings, while crawling and form scanning would be done in do_run.
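A minimal sketch of that split, assuming hypothetical attribute names and placeholder helpers (crawl, find_forms, scan are illustrations, not SSTImap's real API):

```python
import cmd

class Shell(cmd.Cmd):
    url = None          # hypothetical state defaults
    crawl_depth = 0
    scan_forms = False

    def do_crawl(self, line):
        """crawl <depth>  Store the crawl depth; crawling happens in run."""
        try:
            self.crawl_depth = int(line)
        except ValueError:
            print("[!] crawl expects an integer depth")

    def do_forms(self, line):
        """forms  Toggle form scanning for future runs."""
        self.scan_forms = not self.scan_forms
        print(f"[*] form scanning {'enabled' if self.scan_forms else 'disabled'}")

    def do_run(self, line):
        """run  Crawl and/or find forms as configured, then scan each target."""
        targets = [self.url]
        if self.crawl_depth > 0:
            targets = crawl(self.url, self.crawl_depth)  # placeholder helper
        if self.scan_forms:
            targets = find_forms(targets)                # placeholder helper
        for target in targets:
            scan(target)                                 # placeholder helper
```

This way the settings persist between runs, and the same flags can be populated from command-line arguments in predetermined mode.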

vladko312 commented 1 year ago

Also, there is currently no way to change the crawlExclude value in interactive mode.

fantesykikachu commented 1 year ago

So I moved the crawl and form search operations into the do_run function, and I added a function to set crawlExclude. This will also probably make my follow-up PR, restructuring and reusing the channel, easier.

vladko312 commented 1 year ago

This is great! I will check it soon.

As for the channel, I decided not to reuse it. Separate channels would be used to quickly switch between found vulnerabilities.

fantesykikachu commented 1 year ago

If you don't want to reuse channels, then where do you recommend storing the FD for a results file? The channel is the only thing passed to check_template_injection, and repeatedly opening and closing the file is not good, nor is declaring it a global.

vladko312 commented 1 year ago

I don't think that writing results during detection is the best way. It would make it harder to create parallel tasks in the future.

fantesykikachu commented 1 year ago

Why? The amount written would be very small, and Python handles threaded IO tasks like that well. Waiting until all tasks have finished would mean that if the process hangs and a user has to kill it, the results aren't saved. Also, it would write to the file right after printing the summary, and if we save the FD so we don't have to keep re-opening the file, it should be really fast.
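A minimal sketch of the approach being argued for, assuming a single shared handle (ResultWriter is a hypothetical name, not SSTImap code; the fields match the ones listed later in this thread):

```python
import csv

class ResultWriter:
    """Hypothetical helper: open the results file once, append one CSV row
    per detected vulnerability, and flush immediately so results survive
    even if the process is killed mid-scan."""

    def __init__(self, path):
        self._fh = open(path, "a", newline="")
        self._writer = csv.writer(self._fh)

    def record(self, url, method, parameter, engine):
        self._writer.writerow([url, method, parameter, engine])
        self._fh.flush()

    def close(self):
        self._fh.close()
```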

vladko312 commented 1 year ago

I am still thinking about the way results should be saved, but it should be possible to interrupt any process without losing the results. Writing results during detection would limit the ability to produce different result formats after the scan.

fantesykikachu commented 1 year ago

As it stands right now, it doesn't store the results of previous scans (once a second scan starts); if you add that, all of the data will still be there and you can choose to output it in any format you want.

vladko312 commented 1 year ago

I want to store only the channels that detected a vulnerability. So, what is the point of creating some form of report during detection? Separate files can be created for detected vulnerabilities, while the final report can be done at the end, after detection.

fantesykikachu commented 1 year ago

So again, we currently only store the active channel, not previous ones. And this is all the data I am writing for a detected vulnerability: target URL, method, parameter, engine; writing a new file for each detection would be a horrible idea. As for writing it at the end: if you want to make that change, you can, but since, again, we currently don't store previous channels, it would be impossible to do that right now.

vladko312 commented 1 year ago

I finally checked your code and understood your problem. In your code, the crawler keeps checking other URLs and forms even after a success. In the future, channels will be stored, and the files will contain enough information to restore the state of the channel. For now, you can just add temporary breaks, followed by # TODO comments. I will likely add vulnerability saving in 1.2, so for now it would only scan until the first occurrence.
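A minimal sketch of the temporary early exit being suggested (the loop shape and Channel construction are assumed for illustration; check_template_injection is assumed to return a truthy value on detection):

```python
for target in targets:
    channel = Channel(target)  # assumed: one channel per crawled URL/form
    if check_template_injection(channel):
        # TODO: remove this break once channels are stored (planned for 1.2),
        # so the crawler can keep scanning after the first hit
        break
```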