wapiti-scanner / wapiti

Web vulnerability scanner written in Python3
https://github.com/wapiti-scanner/wapiti
GNU General Public License v2.0

Session is duplicating scan values #85

Closed: Yvtq8K3n closed this issue 3 years ago

Yvtq8K3n commented 3 years ago

I wanted to use a session to first read predefined URLs, exclude some, and only then start scanning. However, when I use --store-session, every time I check the DB the URL values keep duplicating.


I tried quite a few things, but none of them seem to work. Wapiti is returning 10 URLs instead of 6.
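
As a side note, the duplicated rows can be listed straight from the session DB. The snippet below is only a sketch using Python's standard sqlite3 module; it assumes the paths table that is queried later in this thread and uses a placeholder file name for the session DB.

import sqlite3

# Placeholder: point this at whatever .db file --store-session created for your target.
DB_FILE = "/home/marquez/Desktop/WapitiResults/target_folder_xxxxxxxx.db"

conn = sqlite3.connect(DB_FILE)
# List every URL that is stored more than once in the session's paths table.
for path, count in conn.execute(
        "SELECT path, COUNT(*) FROM paths GROUP BY path HAVING COUNT(*) > 1"):
    print(f"{path} is stored {count} times")
conn.close()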

devl00p commented 3 years ago

Hello,

It may be related to issue #79, which has been fixed recently.

It would be great if you could fetch the latest version of the code from the GitHub repository and test again.

In any case, we will investigate what is happening.

Regards

Yvtq8K3n commented 3 years ago

Hi, I downloaded the latest version from GitHub (main branch).

My initial plan is to load the URLs that I want added to Wapiti, like so:

wapiti -u https://test.pt/ --start https://test.pt/favicon.ico --start https://test.pt/robots.txt -d 0 -m "" --flush-session --store-session /home/marquez/Desktop/WapitiResults

This returns: [*] Wapiti found 3 URLs and forms during the scan

And only then start scanning, using the command:

wapiti -u https://test.pt/ -d 4 -m "" --store-session /home/marquez/Desktop/WapitiResults

This returns: [*] Wapiti found 10 URLs and forms during the scan

I also tried a few other things, like resuming the crawl (10 URLs) or skipping the crawler (3 URLs).

bretfourbe commented 3 years ago

Hi @Yvtq8K3n, can you tell if this problem still occurs for you? I tried your example on my side but I only get 3 URLs in both cases.

Yvtq8K3n commented 3 years ago

The problem is still occurring. I'm using an application called test; to reproduce, make sure you are running an application like DVWA. After initially providing some specific URLs, like https://test.pt/robots.txt, I want to start the scanning process; however, for some reason, Wapiti is repeating the initial URLs that I provided. It may be a persistence problem, but either way I should only have 6 URLs, not 10. If I run the scan again, Wapiti keeps duplicating the same URL entries.

bretfourbe commented 3 years ago

@Yvtq8K3n Can you try again with the following commands from your wapiti directory:

python3 -m venv wapiti3
source wapiti3/bin/activate
make install
./bin/wapiti -u https://test.pt/ --start https://test.pt/favicon.ico --start https://test.pt/robots.txt -d 0 -m "" --flush-session --store-session /home/marquez/Desktop/WapitiResults
./bin/wapiti -u https://test.pt/ -d 4 -m "" --store-session /home/marquez/Desktop/WapitiResults

You can also try adding -v 1 or -v 2 to see the crawled URLs.

devl00p commented 3 years ago

I wasn't able to reproduce it either. I first thought it was related to the start URLs, but that doesn't seem to be the case.

You may still be calling the old version of Wapiti; try uninstalling the old one first.

Yvtq8K3n commented 3 years ago

Sorry for the late response; I followed exactly the instructions you previously provided. It turned out I still had an old version of Wapiti, but the problem still persists. After I provide the start URLs while using the --store-session parameter, Wapiti skips the crawling.

If I flush the DB, Wapiti will start crawling, but it will not include the initial URLs I wanted to provide.

If I try to use the --resume-crawl option, Wapiti will still skip the scan and only store the 3 initial URLs I provided.

If I don't use --depth 0, Wapiti will add the wanted URLs and start crawling. However, for my scenario, it is important to add all the wanted URLs first and only then start crawling.

bretfourbe commented 3 years ago

@Yvtq8K3n Ok, so at least you can confirm there are no more duplicated values, right? Can you explain what you expect from the second scan you launch? I do not understand the purpose of providing some URLs first and only then crawling again.

Yvtq8K3n commented 3 years ago

I'm currently using multiple tools, and sometimes it may not be possible to start scanning right away. That's why it is important for me to split the scanning phase from the step where I provide the initial URLs.

devl00p commented 3 years ago

The problem is that, as you are using -d 0, Wapiti will only keep the base URL (-u) and the start URLs (--start) but will exclude the URLs found in web pages, because they are beyond the limit specified by the depth (-d 0).

That explains why resuming the crawl won't work: Wapiti did not keep the URLs it saw because they did not conform to the depth limit.

Changing the depth afterwards won't work either, as those excluded URLs weren't kept for later.

If you want to add some URLs by editing the SQLite DB, it should work like this:

./bin/wapiti -u https://opensource.com/ --start https://opensource.com/robots.txt -d 0 -m "" --flush-session --store-session /tmp/yoyoyo/ -v 2

     __      __               .__  __  .__________
    /  \    /  \_____  ______ |__|/  |_|__\_____  \
    \   \/\/   /\__  \ \____ \|  \   __\  | _(__  <
     \        /  / __ \|  |_> >  ||  | |  |/       \
      \__/\  /  (____  /   __/|__||__| |__/______  /
           \/        \/|__|                      \/
Wapiti-3.0.4 (wapiti.sourceforge.io)
[*] You are lucky! Full moon tonight.
[+] GET https://opensource.com/ (0)
[+] GET https://opensource.com/robots.txt (0)
[*] Saving scan state, please wait...

 Note
========
This scan has been saved in the file /tmp/yoyoyo/opensource.com_folder_ec3f71d6.db
[*] Wapiti found 2 URLs and forms during the scan
[*] Loading modules:
         backup, brute_login_form, buster, cookieflags, crlf, csp, csrf, exec, file, htaccess, http_headers, methods, nikto, permanentxss, redirect, shellshock, sql, ssrf, timesql, wapp, wp_enum, xss, xxe

Report
------
A report has been generated in the file /home/sirius/.wapiti/generated_report
Open /home/sirius/.wapiti/generated_report/opensource.com_03272021_1343.html with a browser to see this report.

Then add an entry in the sqlite DB:

sqlite3 opensource.com_folder_ec3f71d6.db
SQLite version 3.35.2 2021-03-17 19:07:21
Enter ".help" for usage hints.
sqlite> insert into paths (path, method, depth, evil) values ("https://opensource.com/life/15/2/resize-images-python", "GET", 0, 0);
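
The same entry can also be injected from a script instead of the sqlite3 shell. This is just a convenience sketch (not part of Wapiti) using Python's standard sqlite3 module against the session file created above:

import sqlite3

# Same INSERT as the sqlite3 shell command above, done from Python.
conn = sqlite3.connect("/tmp/yoyoyo/opensource.com_folder_ec3f71d6.db")
conn.execute(
    "INSERT INTO paths (path, method, depth, evil) VALUES (?, ?, ?, ?)",
    ("https://opensource.com/life/15/2/resize-images-python", "GET", 0, 0),
)
conn.commit()
conn.close()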

Relaunch Wapiti but this time with an attack module:

./bin/wapiti -u https://opensource.com/ --start https://opensource.com/robots.txt -d 0 -m "crlf" --store-session /tmp/yoyoyo/ -v 2

     __      __               .__  __  .__________
    /  \    /  \_____  ______ |__|/  |_|__\_____  \
    \   \/\/   /\__  \ \____ \|  \   __\  | _(__  <
     \        /  / __ \|  |_> >  ||  | |  |/       \
      \__/\  /  (____  /   __/|__||__| |__/______  /
           \/        \/|__|                      \/
Wapiti-3.0.4 (wapiti.sourceforge.io)
[*] You are lucky! Full moon tonight.
[*] Resuming scan from previous session, please wait
[+] GET https://opensource.com/life/15/2/resize-images-python (0)
[*] Saving scan state, please wait...

 Note
========
This scan has been saved in the file /tmp/yoyoyo/opensource.com_folder_ec3f71d6.db
[*] Wapiti found 4 URLs and forms during the scan
[*] Loading modules:
         backup, brute_login_form, buster, cookieflags, crlf, csp, csrf, exec, file, htaccess, http_headers, methods, nikto, permanentxss, redirect, shellshock, sql, ssrf, timesql, wapp, wp_enum, xss, xxe

[*] Launching module crlf
[+] GET https://opensource.com/ (0)
[+] GET https://opensource.com/life/15/2/resize-images-python (0)
[+] GET https://opensource.com/robots.txt (0)

Report
------
A report has been generated in the file /home/sirius/.wapiti/generated_report
Open /home/sirius/.wapiti/generated_report/opensource.com_03272021_1344.html with a browser to see this report.

This time we can see the injected URL was part of the attack. However, Wapiti also scanned that URL, because the headers column wasn't filled at injection time and persister.get_to_browse() extracts all URLs that have an empty headers column when resuming the crawl.

Indeed if I check the DB this time there is a duplicate:

sqlite> select path_id, path, http_status from paths;
1|https://opensource.com/|200
2|https://opensource.com/robots.txt|200
3|https://opensource.com/life/15/2/resize-images-python|
4|https://opensource.com/life/15/2/resize-images-python|200

The one we injected and the one that was actually crawled.
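
To make the behaviour concrete, here is a rough sketch of the kind of selection persister.get_to_browse() performs when resuming; this is not Wapiti's actual code, and it assumes the headers column mentioned above lives in the same paths table:

import sqlite3

def paths_to_browse(db_file):
    # Rows whose headers column was never filled look "not crawled yet", so the
    # manually injected entry (empty headers) is picked up again on resume.
    conn = sqlite3.connect(db_file)
    rows = conn.execute(
        "SELECT path_id, path FROM paths WHERE headers IS NULL OR headers = ''"
    ).fetchall()
    conn.close()
    return rows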

The problem seems to appear when persister.add_request(resource) is called. This method uses SqlitePersister._set_paths, which only issues INSERT SQL statements.

Therefore it seems logical that the problem appears when resuming a crawl: we should UPDATE the entry (if there is a path_id on the Request object) instead of INSERTing a new one (or delete the old entry and then INSERT the new one).
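
As an illustration of that suggestion (a sketch only, not the actual patch: the save_request helper is hypothetical, and the Request object is assumed to expose path, method, depth and an optional path_id):

import sqlite3

def save_request(conn, request):
    # If the request was already persisted (it carries a path_id), update that
    # row instead of inserting a second copy of the same URL.
    if getattr(request, "path_id", None) is not None:
        conn.execute(
            "UPDATE paths SET path = ?, method = ?, depth = ? WHERE path_id = ?",
            (request.path, request.method, request.depth, request.path_id),
        )
    else:
        conn.execute(
            "INSERT INTO paths (path, method, depth, evil) VALUES (?, ?, ?, 0)",
            (request.path, request.method, request.depth),
        )
    conn.commit()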

devl00p commented 3 years ago

Fix pushed. Thank you for reporting this :)