spyboy-productions / CloakQuest3r

Uncover the true IP address of websites safeguarded by Cloudflare & Others
https://colab.research.google.com/github/spyboy-productions/CloakQuest3r/blob/main/cloakquest3r.ipynb
MIT License

Cloakquest3r Refactor Suggestion/Review #9

Closed JLowborn closed 4 months ago

JLowborn commented 5 months ago

While browsing LinkedIn I found this tool and it was very useful to me, so I thought "why not?" and decided to refactor the code while trying to improve my Python skills and help the community somehow.

Before opening a new Pull Request I'd rather ask the authors for their opinion on the current code, as I don't expect you to accept these changes without complaints or questions. So here's a breakdown of the changes.

:warning: Disclaimer: I'm not good at coloring output, and I find the methods for doing so extremely irritating, so I didn't use many colors, but this can always be updated later on.

:warning: Disclaimer #2: I'm also not the best programmer ever, but I've tried my best to make people's lives easier, and this includes the author of the code.

:spiral_notepad: Changelog:

I've made a couple of changes, so I'm splitting this into different sections.

File Structure

I've modified the project's folder structure. Having a single, extremely big file containing all the functions and parts of the code can become a real mess, and unfortunately this is the case here.

Previous Structure

CloakQuest3r/
│
├── cloakquest3r.py
├── requirements.txt
└── wordlist.txt

Updated Structure

CloakQuest3r/
│
├── cloakquest3r.py
├── requirements.txt
├── config.toml             # Author information/links, version tag & API keys
├── core/                   # Core functionalities of the code
│   ├── __init__.py
│   ├── cloakquester.py     # Main functions
│   ├── banner.py           
│   └── color.py            
├── docs/                   # Documentation related files
│   └── LICENSE
└── wordlists/              # Out-of-the-box wordlists
    └── default.txt

Splitting the code into distinct sections enhances organization and makes it easier to find specific code segments when updating or adding new features. CloakQuest3r itself has been modularized into its own dedicated package.
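To illustrate the idea (this is just a sketch, not the actual refactored code, and the function names are assumptions), the top-level cloakquest3r.py can shrink to a thin entry point that only wires the modules together:

# Hypothetical entry point after modularization.
# core/banner.py and core/cloakquester.py come from the proposed tree;
# print_banner() and main() are assumed names, not the real refactored API.
from core.banner import print_banner
from core.cloakquester import main

if __name__ == "__main__":
    print_banner()
    main()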

Code Changes

Previously created functions have been refactored, rewritten and/or renamed as needed to simplify the code. Some function and variable names have become clearer and shorter.

Features

I've added some features that I thought would help users, and I also tried to add some of the features mentioned in the Contribution section of the README.md file.

TODO:

Some changes are still in progress and may or may not be continued based on the author's opinion of the code.

Final Considerations

My only objective is to help and contribute to the code somehow while improving my coding skills. I don't expect these changes to be accepted without complaints, and I'm willing to listen and respond to feedback from the code author.

The refactored code is available here: https://github.com/JLowborn/CloakquesterRedone


thisisshubhamkumar commented 5 months ago

Oh wow!! You rewrote almost the whole thing. I haven't tested all the changes yet. Quoting your comments, here's how I feel about them.

I've modified the project's folder structure. Having a single, extremely big file containing all the functions and parts of the code can become a real mess, and unfortunately this is the case here.

- I created this tool as a proof of concept. I know it's not well written, which is why it just has a single file.
I am NO programmer.

Updated Structure

CloakQuest3r/
│
├── cloakquest3r.py
├── requirements.txt
├── config.toml               # Author information/links, version tag & API keys
├── core/                 # Core functionalities of the code
│ ├── __init__.py
│ ├── cloakquester.py     # Main functions
│ ├── banner.py           
│ └── color.py            
├── docs/                 # Documentation related files
│ └── LICENSE
└── wordlists/                # Out-of-the-box wordlists
  └── default.txt
+ I like it, it’s clean.

Splitting the code into distinct sections enhances organization and makes it easier to find specific code segments when updating or adding new features. CloakQuest3r itself has been modularized into its own dedicated package.

  • banner.py contains the functions related to the banner and social media output. Since this is not required for the main code to work, it allows the author to easily change the visuals and/or colors as desired without touching the main code.
  • The config.toml file stores API keys as well as other relevant information such as the author's social media links and the code version (a minimal loading sketch follows this list).
  • color.py stores the colored-output helper functions, to avoid unnecessary lines of code in the main file.
  • The default out-of-the-box wordlist is included inside the wordlists folder.
  • Templates, the license, and other documentation should be stored inside the docs folder.
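As a rough illustration of the config.toml idea (the keys below are placeholders, not the real file), loading it in Python could look like this:

import tomllib  # stdlib in Python 3.11+; older versions can use the third-party "toml" package

# Hypothetical config.toml layout, for illustration only:
# [info]
# version = "1.0.0"
# author = "spyboy-productions"
# [api]
# securitytrails = "YOUR_API_KEY"

with open("config.toml", "rb") as f:
    config = tomllib.load(f)

version = config["info"]["version"]
api_key = config["api"]["securitytrails"]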

Code Changes

Previously created functions have been refactored, rewritten and/or renamed as needed to simplify the code. Some function and variable names have become clearer and shorter.

  • Reduced is_using_cloudflare and detect_web_server to a single function, since both do basically the same thing. The code now verifies whether the target is behind Cloudflare before doing anything else, and only asks for confirmation if the target isn't.
  • The code now verifies that both the URL and the wordlist file are valid. Each has its own validation function: is_valid_url and is_valid_file.
  • Two functions have been added, _to_url and _to_hostname. They convert the input string as needed, since some functions require the URL scheme (https://) while others don't.
- Not a fan of this; it should accept a bare domain as a valid argument. This was the first error I got when I ran the tool, forcing me to add http/https.

You can use something like this:

import re

# Accept the target with or without a scheme.
if input_text.startswith("http://") or input_text.startswith("https://"):
    match = re.search(r'(https?://)?([A-Za-z_0-9.-]+).*', input_text)
    if match:
        link = match.group(2)
    else:
        print("Invalid URL format.")
        exit(1)
else:
    link = input_text  # already a bare domain, use it as-is

.. or you can just check whether the domain returns a 200 status code.
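Or, roughly combining both ideas: here's a sketch using requests, where the _to_url/_to_hostname versions are my own illustration rather than your actual functions, and the Cloudflare check is just the common Server-header heuristic:

import requests
from urllib.parse import urlparse

def _to_hostname(target: str) -> str:
    """Accept 'example.com' or 'https://example.com/path' and return just the hostname."""
    parsed = urlparse(target if "://" in target else f"https://{target}")
    return parsed.hostname or target

def _to_url(target: str) -> str:
    """Add an https:// scheme for the functions that need a full URL."""
    return f"https://{_to_hostname(target)}"

def check_target(target: str) -> tuple[bool, bool]:
    """Return (is_reachable, looks_like_cloudflare) from the status code and Server header."""
    resp = requests.get(_to_url(target), timeout=10)
    server = resp.headers.get("Server", "").lower()
    return resp.status_code == 200, "cloudflare" in server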

Features

I've added some features that I thought would help users, and I also tried to add some of the features mentioned in the Contribution section of the README.md file.

  • Added the -w/--wordlist flag, allowing the user to set a custom wordlist for subdomain discovery.
- This feature was added in the last update. It asks the user for a custom wordlist before the subdomain scan; if the user says yes it asks for a wordlist path, and if not it downloads an updated wordlist (5,000 subdomains) from the SecLists GitHub repo and uses it. The default wordlist is just a backup, which has only ~700 subdomains.

Go through the last update; this was missing in your code, it just uses the default wordlist.
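The flow is roughly like this sketch (the SecLists raw URL and file names here are only illustrative, so check the exact path in the repo):

import requests

# Illustrative URL; verify the exact path in the SecLists repository.
SECLISTS_5K = ("https://raw.githubusercontent.com/danielmiessler/SecLists/"
               "master/Discovery/DNS/subdomains-top1million-5000.txt")

def choose_wordlist() -> str:
    """Ask for a custom wordlist; otherwise download the ~5,000-entry list, falling back to the bundled backup."""
    answer = input("Do you want to use a custom wordlist? (y/N): ").strip().lower()
    if answer == "y":
        return input("Path to wordlist: ").strip()
    try:
        resp = requests.get(SECLISTS_5K, timeout=15)
        resp.raise_for_status()
        with open("wordlist_5000.txt", "w") as f:
            f.write(resp.text)
        return "wordlist_5000.txt"
    except requests.RequestException:
        return "wordlist.txt"  # bundled ~700-entry backup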
  • Added the --no-bruteforce option to avoid subdomain discovery through bruteforcing.
- Bruteforcing is the only way we are doing subdomain scanning, which is not an efficient way; we miss some subdomains. I have a new update which, if you want, you can add:

import subprocess

PURPLE = '\033[1;30m'
RED = '\033[1;31m'
GREEN = '\033[1;32m'
YELLOW = '\033[1;33m'
BLUE = '\033[1;34m'
PINK = '\033[1;35m'
LBLUE = '\033[1;36m'
WHITE = '\033[1;37m'

def execute_command(command):
    subprocess.run(command, shell=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

url = input(f"\n{WHITE}[!] Enter domain to Enumerate subdomains: ")

print(f"{BLUE}\n[*] Subdomain Enumeration started!\n")

print(f"{GREEN}\n[+] Enumerating subdomains from subfinder ..")
execute_command(f"subfinder -d {url} -silent > sub1")
print(f"{GREEN}\n[+] Enumerating subdomains from assetfinder ..")
execute_command(f"assetfinder {url} > sub2")
print(f"{GREEN}\n[+] Enumerating subdomains from crt.sh ..")
execute_command(f"curl -s 'https://crt.sh/?q={url}' | grep '<TD>' | grep {url} | cut -d '>' -f2 | cut -d '<' -f1 | sort -u | sed '/^*/d' > sub3")
print(f"{GREEN}\n[+] Enumerating subdomains from rapiddns ..")
execute_command(f"curl -s 'https://rapiddns.io/subdomain/{url}#result' | grep '<td><a' | cut -d '\"' -f 2 | grep http | cut -d '/' -f3 | sort -u > sub4")
print(f"{GREEN}\n[+] Enumerating subdomains from bufferover ..")
execute_command(f"curl -s 'https://dns.bufferover.run/dns?q=.{url}' | jq -r .FDNS_A[] | cut -d '\\' -f2 | cut -d ',' -f2 |  sort -u > sub5")
print(f"{GREEN}\n[+] Enumerating subdomains from ridder ..")
execute_command(f"curl -s 'https://riddler.io/search/exportcsv?q=pld:{url}' | grep -Po '(([\w.-]*)\.([\w]*)\.([A-z]))\w+' | sort -u > sub6")
print(f"{GREEN}\n[+] Enumerating subdomains from jldc ..")
execute_command(f"curl -s 'https://jldc.me/anubis/subdomains/{url}' | grep -Po '((http|https)://)?(([\w.-]*)\.([\w]*)\.([A-z]))\w+' | cut -d '/' -f3 > sub7")
print(f"{GREEN}\n[+] Enumerating subdomains from omnisint ..")
execute_command(f"curl -s 'https://sonar.omnisint.io/subdomains/{url}' | cut -d '[' -f1 | cut -d ']' -f1 | cut -d '\"' -f 2 > sub8")

execute_command(f"sort sub1 sub2 sub3 sub4 sub5 sub6 sub7 sub8 | uniq | tee {url}-all_subdomains")
execute_command("rm sub*")

print(f"{BLUE}\n[*] Subdomain Enumeration Completed!\n")
num = subprocess.check_output(f"wc -l {url}-all_subdomains | awk '{{print $1; exit}}'", shell=True, text=True).strip()
print(f"{WHITE}\n[*] Found {num} subdomains for {url}\n")

print(f"{BLUE}\n[!] view the {url}-all_subdomains file for results!\n\n")

  • Added the --no-security-trail option to avoid using SecurityTrails for IP history discovery.
  • Added the --no-banner option, as this can be useful when piping the output to an external file.
  • Added the --force option to ignore whether the target is using Cloudflare or not.
  • Added a help page (-h/--help) using the argparse module.

- Simply running python cloakquest3r.py should show the help menu; we can also keep -h/--help.

Also, I didn't like the -u URL argument. It can be there, but it should also work the way it does now without any argument, simply: python cloakquest3r.py example.com
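Something along these lines would keep both behaviours; the flag names come from your list, everything else here is only a rough illustration:

import argparse
import sys

parser = argparse.ArgumentParser(prog="cloakquest3r.py",
                                 description="Uncover the real IP behind Cloudflare-protected sites")
parser.add_argument("target", nargs="?", help="domain or URL to scan (e.g. example.com)")
parser.add_argument("-w", "--wordlist", help="custom subdomain wordlist")
parser.add_argument("--no-bruteforce", action="store_true", help="skip subdomain bruteforcing")
parser.add_argument("--no-security-trail", action="store_true", help="skip SecurityTrails IP history lookup")
parser.add_argument("--no-banner", action="store_true", help="suppress the banner (useful when piping output)")
parser.add_argument("--force", action="store_true", help="continue even if the target is not behind Cloudflare")

args = parser.parse_args()
if args.target is None:  # plain "python cloakquest3r.py" just shows the help menu
    parser.print_help()
    sys.exit(0)

Running python cloakquest3r.py with no arguments would then print the help menu, and python cloakquest3r.py example.com would still work as it does now.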

TODO:

Some changes are still in progress and may or may not be continued based on the author's opinion of the code.

  • Adding docstrings to functions; some functions already have them, but I was too lazy to add them all.
  • Adding type hints to functions. Same as above, too lazy to add them all.
  • More color in printed text. Adding colors can be an issue because a simple print call can become hard to read with brackets everywhere.
- Please don't change the output structure and colour output; keep all the colours and the structure. I like colours and am obsessed with making things look pretty lol. Don't change anything there.
  • Turn this into a PyPI package?
- I don't think it would be that useful to make this a PyPI package!
+ Useful things you can add:
1. Save all output to a text file at the end.
2. Add proxy list support. Using the tool without a VPN can get your IP banned by ViewDNS, because we are scraping data from the site, which is not good practice; adding proxy support would be a good improvement.
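For point 2, a minimal illustration of what I mean (the proxies.txt name and format are just assumptions):

import random
import requests

def load_proxies(path: str = "proxies.txt") -> list[str]:
    """Read one proxy per line, e.g. http://127.0.0.1:8080."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def get_with_proxy(url: str, proxies: list[str]) -> requests.Response:
    """Send the request through a randomly chosen proxy to avoid bans from scraped services like ViewDNS."""
    proxy = random.choice(proxies)
    return requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=15)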

Final Considerations

My only objective is to help and contribute to the code somehow while improving my coding skills. I don't expect these changes to be accepted without complaints, and I'm willing to listen and respond to feedback from the code author.

The refactored code is available here: https://github.com/JLowborn/CloakquesterRedone

+ I am open to merging after these changes. Just make sure you don't change the tool's output look, and the tool should also work the way it does currently; I mean it should be able to run without all the new arguments you added. Those arguments are very nice if you want to do something specific, but I also like to keep things simple. With this in mind, feel free to make changes as you see fit to improve the tool.
P.S. I am not a pro coder, so please don't take offense at my criticism. I just want to make sure it's how I like it.

JLowborn commented 5 months ago

About the last update: I only saw it yesterday, after opening this issue; I had started the refactor at least a week before.

Not offended at all! I'm glad you've taken the time to read and reply to this, and I'm willing to help with any improvements. 😄

The colored output is not a problem for me. I'm still thinking of a way to maintain the color scheme while keeping the code easy to work with; until then, I think it's better to keep the colors as they were before.
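One idea I'm toying with (just a sketch, not final) is a tiny helper in core/color.py so the existing ANSI colors stay but the print calls remain readable:

# Hypothetical helper for core/color.py: keeps the ANSI colors but hides the
# escape codes behind one small function.
COLORS = {
    "red": "\033[1;31m",
    "green": "\033[1;32m",
    "blue": "\033[1;34m",
    "white": "\033[1;37m",
}
RESET = "\033[0m"

def cprint(text: str, color: str = "white") -> None:
    print(f"{COLORS.get(color, '')}{text}{RESET}")

# Usage: cprint("[*] Subdomain Enumeration started!", "blue")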

The improvements are not complete; I just wanted to ask for your honest opinion. Thanks again for your time and patience, I'll fix/review the code ASAP.

I'll keep you posted on any updates or ideas!

thisisshubhamkumar commented 5 months ago

Looking forward to seeing the progress and staying updated on any new ideas or updates. Keep up the great work! (^^)

JLowborn commented 5 months ago

While reading the updated code, I saw you've added a feature to download a wordlist from SecLists. Is it really necessary? I mean, if the wordlist is going to be used by the tool, a better option is to add it to the wordlists folder and use it as default in case no wordlist flag has been supplied by the user. Since they're just text files, they won't consume much disk space. What's your opinion on that?

Using if "wordlist_path" not in locals(): will re-download the wordlist every time the tool is executed. Adding the downloaded wordlist to the wordlists folder will prevent the code from needing to download it. Also, using it as the default when no -w/--wordlist is supplied avoids having to ask the user whether they have a custom wordlist. Finally, it also gets rid of any connection problems during the download, as attempting to download it too many times can result in temporary failures.
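Something like this sketch is what I have in mind (the paths are assumptions based on the proposed wordlists/ folder):

from pathlib import Path
from typing import Optional

# Assumed location of the bundled default wordlist from the proposed folder structure.
DEFAULT_WORDLIST = Path(__file__).parent / "wordlists" / "default.txt"

def resolve_wordlist(custom: Optional[str]) -> Path:
    """Use the -w/--wordlist value if supplied, otherwise fall back to the bundled default (no download needed)."""
    path = Path(custom) if custom else DEFAULT_WORDLIST
    if not path.is_file():
        raise FileNotFoundError(f"Wordlist not found: {path}")
    return path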


By the way, I'm reverting the prints and adding the colors back. I've also removed the -u/--url flag, as it wasn't really necessary, and the tool now accepts hostnames as well as URLs.

thisisshubhamkumar commented 5 months ago

It's a good idea to download the wordlist from SecLists' GitHub repository because it's updated frequently. If you're using a wordlist for subdomain scanning, it's essential to have an updated one to work with. However, it's not advisable to copy and paste the wordlist from their repository to yours, as it doesn't look professional and the wordlist won't receive any new updates.

I am not a fan of brute force subdomain scanning. It's not foolproof. We should keep this brute force option, but also provide better methods for subdomain scanning, like this one:

import subprocess

PURPLE = '\033[1;30m'
RED = '\033[1;31m'
GREEN = '\033[1;32m'
YELLOW = '\033[1;33m'
BLUE = '\033[1;34m'
PINK = '\033[1;35m'
LBLUE = '\033[1;36m'
WHITE = '\033[1;37m'

def execute_command(command):
    subprocess.run(command, shell=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

url = input(f"\n{WHITE}[!] Enter domain to Enumerate subdomains: ")

print(f"{BLUE}\n[*] Subdomain Enumeration started!\n")

print(f"{GREEN}\n[+] Enumerating subdomains from subfinder ..")
execute_command(f"subfinder -d {url} -silent > sub1")
print(f"{GREEN}\n[+] Enumerating subdomains from assetfinder ..")
execute_command(f"assetfinder {url} > sub2")
print(f"{GREEN}\n[+] Enumerating subdomains from crt.sh ..")
execute_command(f"curl -s 'https://crt.sh/?q={url}' | grep '<TD>' | grep {url} | cut -d '>' -f2 | cut -d '<' -f1 | sort -u | sed '/^*/d' > sub3")
print(f"{GREEN}\n[+] Enumerating subdomains from rapiddns ..")
execute_command(f"curl -s 'https://rapiddns.io/subdomain/{url}#result' | grep '<td><a' | cut -d '\"' -f 2 | grep http | cut -d '/' -f3 | sort -u > sub4")
print(f"{GREEN}\n[+] Enumerating subdomains from bufferover ..")
execute_command(f"curl -s 'https://dns.bufferover.run/dns?q=.{url}' | jq -r .FDNS_A[] | cut -d '\\' -f2 | cut -d ',' -f2 |  sort -u > sub5")
print(f"{GREEN}\n[+] Enumerating subdomains from ridder ..")
execute_command(f"curl -s 'https://riddler.io/search/exportcsv?q=pld:{url}' | grep -Po '(([\w.-]*)\.([\w]*)\.([A-z]))\w+' | sort -u > sub6")
print(f"{GREEN}\n[+] Enumerating subdomains from jldc ..")
execute_command(f"curl -s 'https://jldc.me/anubis/subdomains/{url}' | grep -Po '((http|https)://)?(([\w.-]*)\.([\w]*)\.([A-z]))\w+' | cut -d '/' -f3 > sub7")
print(f"{GREEN}\n[+] Enumerating subdomains from omnisint ..")
execute_command(f"curl -s 'https://sonar.omnisint.io/subdomains/{url}' | cut -d '[' -f1 | cut -d ']' -f1 | cut -d '\"' -f 2 > sub8")

execute_command(f"sort sub1 sub2 sub3 sub4 sub5 sub6 sub7 sub8 | uniq | tee {url}-all_subdomains")
execute_command("rm sub*")

print(f"{BLUE}\n[*] Subdomain Enumeration Completed!\n")
num = subprocess.check_output(f"wc -l {url}-all_subdomains | awk '{{print $1; exit}}'", shell=True, text=True).strip()
print(f"{WHITE}\n[*] Found {num} subdomains for {url}\n")

print(f"{BLUE}\n[!] view the {url}-all_subdomains file for results!\n\n")

JLowborn commented 5 months ago

"It's a good idea to download the Wordlist from Seclist's GitHub repository because it's updated frequently."

The last update on that wordlist that's being downloaded was 4 years ago.

"it's not advisable to copy and paste the Wordlist from their repository to yours as it doesn't look professional."

SecLists itself is a compilation of popular wordlists; it's a common thing to share resources in cybersecurity, and even SecLists has copies of other tools' wordlists, such as John the Ripper's (and those are not being updated either).


Having an "updated" wordlist means constantly checking which subdomain names are the most common and popular; still, it doesn't have to be done every single time, it can be done once a semester or once a year. Besides, the best practice in real-world scenarios is studying the target, since you won't find, for example, a booking subdomain when scanning an e-commerce site.

We should keep this brute force option, but also provide better methods for subdomain scanning like this one.

I think you misunderstood me; I don't think bruteforcing is a bad idea. When I mentioned earlier that I'd implemented a --no-bruteforce option, the reason was based on previous experience with other tools that were slow because of bruteforcing: since it wasn't optional, every time I used those tools I had to accept that it was going to take a really long time before I had results.

I'll make sure to implement this solution as well.

thisisshubhamkumar commented 5 months ago

"Has it really been four years? 😂 I can't believe I never noticed the last commit. Anyway, you can proceed with your preferred method. You can keep multiple wordlists in the repository, with the default being the small-wordlist. I recommend using 5000 as the default. I tested the subdomain scan code a while back and found that it sometimes breaks if the wordlist is too long. But anything under 5k should work just fine."

JLowborn commented 5 months ago

I've added the wordlist to the folder, and also split it into 5 other files with different line counts.
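For reference, the split itself is just a few lines of Python along these lines (file names and line counts here are only illustrative, not the exact ones I used):

# Illustrative only: write the first N lines of the big wordlist into separate files.
sizes = [1000, 2500, 5000, 10000, 20000]  # assumed line counts

with open("wordlists/subdomains-top1million.txt") as f:  # assumed source file name
    lines = f.readlines()

for n in sizes:
    with open(f"wordlists/top-{n}.txt", "w") as out:
        out.writelines(lines[:n])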


thisisshubhamkumar commented 5 months ago

Make sure you test the wordlists that are bigger than 5,000 entries; as I mentioned earlier, it sometimes breaks if the wordlist is too long. Also, do we need this many wordlists? We should just keep 3:

  1. The default one, which is already in the repo
  2. The 5,000-entry one from SecLists
  3. The biggest one

JLowborn commented 5 months ago

We should just keep 3:

  1. The default one, which is already in the repo
  2. The 5,000-entry one from SecLists
  3. The biggest one

It's okay, that's a good idea!