maurosoria / dirsearch

Web path scanner

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 49: invalid start byte #925

Closed hcjcn closed 3 years ago

hcjcn commented 3 years ago

What is the current behavior?

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 49: invalid start byte

What actually happens?

An error occurred while scanning the website.

What is the expected behavior?

Target: http://10.79.7.248:9999/

[12:39:32] Starting:
Traceback (most recent call last):
  File "dirsearch.py", line 47, in <module>
    main = Program()
  File "dirsearch.py", line 43, in __init__
    self.controller = Controller(self.script_path, self.arguments, self.output)
  File "D:\Tools\scanner\dirsearch-0.4.2-beta1_Fscan_awvs\lib\controller\controller.py", line 241, in __init__
    self.prepare()
  File "D:\Tools\scanner\dirsearch-0.4.2-beta1_Fscan_awvs\lib\controller\controller.py", line 578, in prepare
    self.fuzzer.start()
  File "D:\Tools\scanner\dirsearch-0.4.2-beta1_Fscan_awvs\lib\core\fuzzer.py", line 147, in start
    self.setup_scanners()
  File "D:\Tools\scanner\dirsearch-0.4.2-beta1_Fscan_awvs\lib\core\fuzzer.py", line 88, in setup_scanners
    self.default_scanner = Scanner(self.requester)
  File "D:\Tools\scanner\dirsearch-0.4.2-beta1_Fscan_awvs\lib\core\scanner.py", line 42, in __init__
    self.setup()
  File "D:\Tools\scanner\dirsearch-0.4.2-beta1_Fscan_awvs\lib\core\scanner.py", line 120, in setup
    if first_path in first_response.body.decode():
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 49: invalid start byte

Any additional information?

OS, python version, screenshots, dirsearch command, console output, ...?

OS: Microsoft Windows 10 Pro 10.0.18362
Python: Python 3.7.0 (default, Jun 28 2018, 08:04:48) [MSC v.1912 64 bit (AMD64)] :: Anaconda, Inc. on win32
dirsearch: dirsearch-0.4.2-beta1
Command: python dirsearch.py -l url.txt -t 100 -i 200 -e * -r R 3 --exclude-sizes 0B --exclude-texts "403 Forbidden" --exclude-regexps "^Error$" -w db\aaa.txt --format=html


maurosoria commented 3 years ago

Hello @hcjcn,

Can you provide us with more information, such as the HTML encoding of the page and whether you see any unusual characters at http://10.79.7.248:9999/?

Thanks for reporting!

shelld3v commented 3 years ago

@maurosoria I have a linked PR that fixes this issue. The bug occurs because the response body contains a byte that is not valid UTF-8, and we decode that body as UTF-8 in lib/core/scanner.py.
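
For illustration, here is a minimal sketch of the failure and two common ways to avoid it. This is not the code from the linked PR. The `body` and `first_path` values and the `safe_decode` helper are hypothetical stand-ins, not names taken from dirsearch. Byte 0x93 is the left curly quote in Windows-1252, so one plausible cause is a page served in that encoding, though the thread does not confirm it.

```python
# Hypothetical stand-ins for the response body and probed path; not dirsearch's real data.
body = b"<html><head><title>\x93Not Found\x94 /randompath</title></head></html>"
first_path = "randompath"

# Reproduce the failure: 0x93 cannot start a UTF-8 sequence, so a strict decode raises.
try:
    body.decode()
except UnicodeDecodeError as exc:
    print(f"strict decode failed: {exc.reason}")


def safe_decode(data: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes, replacing undecodable bytes with U+FFFD instead of raising."""
    return data.decode(encoding, errors="replace")


# Option 1: decode leniently, then search the resulting text.
found_in_text = first_path in safe_decode(body)

# Option 2: skip decoding entirely and search the raw bytes.
found_in_bytes = first_path.encode("utf-8") in body

print(found_in_text, found_in_bytes)
```

Either option keeps the substring check from crashing on a non-UTF-8 body. Lenient decoding loses the original bytes (they become U+FFFD), which is usually acceptable for a containment check like this one.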