maurosoria / dirsearch

Web path scanner

LookupError(unknown encoding) raised when decoding response #1380

Open HSwift opened 1 week ago


What is the current behavior?

Some websites send a non-standard value in the charset parameter of the Content-Type response header, which causes a decoding error.

For example:

< HTTP/2 200
< date: Wed, 26 Jun 2024 14:50:49 GMT
< content-type: text/html; charset=utf-8,gbk
< content-length: 455

The invalid charset raises a LookupError in 'lib/connection/response.py', which makes dirsearch exit abnormally.

https://github.com/maurosoria/dirsearch/blob/0ad2b8f4cfc31dfa206d3fa6e44dc1ee06c7f10e/lib/connection/response.py#L47-L50
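
The failure can be reproduced outside dirsearch. The sketch below is only an illustration: the body bytes are made up, and the charset string is the one from the header above. It shows that `errors="ignore"` does not help, because that argument only governs undecodable bytes, not an unknown codec name.

```python
# Minimal reproduction, independent of dirsearch.
# The body is a made-up sample; the charset comes from the header above.
body = "<html>ok</html>".encode("utf-8")
charset = "utf-8,gbk"

error_message = None
try:
    # errors="ignore" only affects bytes that cannot be decoded;
    # the codec lookup itself still fails for an unknown name.
    body.decode(charset, errors="ignore")
except LookupError as exc:
    error_message = str(exc)

print(error_message)
```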

What is the expected behavior?

I suggest catching the exception when decoding fails and falling back to the default encoding.

if not is_binary(self.body):
    try:
        self.content = self.body.decode(
            response.encoding or DEFAULT_ENCODING, errors="ignore"
        )
    except LookupError:
        self.content = self.body.decode(
            DEFAULT_ENCODING, errors="ignore"
        )