raviqqe / muffet

Fast website link checker in Go
MIT License

Unable to handle colossal sized response headers #388

Open craigreyenga opened 3 months ago

craigreyenga commented 3 months ago

My website has a link to TikTok on it. The HTTP response headers from TikTok are very big, which causes Muffet to fail.

    error when reading response headers: small read buffer. Increase ReadBufferSize. Buffer size=4096, contents: "HTTP/1.1 302 Moved Temporarily\r\nContent-Type: text/html; charset=utf-8\r\nContent-Length: 47\r\nContent-Security-Policy: script-src 'unsafe-eval' sf16-website-login.neutral.ttwstatic.com s20.tiktokcdn.com"..."tiktok-row.net *.tiktok.com *.tiktok.ru *.tiktok.vn *.tiktokapis.com *.tiktokcdn-eu.com *.tiktokcdn-in.com *.tiktokcdn-us.com *.tiktokcdn.com *.tiktokcreativeone.com *.tiktokforbusinessoutbound.com *."  https://www.tiktok.com/

Pointing to the following HTML is enough to trigger the issue:

    <!DOCTYPE html>
    <html lang="en">
    <head></head>
    <body>
        <a href="https://www.tiktok.com/">TikTok</a>
    </body>
    </html>

Running curl -vvvv https://www.tiktok.com/ confirms that TikTok's response headers are very large.

raviqqe commented 3 months ago

Did you try the buffer size option?

craigreyenga commented 3 months ago

    -b 8192: bad
    -b 9216: good

I did not notice this option when I first started using muffet. Does this option refer to the total size of all HTTP headers in the response? I assume the body of the response is not a part of this buffer size.

Would it be possible to increase the default to 10 kB?

raviqqe commented 3 months ago

The buffer size is used for every connection. So it's not something easy to modify as it's critical to the entire process's memory usage... 🤔
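
As a rough back-of-the-envelope sketch of that concern, the Go snippet below multiplies the buffer size by the number of simultaneously open connections. The connection count of 512 is a hypothetical figure chosen for illustration, readBufferBytes is a made-up helper, and the estimate assumes one read buffer per open connection and ignores every other allocation.

```go
package main

import "fmt"

// readBufferBytes estimates the memory held by read buffers alone, assuming
// one buffer of bufferSize bytes per simultaneously open connection.
func readBufferBytes(bufferSize, connections int) int {
	return bufferSize * connections
}

func main() {
	const connections = 512 // hypothetical number of concurrent connections

	for _, size := range []int{4096, 9216, 10240} {
		total := readBufferBytes(size, connections)
		fmt.Printf("buffer=%d B x %d connections = %.1f MiB\n",
			size, connections, float64(total)/(1<<20))
	}
}
```

Under this assumption the total scales with the number of connections open at once, so raising the default from 4096 to 10240 bytes would multiply that part of the footprint by 2.5.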

craigreyenga commented 3 months ago

> The buffer size is used for every connection. So it's not something easy to modify as it's critical to the entire process's memory usage... 🤔

Does that memory usage accumulate as more and more requests are made?

Anyway, I will defer to your judgment. It's not a huge problem for me to add that parameter to each invocation.