sjdirect / abot

Cross Platform C# web crawler framework built for speed and flexibility. Please star this project! +1.
Apache License 2.0

User Agent config value appears to be getting split on spaces and sending requests with multiple user-agent headers #217

Closed: entryspace closed this issue 4 years ago

entryspace commented 4 years ago

I was debugging an implementation using Abot2 because I was seeing a lot of 500 errors. To get more visibility into the problem, I started outputting the requests sent by my crawler. That showed that the user agent string I had set in the configuration, which contains two spaces, was being split into three separate User-Agent headers on every request.

As a temporary workaround, I've removed the spaces, but that won't work when trying to match the user agent of an actual browser.

sjdirect commented 4 years ago

Just testing this and am not able to replicate. I see this being sent for my test...

GET http://yahoo.com/ HTTP/1.1
User-Agent: Test 1 Test 2 Test 3 Test 4
Accept: */*
Host: yahoo.com

What platform are you running on? Is it possible that you have some line-break characters in there somehow? Can you isolate this down to a unit test?
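Both questions can be checked without involving the crawler at all. The following is only a rough sketch in Python, not Abot code: `find_line_breaks` and `header_values` are hypothetical helper names. The first looks for hidden line-break characters in the configured user agent string, and the second counts how many User-Agent lines actually appear in a captured request dump.

```python
# Characters that most tools treat as a line break (CR, LF, and less common ones).
LINE_BREAKS = "\r\n\v\f\x1c\x1d\x1e\x85\u2028\u2029"


def find_line_breaks(user_agent):
    """Return (index, repr) pairs for every line-break character in the string."""
    return [(i, repr(ch)) for i, ch in enumerate(user_agent) if ch in LINE_BREAKS]


def header_values(raw_request, name):
    """Return the value of every header line called `name` in a raw HTTP request dump."""
    # Normalize CRLF to LF, then keep only the header section before the blank line.
    head = raw_request.replace("\r\n", "\n").split("\n\n", 1)[0]
    values = []
    for line in head.split("\n")[1:]:  # skip the request line
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == name.lower():
            values.append(value.strip())
    return values


if __name__ == "__main__":
    configured = "Test 1 Test 2 Test 3 Test 4"  # value copied from the crawler config
    dump = (
        "GET http://yahoo.com/ HTTP/1.1\r\n"
        "User-Agent: Test 1 Test 2 Test 3 Test 4\r\n"
        "Accept: */*\r\n"
        "Host: yahoo.com\r\n"
        "\r\n"
    )
    print(find_line_breaks(configured))
    print(header_values(dump, "User-Agent"))
```

If `find_line_breaks` returns anything for the configured value, the split is probably coming from the configuration rather than from the crawler. If `header_values` returns more than one entry for a dump captured on the wire, that dump plus the exact configured string would make a good starting point for a unit test.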

sjdirect commented 4 years ago

Unable to repro. Closing unless a reproducible example is provided.