rflechner / ScrapySharp

reborn of https://bitbucket.org/rflechner/scrapysharp
MIT License

Issue with cookie path #1

Open balexandrov opened 6 years ago

balexandrov commented 6 years ago

I am trying to scrape a server that returns multiple Set-Cookie headers, and the alternative parser does not handle them correctly. The default parser works, but it throws an invalid cookie path exception because one of the cookies has a Path attribute.

The Uri construction here omits the path and supplies only the scheme, host, and port, for example https://example.com:443/. If a cookie has a path, for example /forum, cookieContainer.SetCookies throws an invalid path exception. Passing the original url instead fixed it for me. I'm not sure why that Uri construction was needed.

In ScrapingBrowser.cs, private async Task GetWebResponseAsync(Uri url, HttpWebRequest request):

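    // Builds a URI from scheme, host, and port only; the path is dropped.
    var cookieUrl =
        new Uri(string.Format("{0}://{1}:{2}/", response.ResponseUri.Scheme, response.ResponseUri.Host,
                              response.ResponseUri.Port));

    // The original url is passed here instead of cookieUrl (the change described above).
    if (UseDefaultCookiesParser)
        cookieContainer.SetCookies(url, cookiesExpression);
    else
        SetCookies(url, cookiesExpression);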

...
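To make the mismatch concrete, here is a minimal sketch of the path comparison that the reported exception suggests. `CookieUrlDemo`, `RootOf`, and `PathMatches` are hypothetical names used only for illustration; they are not part of ScrapySharp, and the exact check performed inside `CookieContainer` may differ between .NET runtimes.

```csharp
using System;

public static class CookieUrlDemo
{
    // Hypothetical helper mirroring the root-only Uri construction quoted above:
    // scheme, host, and port are kept, the path is replaced by "/".
    public static Uri RootOf(Uri responseUri) =>
        new Uri(string.Format("{0}://{1}:{2}/",
            responseUri.Scheme, responseUri.Host, responseUri.Port));

    // Assumption: the failure comes from a prefix test like this one, where the
    // cookie's Path attribute must be a prefix of the path of the Uri passed in.
    public static bool PathMatches(Uri uri, string cookiePath) =>
        uri.AbsolutePath.StartsWith(cookiePath, StringComparison.Ordinal);
}
```

With an original url such as https://example.com/forum/login, `RootOf` yields a Uri whose path is just `/`, so a cookie with `Path=/forum` does not match it, while the original url does.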

Regards, Bojo

jeffmikan commented 5 years ago

I am running into this also with Set-Cookie: JSESSIONID=1DB030FB14CB6BE89638E86ACEXXXXXX.node1; Path=/iam/im; Secure

khantoocool commented 4 years ago

I have answered the same problem in this issue

toquehead commented 3 years ago

> I have answered the same problem in this issue

Your "answer" is simply disabling cookie processing. That is not always a viable solution - like when you're logging in, for example.

toquehead commented 3 years ago

This still appears to be an issue 2 years later. I'm a bit perplexed, as it seems like such a showstopper. What am I missing?

I would add that you can avoid the error by appending the offending cookie's path to the Uri. So with multiple cookie paths you'd need to parse the cookies and call cookieContainer.SetCookies() once for each unique path. But I can't figure out how to do this with ScrapySharp, as there are no hooks into the processing of cookies.
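Outside of ScrapySharp, the per-path idea could look roughly like the sketch below. It assumes you already have each Set-Cookie value as a separate string; `PerPathCookies`, `PathOf`, and `SetAll` are hypothetical names, not ScrapySharp APIs. For simplicity it calls `SetCookies` once per cookie rather than once per unique path.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

public static class PerPathCookies
{
    // Returns the Path attribute of a single Set-Cookie value, or "/" if none is present.
    public static string PathOf(string setCookie)
    {
        foreach (var part in setCookie.Split(';'))
        {
            var attribute = part.Trim();
            if (attribute.StartsWith("path=", StringComparison.OrdinalIgnoreCase))
            {
                var path = attribute.Substring("path=".Length).Trim();
                return path.Length == 0 ? "/" : path;
            }
        }
        return "/";
    }

    // Stores each cookie using a Uri whose path is the cookie's own path,
    // so the cookie path is always a prefix of the Uri path.
    public static void SetAll(CookieContainer container, Uri responseUri,
                              IEnumerable<string> setCookieValues)
    {
        foreach (var value in setCookieValues)
        {
            var cookieUri = new Uri(responseUri, PathOf(value));
            container.SetCookies(cookieUri, value);
        }
    }
}
```

A call such as `PerPathCookies.SetAll(container, response.ResponseUri, values)` would then replace the single `SetCookies` call, but wiring that into ScrapingBrowser would still require changing the library source.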

The cookies I'm getting (WordPress) are also formatted with date strings that include a comma, so ScrapySharp.Network.CookieParser fails to parse them correctly.
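For the comma problem, one common heuristic is to treat a comma as a cookie boundary only when it is followed by something that looks like `name=`. The sketch below is an assumption about how a combined Set-Cookie string could be split safely. It is not how ScrapySharp.Network.CookieParser works, and `SetCookieSplitter` is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SetCookieSplitter
{
    // A comma counts as a boundary only if the next token is followed by '='.
    // The comma in "expires=Wed, 21 Oct 2026 ..." is followed by a day number
    // and a space, so it is left alone.
    private static readonly Regex Boundary =
        new Regex(@",(?=\s*[^;,=\s]+=)", RegexOptions.Compiled);

    // Splits a comma-joined Set-Cookie string into individual cookie strings.
    public static string[] Split(string combined)
    {
        var parts = Boundary.Split(combined);
        for (var i = 0; i < parts.Length; i++)
            parts[i] = parts[i].Trim();
        return parts;
    }
}
```

This is only a heuristic: a cookie value that itself contains a comma followed by `name=` would still be split incorrectly, so reading the Set-Cookie headers individually is more reliable when the HTTP stack exposes them that way.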