Open kareemrt opened 5 months ago
@kareemrt thanks for reporting!
Based on a quick test, this looks like a multithreading issue. Perhaps not everything is declared thread-local, or there might be a problem with how we create curl objects. Could you try comparing the pointers of *session.GetCurlHolder() to check whether they are actually different?
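As a rough illustration of that pointer check, here is a minimal sketch using only the standard library. `Handle`, `shared_handle()`, `per_thread_handle()` and `same_address_across_threads()` are hypothetical stand-ins, not part of cpr or libcurl; the point is only that comparing addresses across threads shows whether a handle is shared (`static`) or per-thread (`thread_local`).

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in for a per-session handle; not a cpr or libcurl type.
struct Handle {
    int id = 0;
};

// One object shared by every thread that calls this function.
Handle* shared_handle() {
    static Handle h;
    return &h;
}

// One object per thread: each thread gets its own instance.
Handle* per_thread_handle() {
    thread_local Handle h;
    return &h;
}

// Calls `getter` on the current thread and on a second thread, then
// reports whether both calls returned the same address.
bool same_address_across_threads(Handle* (*getter)()) {
    Handle* here = getter();
    Handle* there = nullptr;
    std::thread worker([&] { there = getter(); });
    worker.join();
    assert(there != nullptr);  // the worker must have stored a result
    return here == there;
}
```

If the same comparison on the real curl holder pointers shows identical addresses where separate handles were expected, that would suggest state is being shared between sessions or threads.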
For reference, the following works in a single-threaded scenario:
TEST(SessionGetTests, GetMultipleTimes1) {
    Url url{server->GetBaseUrl() + "/hello.html"};
    Session session;
    session.SetUrl(url);
    std::string expected_text{"Hello world!"};
    Response response = session.Get();
    EXPECT_EQ(expected_text, response.text);
    EXPECT_EQ(url, response.url);
    EXPECT_EQ(std::string{"text/html"}, response.header["content-type"]);
    EXPECT_EQ(200, response.status_code);
    EXPECT_EQ(ErrorCode::OK, response.error.code);

    Url url2{server->GetBaseUrl() + "/url_post.html"};
    session.SetUrl(url2);
    session.SetPayload({{"x", "5"}});
    std::string expected_text2{
        "{\n"
        " \"x\": 5\n"
        "}"};
    response = session.Post();
    EXPECT_EQ(expected_text2, response.text);
    EXPECT_EQ(url2, response.url);
    EXPECT_EQ(std::string{"application/json"}, response.header["content-type"]);
    EXPECT_EQ(201, response.status_code);
    EXPECT_EQ(ErrorCode::OK, response.error.code);
}
Description
I am writing a web-scraper library with libcpr that cycles random proxies and headers on a GET request. My intended behavior is for other programs to call force_connect() to perform different GET requests, while maintaining some state across all function calls (e.g., proxy and browser-header variables).
When I perform a single GET request to a URL (e.g., URL 1), everything works correctly. If I perform multiple GET requests to the same URL (URL 1), everything also works correctly. However, if I perform a GET request to one URL (URL 1) and then perform another GET request to a new URL (e.g., URL 2), the second Session.Get() call returns a response from URL 1 instead of URL 2.
This behavior can be verified with the last line of code (cout << url << " " << r.url << endl;), which prints both the URL passed to the function and the URL used in the GET request.
The behavior persists whether I re-use a session object, create a new session object (i.e. remove 'static'), or omit the session and use Response objects only (though under the hood these seem equivalent, since Response.Get() calls into Session).
My program uses many static variables because I want to keep allocated memory alive between force_connect() calls; even if I remove the static qualifiers and re-declare the variables on every call, I encounter the same issue.
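For context on the static variables mentioned above, here is a minimal sketch of how function-local statics behave in C++. The helper names `remember_first()` and `remember_latest()` are hypothetical and are not taken from the force_connect() body; the sketch only shows that a static local is initialized on the first call and keeps that value unless it is explicitly reassigned.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: the static local is initialized from `url` on the
// first call only; later calls skip the initializer and see the old value.
const std::string& remember_first(const std::string& url) {
    static std::string stored{url};
    return stored;
}

// Hypothetical variant: the storage still persists between calls, but the
// value is reassigned on every call, so it always reflects the latest URL.
const std::string& remember_latest(const std::string& url) {
    static std::string stored;
    stored = url;
    return stored;
}
```

Calling remember_first() with URL 1 and then URL 2 returns URL 1 both times, while remember_latest() follows the argument. Since the problem also appears without static, this is offered only as something to rule out, not as a diagnosis.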
The code contains many commented-out lines; these are potential solutions I tried without success.
I am unsure why I am encountering this behavior. When I print 'session.GetFullRequestUrl()', it prints the correct URL (URL 2), which is even stranger: it suggests that part of the session object is updating and part of it is not.
Example/How to Reproduce
string force_connect(string url, int tries){
}
Possible Fix
cpr::Session::SetUrl(const Url& url) takes a cpr::Url object by const reference and sets the private member 'url_' from it.
It sets correctly the first time (that is how the request reaches URL 1), but does not update when the same object (or an entirely new one) is passed afterwards. Even when a new session and/or cpr::Url object is created, I still encounter this behavior.
Looking into the Session.Get() code, it appears the underlying call is curl_easy_perform(), which reads the URL from a libcurl option (curl_easy_setopt(curl, CURLOPT_URL, url.c_str())) that was set in Session::prepareCommon().
I don't know why Session.url_ is not updating; maybe it is, and something is wrong in libcurl's code. I can't check with a debugger because this library is meant for my main program, which is written in Python, and the class member is private.
Either a modification-check or a copy-by-value approach could be potential solutions.
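To make the two ideas concrete, here is a minimal sketch on a toy class. `MiniSession`, `PrepareUrl()` and `UrlChanged()` are hypothetical names used for illustration and do not reflect cpr's actual implementation. The setter stores its own copy of the URL (copy-by-value) and records whether the value changed (modification check).

```cpp
#include <cassert>
#include <string>

// Hypothetical toy class, not cpr::Session: it only models how a URL
// setter could store its argument and track changes.
class MiniSession {
  public:
    // Copy-by-value: the session keeps its own copy of the URL, so later
    // changes to the caller's object cannot affect it.
    void SetUrl(const std::string& url) {
        // Modification check: only update and flag when the value differs.
        if (url != url_) {
            url_ = url;
            dirty_ = true;
        }
    }

    // Returns the URL a request would use and clears the "changed" flag.
    const std::string& PrepareUrl() {
        dirty_ = false;
        return url_;
    }

    // True when the URL changed since the last PrepareUrl() call.
    bool UrlChanged() const { return dirty_; }

  private:
    std::string url_;
    bool dirty_ = false;
};
```

With this shape, setting URL 1, preparing a request, and then setting URL 2 leaves UrlChanged() true, so whatever pushes the URL down to the transfer layer knows it has to be set again.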
Where did you get it from?
Other (specify in "Additional Context/Your Environment")
Additional Context/Your Environment