Could you try adding "--v=1" to the command lines of the gateway, engine, and launcher? This enables verbose logging, which might provide some insight into what contributes to the 100 ms latency.
@zhipeng-jia Thanks for a swift reply!
I noticed that the default run_stack.sh script already had verbose logging enabled, and I also found a mistake in how we interpreted our results. After disabling logging, the latency of submitting an invocation with just a few bytes dropped to ~60-90 us (not milliseconds) - is that what you would expect? It looks correct and plausible, given the close proximity of the servers.
However, when sending an input JSON of about 1 kB, the latency is still around one second. This seems high compared to OpenWhisk (~100-200 ms even on a 100 kB input).
Here's the test code we use; is there anything in particular that might cause an unnecessary slowdown on your gateway?
#include &lt;chrono&gt;
#include &lt;iostream&gt;
#include &lt;string&gt;
#include &lt;curl/curl.h&gt;

// libcurl write callback: append the received response body to a std::string.
static size_t WriteCallback(void *contents, size_t size, size_t nmemb, void *userp)
{
    ((std::string*)userp)->append((char*)contents, size * nmemb);
    return size * nmemb;
}

// part of the test routine:
std::string readBuffer;
// generate input of the given size
std::string out = base64_encode(test);
CURL *curl = curl_easy_init();
CURLcode res;
std::string post = "{\"in\": \"" + out + "\"}";
// decode any percent-encoding in the URL before handing it to curl
int len;
char *str = curl_easy_unescape(curl, url.c_str(), url.length(), &len);
curl_easy_setopt(curl, CURLOPT_URL, str);
struct curl_slist *list = NULL;
list = curl_slist_append(list, "Content-Type: application/json");
curl_easy_setopt(curl, CURLOPT_HTTPHEADER, list);
curl_easy_setopt(curl, CURLOPT_POST, 1L);
curl_easy_setopt(curl, CURLOPT_POSTFIELDS, post.c_str());
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, WriteCallback);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, &readBuffer);
// the first iteration is a warm-up; its timing is discarded below
for (int i = 0; i < repetitions + 1; ++i) {
    readBuffer.clear();
    auto start = std::chrono::high_resolution_clock::now();
    res = curl_easy_perform(curl);
    auto end = std::chrono::high_resolution_clock::now();
    if (i > 0)
        std::cout << std::chrono::duration_cast<std::chrono::microseconds>(end - start).count() / 1000.0 << '\n';
}
curl_free(str);
curl_slist_free_all(list);
curl_easy_cleanup(curl);
Please find attached the logs for 10 executions with verbose logging (again, latency ~1s).
input_1.txt, engine.log, Foo_worker_0.stderr.log, gateway.log, launcher_foo.log
Thanks for the log. It provides some useful information.
Looks like the problem is here:
I0523 14:00:55.934223 IO-0/EL http_connection.cpp:283] HttpConnection[0]: Parse new HTTP header: Expect = 100-continue
I0523 14:00:56.935250 IO-0/EL http_connection.cpp:254] HttpConnection[0]: Start parsing URL: /function/Foo
These two lines are about one second apart (14:00:55.93 → 14:00:56.94). The first line marks the moment the last HTTP header is parsed, while the second marks the moment the full HTTP request has been received (see https://github.com/ut-osa/nightcore/blob/asplos-release/src/gateway/http_connection.cpp#L254). Only after the full request is received does the gateway send the function call to the engine.
I am not sure what consumes the 1 second here. I am also not familiar with curl's C API, but I do not see obvious problems in the client code you show.
After searching around, I suspect the "Expect: 100-continue" header (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Expect) may cause the problem. I have not implemented the logic required for handling the Expect header, i.e., sending a 100 Continue response before receiving the HTTP body, so libcurl waits for its timeout (one second by default) before sending the body anyway.
Here is what I found on how to disable it in libcurl: https://stackoverflow.com/questions/49670008/how-to-disable-expect-100-continue-in-libcurl
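Concretely, the usual libcurl workaround (a minimal sketch matching the header-list code in the test routine above) is to override the header with an empty value, so libcurl never sends "Expect: 100-continue":

struct curl_slist *list = NULL;
list = curl_slist_append(list, "Content-Type: application/json");
// An empty value removes libcurl's automatically added "Expect: 100-continue".
list = curl_slist_append(list, "Expect:");
curl_easy_setopt(curl, CURLOPT_HTTPHEADER, list);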
@zhipeng-jia Thanks for your help! Now it seems to work. We get ~150 us for 1 kB messages and ~20 ms for 5 MB messages.
I'm closing the issue as this has been resolved.
Good to know!
@zhipeng-jia Hi! I have one more question - I'm trying to measure the speedups achieved by integrating nightcore. I want to dispatch between 8 and 32 functions to the same instance.
Besides changing the JSON configuration file with the minimum and maximum number of workers, is there anything else I should pay attention to? I'm going to run C/C++ functions.
If you're going to measure under a fixed concurrency, setting both max_worker and min_worker to the desired concurrency will work.
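For illustration, a sketch of what such a function entry could look like in the config JSON for a fixed concurrency of 16; the field names here (funcName, funcId, minWorkers, maxWorkers) are an assumption based on nightcore's bundled examples, so verify them against the example configs shipped with the repository:

[
  {
    "funcName": "Foo",
    "funcId": 1,
    "minWorkers": 16,
    "maxWorkers": 16
  }
]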
@zhipeng-jia Thanks so much! And just to verify: are POST requests the only way of invoking nightcore functions externally? I'm asking because we have to transmit some floating-point numbers, and maybe there's a way of skipping the float->string->float serialization.
I have also implemented a gRPC transport, but it needs more setup and has not been tested extensively.
If it does not cause real problems, I'd suggest using HTTP POST.
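If the goal is only to avoid the float->string->float round trip over HTTP POST, one option is to ship the raw bytes of the floats inside the JSON payload, reusing the base64_encode helper from the test code above (a sketch; the helper and its std::string overload are assumptions):

#include &lt;cstring&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

// Pack the floats' raw bytes into a string and base64-encode them for the JSON body.
// The function side base64-decodes and memcpy's back into floats, so no
// float <-> decimal-string conversion happens on either end. Assumes both
// machines share the same float representation and endianness.
std::string encode_floats(const std::vector<float>& values) {
    std::string bytes(values.size() * sizeof(float), '\0');
    if (!values.empty())
        std::memcpy(&bytes[0], values.data(), bytes.size());
    return base64_encode(bytes);  // assumed helper from the test code
}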
Hi!
I've been trying to execute simple functions with nightcore. To that end, I followed the installation instructions and modified the example function to an even simpler one.
When deploying the controller and engine on one server, and using curl to send POST requests from another server, I measure latencies around ~100 ms. The TCP/IP RTT between the two machines is ~12 us. When I increase the size of the input JSON, the latency quickly grows to over a second for an invocation with a 1 kB message. On the same cluster of machines, we've been able to deploy OpenWhisk and execute functions with a 1 kB payload at a latency of around ~100 ms.
Is there any other way to invoke nightcore functions that doesn't involve such high latencies? Is there a potential problem in our deployment that might cause them? Is there documentation on optimizing deployments?