discussion - Githubissues

tatumizer commented 5 years ago

Opened to move the discussion from dart language. Will post further comments here.

EDIT by @thomasio101
The original discussion can be found here.

tatumizer commented 5 years ago

To make the framework generic, we need to come up with the idea what "generic test" looks like. Here's one example:

int testListCopy(int iterations, int nChunks, int chunkSize) {
  var list= new List.filled(chunkSize, 0);
  int x = 0;
  for (int i=0; i<iterations; i++) {
    for (int j=0; j<nChunks; j++) {
      var list1 = List.from(list);
      x+=list1[0];
    }  
  } 
  return x;
}

So the total number of "elementary" operations is iterations*nChunks*chunkSize Main program never calls the test directly - instead, it calls a function runTest(testListCopy, "List Copy", some other parameters). runTest is responsible for everything else - warmup, calculating parameters of invocation and other stuff. There are a number of things to be taken into account, including the overhead of GC, which might be in play during tests, but not in real-life programming (at least, not to the same extent). Have to compute this overhead and make adjustments.

You know why there's no place on internet where you can see the results of micro-benchmarks for different programming languages? Because it's hard to design and implement these tests correctly! LOL But doable. If you are interested, we can do it together.

(I'm retired, and couldn't find much motivation to do it all by myself)

Will write more tomorrow. Cheers! Alex

thomasio101 commented 5 years ago

@tatumizer Alright, I'm going to start reading through this now!

thomasio101 commented 5 years ago

@tatumizer to begin with, I've written a class called Test, and a new branch.

tatumizer commented 5 years ago

No rush. First, I'd like to explain the situation. These tests will be meaningful only after they release AOT version of dart later this year. Currently, you are testing VM, which is used only while debugging - dart team doesn't care about it too much (at least for now). With AOT for desktop, things will get much more interesting. Flutter has AOT now, but I never touched Flutter. These tests might be helpful while comparing Flutter AOT with java performance on Android, and with Swift or Objective C on iOS. Currently, dart team is working on FFI interface with C, which will allow them to optimize AOT runtime. We will see it later this year (hopefully). and only then performance comparison would make sense. If we design the framework well, it would be easy to add as many tests as necessary, very quickly. But the tests, as you can see, are not 1-liners. It's hardly possible to create a UI which tests performance based on a simple input like "I want to see the performance of Map.put". Every test should be written by a human who understands the framework, but as you see, these tests are still quite small. (To be continued).

tatumizer commented 5 years ago

I don't know what your background is, but for this project, it's important to have an idea of how modern processors work - pipelines, branch prediction, timing of basic instructions etc. If you are interested, I can find some materials for you. Why is this important? Because first off, you need to be able to predict, at least roughly, the performance of each function under test based on common sense. E.g. when you posted the timing of a simple function call to be 0.03 microseconds (which is 30 nanoseconds), I knew right away that it cannot be true. What is function call? You push a couple of operands into the stack, invoke processor instruction "call", then take the operands from the stack, add them and return. When the clock rate is 3GHz, 30 microseconds translate to almost 100 processor cycles, but each of the above operations take just 1-4 cycles. Also, some instructions are executed in parallel - the pipeline can execute 3 or more instructions per cycle. So it can all add up to, say, 10-15 cycles, but not 100. Comparing with common sense is how you know your test is not off by a mile.

tatumizer commented 5 years ago

Explanation of testListCopy:

int testListCopy(int iterations, int nChunks, int chunkSize) {
  var list= new List.filled(chunkSize, 0);
  int x = 0;
  for (int i=0; i<iterations; i++) {
    for (int j=0; j<nChunks; j++) {
      var list1 = List.from(list);
      x+=list1[0];
    }  
  } 
  return x;
}

Q: Why do we need to test different chunk sizes ? A: Copying the list includes a constant overhead + copying each item. This constant overhead is not negligible. So the total timing is the sum: T0 + k*T1, where k - the number of items. In reality, it's not linear like this b/c of cache effects etc, but for a normal range of list sizes, it's a good approximation. (Certainly, when you cross some boundary for list size, performance degrades sharply, but we don't want to test these sizes).

Q: who calls testListCopy with different chunk sizes? A: function "runTest" is a test runner. After warming up the function under test, it calls it with different chunk sizes, then computes T0, T1 and k based on timing data. So "runTest" treats the function under test as a black box.

Q: how many iterations are needed? A: we can say that each test should run for, say, 1 second. "runTest" is trying to predict the number of iterations that fit in 1 sec by probing the performance of the function under test on small number of iterations.

Q: what about the effects of Garbage Collector (GC)? A: In the inner loop of testListCopy, we create a new list. The interesting question is: is the compiler smart enough to see that this list never escapes the context, and can be garbage collected immediately? If yes - all is good. If not, then our test will basically turn into the test of GC :). We have to ask VM guys about expected behavior of GC. If the compiler is not smart enough, we have to account for the effects of GC ourselves in runTest (which is doable too)

So you see that runTest has quite a bit of functionality in it.

thomasio101 commented 5 years ago

@tatumizer I was in the middle of typing a response :D.

thomasio101 commented 5 years ago

@tatumizer This is all very interesting. I think you're trying to say that our framework should take in black boxes and then pass an object to them that acts as an interface that can receive data points. Is this correct?

thomasio101 commented 5 years ago

While we're trying to figure this out, I'm going to continue work on some standardized means of storing and visualizing the results, okay?

EDIT I'm currently getting to work on writing a way to properly store the data, with different axis, also being able to define units and to add standardized labels for generating things like graphs and tables.

tatumizer commented 5 years ago

Of course you are free to do anything with your time :) I just wanted to explain what is necessary for the project to be useful. Before visualizing things, we need to figure out what these things are. Otherwise somebody will look into your code and say: sorry, this test doesn't really tell me anything. No matter how many pictures are in it :) If you are not interested, then it's fine too! LOL

thomasio101 commented 5 years ago

I'm interested, @tatumizer! But I'm trying to get some concrete results, and as you've said, the current situation means that performance tests aren't relevant (For now at least). That's why I'm focusing on the framework.

Anyways, do you have any ideas for that?

EDIT And yes, figuring out what we are trying to store and visualize should be our first priority (or so I think).

thomasio101 commented 5 years ago

Alright, I've read your comments in detail, and I think what we need to visualize is the following (I've also added some of my own considerations)

The progression in the run time per iteration, mapped to the number of iterations.
Multiple results on the y-axis per point on the x-axis, to visualize the error margin.
A way of comparing different result sets, things like the run time of Function.apply versus the verbose method for example.
??? A way of showing data like the CPU core temperature for each data point, to ensure data sets aren't skewed by factors such as CPU throttling ???

tatumizer commented 5 years ago

Sure, please try concrete results (the notion I don't completely understand in this context, but that's fine), and then come back. :)

thomasio101 commented 5 years ago

Good! I'll respond here when I'm done! (Will take a week or so.)

tatumizer commented 5 years ago

How to visualize - is not that important right now. There can be 100 ways of visualizing things. Function.apply is not very interesting - everybody knows it's slow, people don't use it in realtime apps. The only interesting thing is how dart ranks against java, objective C, go in terms of performance of frequently used operations (lists, maps, strings, etc). 90% of overall performance depends on these small mundane operations. Most users have never heard of Function.apply. BTW, just curious - why do you call direct invocation "verbose"? :)

tatumizer commented 5 years ago

Good! I'll respond here when I'm done! (Will take a week or so.)

OK! Good luck!

thomasio101 commented 5 years ago

Oh @tatumizer, I call it 'verbose' because you're writing out exactly what should be done, while Function.apply abstracts away accessing map and list elements.

tatumizer commented 5 years ago

verbose = using or expressed in more words than are needed. In this case, it's "direct" invocation, not verbose. :)

thomasio101 / dart-performance-testing

discussion #1