hyperledger-archives / caliper

A blockchain benchmark framework to measure performance of multiple blockchain solutions
Apache License 2.0

add ability to run duration based tests and trim initial results #48

Closed nklincoln closed 6 years ago

nklincoln commented 6 years ago

The current performance test configuration runs a sequence of tests based on a desired number of transactions to be executed at a given transaction rate, measured from the perspective of the Fabric platform (and not the client running the test).

This pull request is aimed at supporting time-based performance tests: the test runs for a set duration at a specific TPS, and an initial set of results is removed so that it is not included in the performance statistics. This is useful for omitting results generated during a 'warm-up' phase and concentrating on the 'under load' portion of the test.

A new test tag of 'durationTpsAndTrim' is used to request a time-based test instead of one with a target number of transactions.

The PR includes:

haojun commented 6 years ago

Thanks @nklincoln. Some questions:

  1. You calculate the total number as testRounds[i][0] * testRounds[i][1] * clientNum. Why multiply by clientNum?
  2. 'client.number' is only available if the client type property is set to 'local'. A new type named 'zookeeper' was introduced recently to support running tests on distributed machines. In that case, 'client.number' would be undefined.
  3. I am not sure how trimming would affect the calculation of throughput. For example, if the first 100 txs are ignored, the throughput will be calculated as (total number - 100) / (end_time - start_time), where start_time is when tx101 is submitted (created). It does not reflect the total work of the DLT if some of the first 100 transactions are actually committed after start_time.
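
The concern in point 3 can be sketched as follows. This is only an illustration under stated assumptions: the record fields `createTime` and `commitTime` (in seconds) and the helper `trimmedThroughput` are hypothetical and are not taken from Caliper's actual result format.

```javascript
// Hypothetical result records (times in seconds); not Caliper's real format.
function trimmedThroughput(results, trim) {
    // Order by submission time, then split into dropped and kept records.
    const sorted = [...results].sort((a, b) => a.createTime - b.createTime);
    const dropped = sorted.slice(0, trim);
    const kept = sorted.slice(trim);
    if (kept.length === 0) {
        return { tps: 0, straddling: 0 };
    }
    // The measurement window starts when the first kept tx is submitted.
    const start = kept[0].createTime;
    const end = Math.max(...kept.map(r => r.commitTime));
    // Dropped txs that commit after `start` still load the ledger inside
    // the window, but they are not counted in the throughput.
    const straddling = dropped.filter(r => r.commitTime > start).length;
    const tps = end > start ? kept.length / (end - start) : 0;
    return { tps, straddling };
}

const sample = [
    { createTime: 0, commitTime: 1.5 },
    { createTime: 0.5, commitTime: 2.5 },
    { createTime: 1, commitTime: 2 },
    { createTime: 2, commitTime: 3 },
    { createTime: 3, commitTime: 5 },
];
const trimmed = trimmedThroughput(sample, 2);
console.log(trimmed);
```

In this sample, both trimmed transactions commit after the window opens, so the reported throughput understates the work the ledger performed during the window.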
nklincoln commented 6 years ago

Sorry for my delay in responding.

The client number was required to cancel out the division by the client number in local-client. This is because the driving metric is inbound transactions hitting the blockchain, and the aim is to restrict the number from all clients combined to match that specified within the test json folder, rather than having it multiply with additional clients.

There is likely a better way to specify a time-based test that is compatible with the zookeeper test route ... I will confess that I have only been using the local-client tests.

With the trimming, I have been unable to spot in the code where the global start time is being used. The results being trimmed each contain their own complete timing information. I was removing the initial set to get a better understanding of the performance under stress and not during a warm-up phase. In reality, the end of the test run should also be ignored.

haojun commented 6 years ago

  1. Sorry, I could not fully follow your point about the first question due to my English skill. As you mentioned, the old number is defined as the total transactions that would be submitted no matter how many clients there are. And your new number seems to define the transactions each client would submit, right? Why not define it in the same way as the old one?

  2. Trimming is a good idea, but I think a better way is to keep the original results and modify the measurement method to apply trimming.

Anyway, the idea is great, but the PR should at least be compatible with the zookeeper test. I also suggest dividing it into two features, one for time-based tests and another for trimming.
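
The suggestion to keep the original results and trim only at measurement time could look roughly like this. It is a minimal sketch: `summarize`, its `trimStart` and `trimEnd` options, and the record fields are hypothetical names, not part of Caliper.

```javascript
// Hypothetical helper: compute statistics over a trimmed view of the
// results while leaving the raw array untouched.
function summarize(results, { trimStart = 0, trimEnd = 0 } = {}) {
    // Clamp the end index so oversized trims yield an empty view.
    const end = Math.max(trimStart, results.length - trimEnd);
    // slice() copies, so `results` itself is never modified.
    const view = results.slice(trimStart, end);
    const delays = view.map(r => r.commitTime - r.createTime);
    const count = delays.length;
    const avgDelay = count === 0 ? 0 : delays.reduce((s, d) => s + d, 0) / count;
    return { count, avgDelay };
}

const raw = [
    { createTime: 0, commitTime: 5 },
    { createTime: 1, commitTime: 3 },
    { createTime: 2, commitTime: 3 },
    { createTime: 3, commitTime: 6 },
    { createTime: 4, commitTime: 10 },
];
const full = summarize(raw);
const steady = summarize(raw, { trimStart: 1, trimEnd: 1 });
console.log(full.count, steady.count, steady.avgDelay);
```

Because the raw results survive, the same run can be summarized with different trim settings without re-running the test.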

nklincoln commented 6 years ago

Hi, thanks for the reply. I agree there should be two features: one for time-based runs and another for trimming the warm-up and cool-down phases. I've rebased on the most recent changes for the addition of iroha and split the PR into two options.

I have modified the PR to include:

I believe that the duration based runs are now compatible with the zookeeper tests.

I quite like the idea of retaining all results and accounting for a trimming operation at a later stage, though there could be issues when the results from multiple clients are merged - happy to continue a discussion on this aspect 👍

haojun commented 6 years ago

Hi @nklincoln, I've tested the code and found some problems.

  1. The actual test duration is much longer than the set time. I think the problem is that txPerClient should be calculated based on the TPS per client instead of the total TPS.
  2. When applying trim with txNumbAndTps, the trim number should also be divided by the number of clients.

nklincoln commented 6 years ago

Thanks for checking again - I have updated the code to account for the number of clients in the duration-based runs and in the trim process.
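
The arithmetic behind the two problems reported above can be sketched as follows. This is not the code from the PR: `planRound` and its parameter names are hypothetical, and it assumes the load is split evenly across clients.

```javascript
// Hypothetical helper: split a duration-based round evenly across clients.
function planRound({ durationSec, totalTps, clientNum, trim = 0 }) {
    const tpsPerClient = totalTps / clientNum;
    return {
        tpsPerClient,
        // Each client sends for `durationSec` at its own share of the rate.
        txPerClient: Math.floor(durationSec * tpsPerClient),
        // The trim count is given for the whole round, so divide it as well.
        trimPerClient: Math.floor(trim / clientNum),
    };
}

const plan = planRound({ durationSec: 60, totalTps: 100, clientNum: 4, trim: 200 });
// A client sending txPerClient at tpsPerClient runs for the configured duration.
const actualDurationSec = plan.txPerClient / plan.tpsPerClient;
// Sizing each client from the total TPS instead stretches the run by clientNum.
const stretchedDurationSec = (60 * 100) / plan.tpsPerClient;
console.log(plan, actualDurationSec, stretchedDurationSec);
```

With four clients, sizing each client's workload from the total TPS would make a 60-second round take 240 seconds, which matches the symptom of the test running much longer than the set time.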