sidorares / node-mysql2

:zap: fast mysqljs/mysql compatible mysql driver for node.js
https://sidorares.github.io/node-mysql2/
MIT License

Performance: instantiate row from the same base class instead of starting from generic {} object #2055

Open sidorares opened 1 year ago

sidorares commented 1 year ago

@testn we did have some initial discussion when the original change was made in #1402, but I wanted to review and benchmark again. I think the fix should be simple, and worth adding if the performance numbers are visible enough.

A bit of context: previously, the generated parser class itself was used to create the row object instance. V8 can cache field access for objects created from the same constructor (they share the same shape), so in theory doing

function row() { 
   this.field1 = 1; 
   this.field2 = 2; 
   // etc
}

new row() // repeat many times

is much faster than


function row() {
   const result = {};
   result.field1 = 1;
   result.field2 = 2;
   // etc
   return result;
}

We could potentially generate a proxy RowConstructor class that assigns all fields passed to the constructor. For example, the currently generated code for connection.query("show tables") is

(function () {
  return class TextRow {
    constructor(fields) {
    }
    next(packet, fields, options) {
      this.packet = packet;
      const result = {};
      // "Tables_in_mysql": VAR_STRING
      result["Tables_in_mysql"] = packet.readLengthCodedString("utf8");
      return result;
    }
  };
})()

We could make it something like this:

(function () {
  return class TextRow {
    constructor(fields) {
       this.RowConstructor = function(field1) {
          // "Tables_in_mysql": VAR_STRING
          this["Tables_in_mysql"] = field1;
       }
    }
    next(packet, fields, options) {
      this.packet = packet;
      return new this.RowConstructor(
         packet.readLengthCodedString("utf8")
      );
    }
  };
})()

A simple benchmark shows a 50-60x time difference in row construction. I'm not sure how big the penalty is in the total mix, but it might still be visible (constructing an object with 7 fields 10 million times and then accessing all fields):

const parse1 = function (base) {
  const result = {};
  result.id = 1;
  result.int_0 = base + 0;
  result.int_1 = base + 1;
  result.int_2 = base + 2;
  result.int_3 = base + 3;
  result.int_4 = base + 4;
  result.int_5 = base + 5;
  result.int_6 = base + 6;
  return result;
};

const parse2 = function (base) {
  this.id = 1;
  this.int_0 = base + 0;
  this.int_1 = base + 1;
  this.int_2 = base + 2;
  this.int_3 = base + 3;
  this.int_4 = base + 4;
  this.int_5 = base + 5;
  this.int_6 = base + 6;
  return this;
};

// benchmark 2 versions: 1) parse1, 2) parse2

const NUM_ITEMS = 10000000;

const benchmark = function () {
  let results = new Array(NUM_ITEMS);

  const start = Date.now();
  let sum = 0;
  for (let i = 0; i < NUM_ITEMS; i++) {
    let result = parse1(i % 1000);
    results[i] = result;
    sum +=
      result.id +
      result.int_0 +
      result.int_1 +
      result.int_2 +
      result.int_3 +
      result.int_4 +
      result.int_5 +
      result.int_6;
  }

  const end = Date.now();
  console.log(`parse1: ${end - start} ms; sum: ${sum}`);

  results = new Array(NUM_ITEMS);
  const start2 = Date.now();
  sum = 0;
  for (let i = 0; i < NUM_ITEMS; i++) {
    let result = new parse2(i % 1000); // construct with `new` so each iteration gets its own row instance
    results[i] = result;
    sum +=
      result.id +
      result.int_0 +
      result.int_1 +
      result.int_2 +
      result.int_3 +
      result.int_4 +
      result.int_5 +
      result.int_6;    
  }

  const end2 = Date.now();
  console.log(`parse2: ${end2 - start2} ms, sum: ${sum}`);
};

for (let i = 0; i < 10; i++) {
  benchmark();
}

parse1 runs in approx. 1600 ms while parse2 runs in approx. 30 ms.

sidorares commented 1 year ago

I think I'd prefer refactoring the code back to reading the packet directly in a generated row constructor, rather than calling the constructor with all the fields passed as a list of arguments.
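
A rough sketch of that direction, reusing the "show tables" example above (illustrative only: here the hypothetical RowConstructor receives the packet and reads from it directly, instead of receiving the already-decoded values as arguments):

(function () {
  return class TextRow {
    constructor(fields) {
      this.RowConstructor = function (packet) {
        // "Tables_in_mysql": VAR_STRING
        this["Tables_in_mysql"] = packet.readLengthCodedString("utf8");
      };
    }
    next(packet, fields, options) {
      // every row is created by the same constructor, so all rows share one shape
      return new this.RowConstructor(packet);
    }
  };
})()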

sidorares commented 1 year ago

Benchmarking "select 1000 rows, each row has 1 to 1000 INT columns", y axis is "columns X 1000 per millisecond", the higher the better

[benchmark chart: columns x 1000 per millisecond vs. number of columns]

The upper family of 3 graphs is mysql2@2.3.0; the family below it is the current 3.4.

Interestingly, the difference is more noticeable on a single-CPU box (the benchmark ran on a hobby railway.app server against a free PlanetScale database). When I test on my local laptop the difference is less prominent, BUT CPU consumption for the node process is 300-400%. My theory: the V8 optimiser runs in separate thread(s), and if there is enough headroom it does not affect the main JS thread much.
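
For reference, a minimal sketch of how a benchmark along these lines could be set up with mysql2's promise API (the table and column names here, bench_ints and int_0 ... int_999, are hypothetical and this is not the exact script behind the charts above):

const mysql = require('mysql2/promise');

async function benchColumns(conn, numColumns, numRows = 1000) {
  // assumes a pre-populated table bench_ints with at least numRows rows
  // and INT columns named int_0 ... int_999
  const columns = Array.from({ length: numColumns }, (_, i) => `int_${i}`);
  const sql = `SELECT ${columns.join(', ')} FROM bench_ints LIMIT ${numRows}`;

  const start = Date.now();
  const [rows] = await conn.query(sql);
  const elapsed = Date.now() - start;

  // metric roughly matching the y axis above: columns x 1000 per millisecond
  console.log(`${numColumns} cols: ${(numColumns * rows.length) / 1000 / elapsed}`);
}

(async () => {
  const conn = await mysql.createConnection({ host: 'localhost', user: 'root', database: 'test' });
  for (const n of [1, 10, 100, 500, 1000]) {
    await benchColumns(conn, n);
  }
  await conn.end();
})();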

sidorares commented 1 year ago

I'll add a benchmark later with the "interpreting" parser, i.e. the one that does not have the "generate parser JS code and eval it upfront" pass.
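
To make the distinction concrete, a simplified illustration of the two styles (made-up helper names, string columns only):

// "generated" parser: build JS source for this result set's columns once,
// compile it with new Function, and reuse it for every row
function compileRowParser(fields) {
  const body = fields
    .map((f) => `result[${JSON.stringify(f.name)}] = packet.readLengthCodedString("utf8");`)
    .join('\n');
  return new Function('packet', `const result = {};\n${body}\nreturn result;`);
}

// "static / interpreting" parser: no codegen pass, just loop over the
// field metadata for every row
function readRowInterpreted(packet, fields) {
  const result = {};
  for (const field of fields) {
    result[field.name] = packet.readLengthCodedString('utf8');
  }
  return result;
}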

sidorares commented 1 year ago

[benchmark chart: mysql2@2.3.0 vs. current vs. static/interpreting parser]

Top graph (best performance): mysql2@2.3.0. Middle: current. Bottom: the "static / interpreting parser" from the #2099 branch. I was hoping to see a crossover point so we could switch to the "static" parser when a response contains a large number of columns, but it looks like the current version performs better across the board (and 2.3.0 performs even better).