max-mapper / csv-write-stream

A CSV encoder stream that produces properly escaped CSVs
BSD 2-Clause "Simplified" License

Support for mapHeaders to filter columns and edit the printed headers #28

Open hmalphettes opened 8 years ago

hmalphettes commented 8 years ago

Setting a mapHeaders function in the options lets you skip columns (by returning null) and rewrite the printed header line.

For example:

var writer = csv({
  mapHeaders: function (header) {
    // return null to skip the column entirely,
    // otherwise return the header name to print
    return header === 'skipthisone' ? null
      : header.substring(0, 1).toUpperCase() + header.substring(1)
  }
})
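
Assuming the PR's semantics (null skips the column, otherwise the returned string is printed), using that writer would look roughly like this; the sample record and the output shown are illustrative:

writer.pipe(process.stdout)
writer.write({skipthisone: 1, hello: 'world', foo: 'bar'})
writer.end()
// header line: Hello,Foo
// data line:   world,bar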

This is the reverse of csv-parser's mapHeaders: https://github.com/mafintosh/csv-parser/pull/54
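
For comparison, this is roughly how csv-parser's option is used on the parsing side. This follows csv-parser's currently documented API (the callback receives an object with header and index); the version in the linked PR may have used a different signature, and the file name here is made up:

var csvParser = require('csv-parser')
var fs = require('fs')

fs.createReadStream('data.csv')
  .pipe(csvParser({
    mapHeaders: function (arg) {
      // null drops the column; any other string renames it
      return arg.header === 'skipthisone' ? null : arg.header.toLowerCase()
    }
  }))
  .on('data', function (row) { console.log(row) })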

notslang commented 8 years ago

Having a function to edit the headers seems overly complex... you could just pipe the stream of objects into a transform that rewrites the keys or removes keys you don't want before sending them to csv-write-stream.

hmalphettes commented 8 years ago

@slang800 ok, that works too.

I would argue that, no matter what, that transform stream will add some overhead. And such a transform stream is almost as complex as csv-write-stream itself.

If one wants to optimise such a transform stream, one would need to redo the same kind of work the csv-write-stream compiler is already doing.

At that point all that is left is the CSV formatting, and one would end up with csv-write-stream.

Hence I think there is value in handling this transform in the csv-write-stream itself.

notslang commented 8 years ago

It's not complex and you shouldn't need to extract the headers or redo any work. It should look something like this:

var through2 = require('through2')
var transform = through2({objectMode: true}, function (obj, enc, cb) {
  var i, key, len, newKey, keys
  // drop the column we don't want
  delete obj.skipthisone
  keys = Object.keys(obj)
  for (i = 0, len = keys.length; i < len; i++) {
    key = keys[i]
    // capitalise the first letter of each remaining key
    newKey = key.substring(0, 1).toUpperCase() + key.substring(1)
    if (newKey !== key) {
      obj[newKey] = obj[key]
      delete obj[key]
    }
  }
  cb(null, obj)
})

...Unless I'm totally misunderstanding what you want to do.
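
For completeness, a sketch of how that transform would sit in front of the writer; the sample record and the output destination are assumptions for illustration:

var csvWriter = require('csv-write-stream')
var writer = csvWriter()

// rename/drop keys first, then let csv-write-stream do the CSV formatting
transform.pipe(writer).pipe(process.stdout)
transform.write({skipthisone: 1, hello: 'world', foo: 'bar'})
transform.end()
// prints: Hello,Foo
//         world,bar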

hmalphettes commented 8 years ago

Thanks @slang800

csv-write-stream could have been written in the same fashion as what you suggest:

      // .join() with no argument joins with ',' so this emits one CSV line per record
      this.push(Object.keys(record).map(k => {
        const value = record[k];
        if (typeof value === 'string') {
          return '"' + value.replace(/"/g, '""') + '"';
        }

        return value;
      }).join() + '\n');

Instead, it generates a block of JavaScript code in its _compile method.

That approach benchmarks much faster than making V8 re-discover the shape of the object on every row via Object.keys or any other for-loop.

For example this is what it generates for {hello: "world", foo: "bar", baz: "taco"}:

function toRow(obj) {
var a0 = obj.hello == null ? "" : obj.hello
var a1 = obj.foo == null ? "" : obj.foo
var a2 = obj.baz == null ? "" : obj.baz
var result = (/[,\r\n"]/.test(a0) ? esc(a0+"") : a0)+","+(/[,\r\n"]/.test(a1) ? esc(a1+"") : a1)+","+(/[,\r\n"]/.test(a2) ? esc(a2+"") : a2)
return result +"\n"
}
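
As a minimal sketch of that compile-once technique (this is not the library's actual _compile; the helper names are made up and header names are assumed to be plain, quote-free strings):

function compileToRow (headers) {
  // one generated local per column, then a single string concatenation per row,
  // mirroring the style of the generated code shown above
  var src = headers.map(function (h, i) {
    return 'var a' + i + ' = obj["' + h + '"] == null ? "" : obj["' + h + '"]'
  }).join('\n')
  src += '\nreturn ' + headers.map(function (h, i) {
    return '(/[,\\r\\n"]/.test(a' + i + ') ? esc(a' + i + '+"") : a' + i + ')'
  }).join('+","+') + ' + "\\n"'
  var esc = function (s) { return '"' + s.replace(/"/g, '""') + '"' }
  var toRow = new Function('esc', 'obj', src)
  return function (obj) { return toRow(esc, obj) }
}

var toRow = compileToRow(['hello', 'foo', 'baz'])
process.stdout.write(toRow({hello: 'world', foo: 'bar', baz: 'taco'}))
// -> world,bar,taco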

So I am trying to make sure that transforming the headers and skipping columns won't become a performance bottleneck, by incorporating that into the writer itself.

This PR does that at an extra O(1) cost and adds zero overhead to the per-row serialisation itself.
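
A hedged sketch of that idea (not the PR's actual diff; the helper below is hypothetical): the mapping runs once over the detected headers, before the row serialiser is compiled, so the per-row path stays exactly as generated above.

// hypothetical helper: apply mapHeaders once, when the headers are first known
function applyMapHeaders (headers, mapHeaders) {
  if (!mapHeaders) return {keys: headers, printed: headers}
  var keys = []     // original record keys the compiled toRow will read
  var printed = []  // names written on the header line
  headers.forEach(function (h) {
    var mapped = mapHeaders(h)
    if (mapped === null) return // null means "skip this column"
    keys.push(h)
    printed.push(mapped)
  })
  return {keys: keys, printed: printed}
}

// the row serialiser is then compiled from .keys, and .printed is joined
// once for the header line, so nothing changes per row
var mapped = applyMapHeaders(['skipthisone', 'hello', 'foo'], function (h) {
  return h === 'skipthisone' ? null : h.substring(0, 1).toUpperCase() + h.substring(1)
})
// mapped.printed -> [ 'Hello', 'Foo' ], mapped.keys -> [ 'hello', 'foo' ]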