Open hmalphettes opened 8 years ago
Having a function to edit the headers seems overly complex... you could just pipe the stream of objects into a transform that rewrites the keys or removes keys you don't want before sending them to csv-write-stream.
@slang800 ok, that works too.
I would argue that no matter what, that transform stream will add some overhead, and such a transform stream is almost as complex as csv-write-stream itself.
To optimise such a transform stream one would need to redo the same type of work as what the csv-write-stream compiler is doing.
At that point all that is left is the CSV formatting, and one would end up with csv-write-stream again.
Hence I think there is value in handling this transform in the csv-write-stream itself.
It's not complex and you shouldn't need to extract the headers or redo any work. It should look something like this:
var map = require('through2')

var transform = map({objectMode: true}, function (obj, enc, cb) {
  var i, key, len, newKey, keys
  delete obj.skipthisone
  keys = Object.keys(obj)
  for (i = 0, len = keys.length; i < len; i++) {
    key = keys[i]
    newKey = key.substring(0, 1).toUpperCase() + key.substring(1)
    obj[newKey] = obj[key]
    delete obj[key]
  }
  cb(null, obj)
})
...Unless I'm totally misunderstanding what you want to do.
Thanks @slang800
The csv-write-stream could have been written in the same fashion as what you suggest:
this.push(Object.keys(record).map(k => {
  const value = record[k];
  if (typeof value === 'string') {
    return '"' + value.replace(/"/g, '""') + '"';
  }
  return value;
}).join(',') + '\n');
Instead, it produces a block of JavaScript code in the _compile method. That compiled function is benchmarked to be much faster than making V8 discover the object it transforms with Object.keys or any other for-loop on every row.
For example, this is what it generates for {hello: "world", foo: "bar", baz: "taco"}:
function toRow (obj) {
  var a0 = obj.hello == null ? "" : obj.hello
  var a1 = obj.foo == null ? "" : obj.foo
  var a2 = obj.baz == null ? "" : obj.baz
  var result = (/[,\r\n"]/.test(a0) ? esc(a0+"") : a0)+","+(/[,\r\n"]/.test(a1) ? esc(a1+"") : a1)+","+(/[,\r\n"]/.test(a2) ? esc(a2+"") : a2)
  return result + "\n"
}
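For reference, this kind of code generation can be sketched with `new Function`. This is a simplified illustration of the technique, not the module's actual _compile implementation; the helper name `compileToRow` is made up here:

```javascript
// Sketch of a row-serializer compiler for a fixed, known list of headers.
// It builds the function body as a string, so the compiled function reads
// each property directly (obj["hello"]) instead of walking Object.keys()
// on every row.
function compileToRow (headers) {
  // CSV escaping: double the quotes, then wrap the field in quotes
  var esc = function (s) { return '"' + s.replace(/"/g, '""') + '"' }
  var decls = headers.map(function (h, i) {
    return 'var a' + i + ' = obj[' + JSON.stringify(h) + '] == null ? "" : obj[' + JSON.stringify(h) + ']'
  })
  var row = headers.map(function (h, i) {
    // only escape fields that actually contain a comma, quote, or newline
    return '(/[,\\r\\n"]/.test(a' + i + ') ? esc(a' + i + '+"") : a' + i + ')'
  }).join('+","+')
  var body = decls.join('\n') + '\nreturn ' + row + ' + "\\n"'
  return new Function('esc', 'obj', body).bind(null, esc)
}

var toRow = compileToRow(['hello', 'foo', 'baz'])
toRow({hello: 'world', foo: 'bar', baz: 'taco'}) // -> 'world,bar,taco\n'
```

Compiling once per stream trades a one-time `new Function` cost for cheaper per-row work, which is where the benchmark gains come from.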
So I am trying to make sure that transforming the headers and skipping columns won't be a perf bottleneck, by incorporating that into the writer.
This PR does that for an extra O(1) cost and adds zero overhead to the serialisation itself.
Setting the mapHeaders function in the options will skip some columns and edit the printed headers line.
For example:
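A sketch of what the mapHeaders callback could look like. The exact signature is an assumption here, mirroring csv-parser's option; only the header-mapping logic itself is shown, outside the writer:

```javascript
// Assumed callback shape: (header, index) -> new header, or null to skip
// the column. The names below are illustrative, not the PR's actual API.
function mapHeaders (header, index) {
  if (header === 'skipthisone') return null // drop this column entirely
  // edit the printed header: capitalize the first letter
  return header.charAt(0).toUpperCase() + header.slice(1)
}

// What the writer would do with the header row: rename, and filter out nulls
var headers = ['hello', 'foo', 'skipthisone']
var printed = headers.map(mapHeaders).filter(function (h) { return h !== null })
// printed -> ['Hello', 'Foo']
```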
This is the reverse of csv-parser's mapHeaders: https://github.com/mafintosh/csv-parser/pull/54