zemirco / json2csv

Convert json to csv with column titles
http://zemirco.github.io/json2csv
MIT License
2.72k stars 362 forks source link

CSV created with json2csv is 3x larger than .xlsx created with ExcelJS #527

Closed zlibmaster closed 3 years ago

zlibmaster commented 3 years ago

Hi, while debugging #526 I noticed some size differences with using this library. Specificaly a much bigger file downloaded compared to the file downloaded when using ExcelJS. (https://github.com/exceljs/exceljs#writing-xlsx) While the csv is generated much faster (4x faster), it is also 3x larger (12MB .xlsx with ExcelJS, 36MB .csv with json2csv) Both Are being created with stream and piped to express response. The data in the files seems identical when opened with "MS Excel" Can someone help me lower the size of my csv? Is this something to do with encoding? Or gzip in express? I thought csv should be lighter and faster than xlsx...

json2csv version: 5.0.6 exceljs: 4.2.1 node: 14.14.0 code:

json2CSV:

  const input = new Readable({ objectMode: true });
  input._read = () => {};
  const opts = { fields, withBOM: true };
  const transformOpts = {objectMode: true};
  const json2csv = new Json2Csv(opts, transformOpts);
  input.pipe(json2csv).pipe(res);  // express response
  while (await cursor.hasNext()) {
    const dataRow = await cursor.next();
    input.push(dataRow));
  }
  input.push(null);

ExcelJS:

  const workbook = new ExcelJS.stream.xlsx.WorkbookWriter({ stream: res, useStyles: true });
  const worksheet = workbook.addWorksheet('results');
  worksheet.columns = excelColumns;

  while (await cursor.hasNext()) {
    const dataRow = await cursor.next();
    worksheet.addRow(row).commit());
  }

  workbook.commit();

Express server configuration:

server.use(bodyParser.urlencoded({extended: false}));
server.use(bodyParser.json());
server.use(helmet());
server.use(compression());

The request to the route is done from browser with a GET request using fetch, converting the response to blob, and using window.URL.createObjectURL(file);

juanjoDiaz commented 3 years ago

CSV is a plain text format. XLSX is a binary which is compressed and include optimizations like storing data in a lookup table to avoid repeating data.

So it's perfectly expected that the result is bigger in CSV. On the other hand, CSV is readable without any especial software and XLSX requires Excel or some other to decode the file.

A quick google about this will give you a lot of information. :)