hildjj / node-cbor

Encode and decode CBOR documents, with easy mode, streaming mode, and SAX-style evented mode.
MIT License

Encoding a "large" 5MB JSON Object #184

Closed · kion-dgl closed 1 year ago

kion-dgl commented 1 year ago

I wanted to try using CBOR to compress a large object. Specifically, I used https://github.com/KhronosGroup/glTF-Sample-Models/blob/master/2.0/DamagedHelmet/glTF-Embedded/DamagedHelmet.gltf from this three.js example: https://threejs.org/examples/?q=gltf#webgl_loader_gltf.

Since the JSON contains a huge base64 buffer, I wanted to see whether CBOR could encode that data more efficiently. As a first step, I checked whether the file could be encoded at all with the library.

const cbor = require('cbor')
const { readFileSync } = require('fs')

const gltf = readFileSync('DamagedHelmet.gltf', 'utf8');
console.log(gltf.length); // 5.0MB

const src = JSON.parse(gltf);
const encoded = cbor.encode(src);
console.log(encoded.length); // 745 KB

cbor.decodeFirst(encoded, (err, obj) => {
    if(err) throw err;
    console.log(obj);
});

The result is that the encoded CBOR output is only about 745 KB from a 5 MB object. I checked, and even the compressed file should be about 3.6 MB, so 745 KB is too small. Decoding the encoded buffer then fails with the following error:

node:internal/process/promises:279
            triggerUncaughtException(err, true /* fromPromise */);
            ^

Error: unexpected end of input
    at Decoder._flush (/home/kion/Documents/CBOR-MOdel/node_modules/cbor/vendor/binary-parse-stream/index.js:106:30)
    at Decoder.final [as _final] (node:internal/streams/transform:133:25)
    at callFinal (node:internal/streams/writable:696:27)
    at prefinish (node:internal/streams/writable:725:7)
    at finishMaybe (node:internal/streams/writable:735:5)
    at Decoder.Writable.end (node:internal/streams/writable:633:5)
    at NoFilter.onend (node:internal/streams/readable:693:10)
    at Object.onceWrapper (node:events:627:28)
    at NoFilter.emit (node:events:513:28)
    at endReadableNT (node:internal/streams/readable:1358:12)

Is there a limit on the size of JSON file that can be encoded? Do base64 buffers get converted to binary? Is there something special about this JSON that prevents it from being encoded?

hildjj commented 1 year ago

This is almost certainly an issue with highWaterMark. See https://github.com/hildjj/node-cbor/blob/main/packages/cbor/README.md#highwatermark
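
For reference, here is a minimal sketch of the non-streaming workaround that README section describes, applied to the src object from the snippet above; the 6 MB figure is an assumption sized to the ~5 MB input, and it assumes encodeOne forwards stream options as the README says:

const cbor = require('cbor')

// Hedged sketch: encodeOne is the one-shot encoder that accepts stream
// options, so raising highWaterMark lets the whole output fit in the
// stream's internal buffer instead of being truncated at the default.
const encoded = cbor.encodeOne(src, { highWaterMark: 6 * 1024 * 1024 })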

That said, you should almost certainly use the encoder in streaming mode, rather than just setting highWaterMark to 5 MB. Something like:

const cbor = require('cbor');
const NoFilter = require('nofilter');

const b = Buffer.alloc(5 * 1024 * 1024);
const e = new cbor.Encoder();
const nf = new NoFilter();
e.pipe(nf)
e.write(b)
console.log(nf.length) // 5242885
const cborBuf = nf.read()
console.log(cborBuf.length) // 5242885
console.log(cborBuf) // <Buffer 5a 00 50 00 00 00 00 00 00 00 ...
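
For what it's worth, that 5a 00 50 00 00 prefix is the CBOR header for a byte string of 0x00500000 (5,242,880) bytes, which is where the extra five bytes in 5242885 come from. A hedged round-trip check (not in the original comment), using the synchronous decoder since the buffer is complete in memory:

// The byte string decodes back to a plain Buffer of the original length.
const roundTrip = cbor.decodeFirstSync(cborBuf)
console.log(roundTrip.length) // 5242880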

kion-dgl commented 1 year ago

I'm kind of lost on that example. I have a 5 MB source JSON. Where does that get passed into the encoder? And then how does the result get piped into a destination buffer?

hildjj commented 1 year ago

Combining your code and my code:

const cbor = require('cbor')
const NoFilter = require('nofilter')
const { readFileSync } = require('fs')

const gltf = readFileSync('DamagedHelmet.gltf', 'utf8');
console.log(gltf.length); // 5.0MB

const src = JSON.parse(gltf);

const e = new cbor.Encoder();
const nf = new NoFilter();
e.pipe(nf)
e.end(src)
const encoded = nf.read()
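
For completeness (and anticipating the working version below), once nf.read() returns, encoded is an ordinary Buffer that can be written straight to disk:

// Assuming writeFileSync is also imported from 'fs':
writeFileSync('DamagedHelmet.cbor', encoded)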

kion-dgl commented 1 year ago

The version I got working was as follows. Thanks!

import cbor from 'cbor'
import { readFileSync, writeFileSync } from 'fs'
import { dataUriToBuffer } from 'data-uri-to-buffer'

// Read a GLTF file with embedded buffers and parse as JSON
const gltf = readFileSync('DamagedHelmet.gltf', 'utf8');
const src = JSON.parse(gltf);

// Convert buffers from base64 uri to binary
for(let i = 0; i < src.buffers.length; i++){
    src.buffers[i] = dataUriToBuffer(src.buffers[i].uri);
}

// Convert images from base64 uri to binary
for(let i = 0; i < src.images.length; i++){
    src.images[i] = dataUriToBuffer(src.images[i].uri);
}

// Create encoder
const e = new cbor.Encoder();
const dst = [];

e.on("error", (err) => {
    console.log("ERRROR")
    throw err;
})

e.on("data", (buf) => {
    dst.push(buf);
})

e.on("finish", ()=> {
    const encoded = Buffer.concat(dst)
    writeFileSync('DamagedHelmet.cbor', encoded)
})

// Start encoding src JSON
e.end(src)
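
A hedged sketch of the matching decode side (not from the thread; the file name and field layout are assumed from the snippet above): reading the .cbor file back should restore buffers and images as binary rather than base64 data URIs.

const encoded = readFileSync('DamagedHelmet.cbor')
cbor.decodeFirst(encoded, (err, obj) => {
    if (err) throw err;
    console.log(Object.keys(obj)); // same top-level keys as the source glTF
});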