hildjj / node-cbor

Encode and decode CBOR documents, with easy mode, streaming mode, and SAX-style evented mode.
MIT License

option to parse cbor at set depth only #138

Closed ashisherc closed 3 years ago

ashisherc commented 3 years ago

I would like an option to parse CBOR only down to a set depth and return an array/map whose values are the remaining, unparsed CBOR.

For example, the CBOR below is a map of 2 items with keys 0 and 1:

a200828258200eb27dc73da5a83e42ac63ff6d9f86db182f554ae175e6f66acf48c6bdd28930584013feb04f914e5ec16bdcc343604b3aef2640fa6be1b74e1096f93bdedaba17f4e4f00aa688c09182881ecb801255c1b7f24bda432c017a2da614261338c8ea03825820ca98beb4dd9477e8a0d800c2e2989c9f32e9ea384b30521e4c03df1eeb73920f5840549d800db8b91b5cf7fb35e7447a59b1577a58fb029a688da4831d65903b1308fbcd5b856ac8424295f7d5d9c19a5f1038b90dd54802038d6ae7850450e8510701818200581c8fb4ba5ddf2af3075162567d61456e3482f3f462f60d21d480a88828

After parsing, I would like to get a result like the one below:

Map {
   0 -> unparsed CBOR as is that is under this key (0)
   1 -> unparsed CBOR as is that is under this key (1)
}

hildjj commented 3 years ago

Have you considered using tag 24 for this? As an example, a200d818410001d8184101:

  a2                -- Map, 2 pairs
    00              -- {Key:0}, 0
    d8              --  next 1 byte
      18            -- {Val:0}, Tag #24 Encoded CBOR data item
        41          -- Bytes, length: 1
          00        -- 00
          00        -- 0
      01            -- {Key:1}, 1
      d8            --  next 1 byte
        18          -- {Val:1}, Tag #24 Encoded CBOR data item
          41        -- Bytes, length: 1
            01      -- 01
            01      -- 1
0xa200d818410001d8184101
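
For illustration, here is a minimal sketch of how an encoder could produce those bytes by hand, wrapping already-encoded CBOR in tag 24. The helper names `head` and `wrapTag24` are hypothetical and not part of node-cbor, and the sketch only handles small lengths.

```javascript
// Hypothetical helper: build a CBOR head (major type + argument) for small arguments.
function head(majorType, n) {
  if (n < 24) return Buffer.from([(majorType << 5) | n]);
  if (n < 0x100) return Buffer.from([(majorType << 5) | 24, n]);
  if (n < 0x10000) return Buffer.from([(majorType << 5) | 25, n >> 8, n & 0xff]);
  throw new RangeError('sketch only handles arguments below 65536');
}

// Hypothetical helper: tag 24 (major type 6) around a byte string (major type 2)
// whose content is the already-encoded CBOR item.
function wrapTag24(encoded) {
  return Buffer.concat([head(6, 24), head(2, encoded.length), encoded]);
}

const doc = Buffer.concat([
  head(5, 2),                                  // map with 2 pairs
  head(0, 0), wrapTag24(Buffer.from([0x00])),  // key 0 -> embedded CBOR encoding of 0
  head(0, 1), wrapTag24(Buffer.from([0x01])),  // key 1 -> embedded CBOR encoding of 1
]);

console.log(doc.toString('hex')); // a200d818410001d8184101
```

A decoder that leaves tag 24 content alone can then hand back the inner bytes untouched, which only helps when you control the encoding side.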

ashisherc commented 3 years ago

Yes, I'm aware of using tag 24 for this use case.

But I'm working on an application where I do not encode the data myself. Instead, there is a dynamic requirement: I must extract part of the CBOR in order to hash that particular hex string. In the example above, that would be the values under the map keys.

I have considered decoding everything and then re-encoding the part I need to hash. But per the spec I'm not allowed to do that, for many reasons. For example:

I decode the CBOR, and part of it is an array of length 5.

Now, when I encode the JSON back to CBOR, I have 2 options for the array:

  1. definite-length array encoding
  2. indefinite-length array encoding

I won't know which array encoding was used originally. If I change the array's encoding, then, because I have to hash the CBOR, I will end up with a different hash from the one the original was supposed to have. Also note that the array encoding is not specified on the original encoder's side; it is effectively arbitrary.
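
To make the problem concrete, here is a small sketch that hashes the same logical array, [1, 2, 3, 4, 5], in both encodings. SHA-256 is only an assumption for the demonstration; the actual hash function depends on the platform.

```javascript
const { createHash } = require('crypto');

// The same logical array [1, 2, 3, 4, 5] in its two valid CBOR encodings.
const definite = Buffer.from('850102030405', 'hex');     // 0x85: array of length 5
const indefinite = Buffer.from('9f0102030405ff', 'hex'); // 0x9f ... 0xff: indefinite-length array

// SHA-256 is assumed here purely for illustration.
const sha256 = (bytes) => createHash('sha256').update(bytes).digest('hex');

const hashesMatch = sha256(definite) === sha256(indefinite);
console.log(hashesMatch); // false: different bytes, different hash
```

Both byte strings decode to the same value, so a decode-then-encode round trip cannot tell which one it started from.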

This is just one of the issues. Per the platform specification, I'm never allowed to decode the CBOR and re-encode it to get the hash of a sub-CBOR. Instead, I should always extract the sub-CBOR bytes as they are and hash those. That way, for my use case, the hash of the CBOR always matches.


So I was exploring the best way to achieve this, and I thought of an option similar to max_depth that returns JSON down to that depth and the remaining bytes as a hex-string CBOR.
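
As a rough sketch of the idea, independent of node-cbor: walk only the CBOR heads to find where each item ends, and slice the original bytes without decoding them. The function names `readHead`, `skipItem`, and `splitTopLevelMap` are hypothetical. The sketch assumes a definite-length top-level map with unsigned integer keys, and lengths above 2^53 would lose precision.

```javascript
// Read the head at `pos`: major type, additional info, argument, and offset after the head.
function readHead(buf, pos) {
  if (pos >= buf.length) throw new RangeError('truncated CBOR');
  const major = buf[pos] >> 5;
  const info = buf[pos] & 0x1f;
  if (info < 24) return { major, info, arg: info, next: pos + 1 };
  if (info === 31) return { major, info, arg: null, next: pos + 1 }; // indefinite length or break
  if (info > 27) throw new Error('reserved additional info ' + info);
  const size = 1 << (info - 24); // 1, 2, 4, or 8 argument bytes
  if (pos + 1 + size > buf.length) throw new RangeError('truncated CBOR');
  let arg = 0;
  for (let i = 0; i < size; i++) arg = arg * 256 + buf[pos + 1 + i];
  return { major, info, arg, next: pos + 1 + size };
}

// Return the offset just past the complete data item that starts at `pos`.
function skipItem(buf, pos) {
  const h = readHead(buf, pos);
  let p = h.next;
  if (h.arg === null && (h.major < 2 || h.major > 5)) {
    throw new Error('unexpected break or invalid indefinite length');
  }
  switch (h.major) {
    case 0: case 1: case 7: // integers, simple values, floats: the head is the whole item
      return p;
    case 2: case 3:         // byte and text strings
      if (h.arg !== null) {
        if (p + h.arg > buf.length) throw new RangeError('truncated CBOR');
        return p + h.arg;
      }
      while (buf[p] !== 0xff) p = skipItem(buf, p); // indefinite: skip chunks until break
      return p + 1;
    case 4: case 5: {       // arrays and maps
      if (h.arg === null) {
        while (buf[p] !== 0xff) p = skipItem(buf, p);
        return p + 1;
      }
      const count = h.major === 4 ? h.arg : h.arg * 2; // a map holds key/value pairs
      for (let i = 0; i < count; i++) p = skipItem(buf, p);
      return p;
    }
    default:                // major type 6: a tag is followed by exactly one item
      return skipItem(buf, p);
  }
}

// Split a top-level definite-length map into key -> raw, undecoded bytes of each value.
function splitTopLevelMap(buf) {
  const h = readHead(buf, 0);
  if (h.major !== 5 || h.arg === null) throw new Error('expected a definite-length map');
  const out = new Map();
  let p = h.next;
  for (let i = 0; i < h.arg; i++) {
    const key = readHead(buf, p);
    if (key.major !== 0 || key.arg === null) {
      throw new Error('sketch only handles unsigned integer keys');
    }
    const end = skipItem(buf, key.next);
    out.set(key.arg, buf.subarray(key.next, end));
    p = end;
  }
  return out;
}

// {0: [1, 2], 1: [_ 3, 4]} where the second array uses indefinite-length encoding.
const parts = splitTopLevelMap(Buffer.from('a200820102019f0304ff', 'hex'));
console.log(parts.get(0).toString('hex')); // 820102
console.log(parts.get(1).toString('hex')); // 9f0304ff
```

Because the values are slices of the input buffer, the indefinite-length encoding of the second array survives byte for byte, so each slice can be hashed directly.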

hildjj commented 3 years ago

Fitting this in as a base feature would be pretty difficult to get generic enough. As a start, look at the event-based mechanism that diagnose.js uses.

I'm open to ideas or patches if you see how it ought to work. I think it will be a lot easier to add to the cbor-wasm library when I get it closer to complete -- but please do not start relying on its API yet, it's in very heavy development at the moment.

ashisherc commented 3 years ago

Sure, I will give it a shot.

Very interesting what you're doing with cbor-wasm 👌👌. I would love to see this feature available there.

hildjj commented 3 years ago

Take a look at the discussion thread I just started there so you can influence my approach before I go too far: https://github.com/hildjj/cbor-wasm/discussions/12