int-72h / pytoast

A simple, low overhead content distribution system
BSD 3-Clause "New" or "Revised" License
8 stars 2 forks source link

Structure of manifest JSON makes it unpleasant to unmarhsal in some environments #13

Closed w3bb closed 2 years ago

w3bb commented 2 years ago

Currently a file is represented as:

"dir": [
     "revdigest1234",
     1
]

The array makes it difficult for statically types languages like Go and C(++) to unmarshal to structures, it wouldn't surprise me if the ordering isn't preserved in some places. There should be an explicit structure and keyset for the children.

"dir": {
     "digest": "revdigest1234",
     "revision": 1
}

If size is a problem the manifest can be compressed or the keys can be made smaller.

int-72h commented 2 years ago

it wouldn't surprise me if the ordering isn't preserved in some places.

I'm pretty sure it is last time I checked, but I'll change this anyway as it'll make it clearer. May bloat filesize a bit though.

w3bb commented 2 years ago

it wouldn't surprise me if the ordering isn't preserved in some places.

I'm pretty sure it is last time I checked, but I'll change this anyway as it'll make it clearer.

In the JSON specification it says its unordered, and doing a quick Google search I found people complaining their library doesn't preserve ordering.

May bloat filesize a bit though.

A trade could also be made. It would probably save more space to use, say, a smaller hash like MD5 (which should work fine for checking unintentional file corruption, and comparing files.)

int-72h commented 2 years ago

it wouldn't surprise me if the ordering isn't preserved in some places.

I'm pretty sure it is last time I checked, but I'll change this anyway as it'll make it clearer.

In the JSON specification it says its unordered, and doing a quick Google search I found people complaining their library doesn't preserve ordering.

Thats fair enough.

May bloat filesize a bit though.

A trade could also be made. It would probably save more space to use, say, a smaller hash like MD5 (which should work fine for checking unintentional file corruption, and comparing files.)

You're probably right but in the future if I did want to add signing for whatever reason it'd be easier to throw on if the hash to begin with was secure enough. Reducing the digest size seems like a good idea but there may be some crippling issue with this I haven't realised yet probably.

int-72h commented 2 years ago

I'll have this fixed hopefully by the end of today when I've smoothed off the edges of the GUI (though in hindsight I should've commited the fix first...)

w3bb commented 2 years ago

Actually, I should mention that having multiple keys that are the same are also a problem. Many libraries just put in the last or first occurance of an object with a duplicate key. Whether it's valid JSON is ambiguous at best. I think a more correct/idiomatic version would be something like this:

{
   "revisions": [
      {
         "revision": 1,
         "files": [
            {
               "file": "path/to/file",
               "digest": "DEADBEEF"
            },
            {
               "file": "path/to/other/file",
               "digest": "DEADBEEF"
            }
         ]
      },
      {
         "revision": 2,
         "files": [
            {
               "file": "path/to/file",
               "digest": "DEADBEEF"
            }
         ]
      }
   ]
}
int-72h commented 2 years ago

This does look a bit better actually. It's also a bit harder to implement as the current code flow in the packer goes "handle file -> does it exist in the old one -> increment rev by one". I don't think this would be that hard actually but I'd like to get some other opinions on this first as I'd like the schema to be rock-solid so there's hopefully no more breaking changes. Adding an RFC tag.

w3bb commented 2 years ago

Might it be worth opening a separate issue for the schema in general?

int-72h commented 2 years ago

Good idea actually.