owengage / fastnbt

Fast serde serializer and deserializer for Minecraft's NBT and Anvil formats
MIT License
186 stars 34 forks source link

Support stringified NBT (sNBT) #49

Closed owengage closed 7 months ago

owengage commented 2 years ago

Minecraft also uses a variant of NBT that is human-readable and json-like.

If this crate/set of crates wants to be complete, it should offer serde support for this format too.

The (de)serializers would have to agree on the format of the array types to be compatible.

GrizzlT commented 1 year ago

In my mc server development journey, I've reached the point where I need sNBT. I'm very much willing to help out add this to fastnbt.

A few questions I had so far:

I'll probably take some inspiration from serde_json for the writer and reader. (I'm still researching deserialization)

owengage commented 1 year ago

Hi, glad your project is progressing :)

I did start some of the sNBT stuff a long while ago, I've just pushed that local branch up: dev/snbt. I don't remember what state it is in. It is its own crate though, fastsnbt.

If you go ahead working on it, I try to do it test driven as much as I can, since it's relatively easy to unit test effectively. The serde_json crate is definitely good inspiration, I was using that too.

For your questions:

fastnbt seems to exclude certain types from being serialized as the root, should this also be a requirement for sNBT? I feel like that's not as necessary since it's a human-readable format and serde_json doesn't exclude types from being root either (simpler api).

There's a good reason for needing a root compound... but I can't remember off the top of my head what is it! Maybe it will show up if you try to implement it. Not opposed to non-root working.

When serializing strings, what characters should be escaped out? I was thinking of " and \, are there more?

That's just going to depend on what is valid according to Minecraft.

What's a good/logical file structure to integrate an sNBT (de)serializer in this crate?

As mentioned above, in it's own crate. Serde has some conventions it uses for data format crates. fastnbt and serde_json's conventions are probably good to follow.

Some questions I'm not sure on the answer to are:

  1. Do we share the error type between fastnbt and fastsnbt?
  2. Do we share the array types as well? The two crates should definitely interoperate, ie you can deserialize from one and serialize to the other no problem.
  3. If we do share them, do they go in some 'common' crate?
GrizzlT commented 1 year ago

Awesome to hear!

If you go ahead working on it, I try to do it test driven as much as I can, since it's relatively easy to unit test effectively.

I'll be working on top of that branch then. I'm quite familiar with writing tests so that'll go along nicely.

That's just going to depend on what is valid according to Minecraft.

Then it's only the quotes that matter but I think including the backslash has many benefits. For serialization I think always returning " makes things easier while deserialization should of course handle both " and '.

Do we share the error type between fastnbt and fastsnbt?

If they go in separate crates, I see no reason to merge the error types. Since sNBT should always be valid unicode there's a separation of concerns that goes along well with having separate crates I think.

Do we share the array types as well? The two crates should definitely interoperate, ie you can deserialize from one and serialize to the other no problem.

Interoperability is a requirement for me too, I think having a common crate is a good idea in that case. The crates' versions should probably be held in lockstep then? Do you think reexporting the common crate in the other crates is a good idea?

owengage commented 1 year ago

All sounds great.

I agree about serializing with quotes/picking one. It may be something that can be configured in future, but for now yes.

I think I agree on not sharing the error type. For the arrays I think we can just depend on fastnbt directly for now. When we're at a v1 for sNBT we can do the changes necessary to move things to a common crate if we still want to. If we did that, fastnbt would need to reexport for backwards compat, and I think it makes sense to do so.

Lock-stepping the versions sounds like the only sensible option. Can think harder about that when we're there I guess.

GrizzlT commented 1 year ago

I did some more tinkering, I understand the internal tricks of ByteArray, IntArray and LongArray a bit better now. For serialization they get first converted in an array of bytes.

To support those types in sNBT, I just converted those back into the corresponding integer type arrays when passed into a specialized serialize_bytes function. This seems very redundant.

I think a solution could be to add a new specialized NbtArraySerializer that only expects sequences or bytes and always turns those into the corresponding nbt array. That way the Serialize implementations could just use the same prefix trick but pass in the immediate Vec<i8>, Vec<i32> or Vec<i64>. The conversion into bytes would then happen in the serializer in fastnbt. fastsnbt would have no performance degradation this way.