ostafen / clover

A lightweight document-oriented NoSQL database written in pure Golang.
MIT License

[Feature Request] Utilize `encoding.BinaryMarshaler` and `encoding.BinaryUnmarshaler` #101

Closed — duckbrain closed this 1 year ago

duckbrain commented 1 year ago

If encoding.TextMarshaler were used for types found by clover.NewDocumentOf, and encoding.TextUnmarshaler by clover.Document.Unmarshal, then standard library and custom types that currently cannot marshal/unmarshal correctly, like a custom UUID type, could be used for fields.

ostafen commented 1 year ago

Hi @duckbrain, thanks for the suggestion. However, using a textual encoding such as encoding.TextMarshaler would have a severe impact on query performance, since at query time much time is spent in document deserialization. This is why I switched from json to msgpack, which is a binary encoding and thus more performant. Supporting custom types is definitely a good idea, but I think this feature should not rely on assumptions about the underlying encoding.

duckbrain commented 1 year ago

In that case, encoding.Binary*?

ostafen commented 1 year ago

encoding/binary only allows you to encode integer types. Anyway, the problem here is not finding a better encoding for documents, since msgpack is already doing its job. Support for custom types should not be tied to a particular encoding scheme, since that would make it impossible to change the encoding in the future. The way encoding libraries (among them, json and msgpack itself) allow the user to include custom types is by letting the user define custom Marshal() and Unmarshal() methods on the custom object, which serialize the custom type to a byte slice.

duckbrain commented 1 year ago

I was referring to the encoding.BinaryMarshaler and encoding.BinaryUnmarshaler interfaces, as opposed to the text-based interfaces in the same package.

They serve as a standard for encoding and decoding custom types to and from a binary format; they are not an encoding scheme themselves, but can be used by encoding formats like gob.

Standard library types also implement these interfaces, so relying on them could remove some special handling for certain types.

ostafen commented 1 year ago

Oh, sorry about that, we were basically saying the same thing. Would you like to work on this feature?

duckbrain commented 1 year ago

After looking into it: msgpack already uses encoding.BinaryMarshaler and encoding.BinaryUnmarshaler, but document.Document normalizes everything into map[string]interface{}, and that is what fails to handle the binary encoding.

It looks like internal.Convert uses json to copy values from one object to another. In general, a normal lookup of an object goes from binary-encoded msgpack -> map[string]interface{} -> json -> your object. To fix this, Document would need to be able to take an object and reflect on it directly.

I don't see a way to do this without rewriting large swaths of the internals, and I'm not sure backwards compatibility can be preserved with the API.

ostafen commented 1 year ago

> After looking into it: msgpack already uses encoding.BinaryMarshaler and encoding.BinaryUnmarshaler, but document.Document normalizes everything into map[string]interface{}, and that is what fails to handle the binary encoding.

I think it should be enough to leave values implementing the encoding.BinaryMarshaler interface untouched when normalizing to map[string]interface{}, in order to preserve custom types.

> It looks like internal.Convert uses json to copy values from one object to another. In general, a normal lookup of an object goes from binary-encoded msgpack -> map[string]interface{} -> json -> your object. To fix this, Document would need to be able to take an object and reflect on it directly.

Yes, but this only holds when you want to convert a Document to a custom struct object using Unmarshal() (I would like to optimize this in the future by removing the json step, which is really just a way to convert between types). Anyway, when documents are loaded from disk, they are simply unmarshalled from the msgpack encoding.

duckbrain commented 1 year ago

> I think it should be enough to leave values implementing the encoding.BinaryMarshaler interface untouched when normalizing to map[string]interface{}, in order to preserve custom types.

Right you are. For some reason I was trying to Marshal them there, which made it a lot harder than it needs to be. PR is up.

> Anyway, when documents are loaded from disk, they are simply unmarshalled from the msgpack encoding.

I see how that works, and why the Document type is needed for building indexes and passing values around, but I personally don't see when I would get a document out of the database and not immediately convert it into a type-safe object.

> I would like to optimize this in the future, by removing the json step, which is actually a way to simply convert between types

The "simple" way to handle that, of course, is custom reflection or something like structmap.

On the idea of optimizing for what I see as the common case: I experimented with deferring the unmarshalling until it is requested or needed by Get and the like. It is not completely functional, nor suggested for use, but you're welcome to peruse/use it.

https://github.com/duckbrain/clover/tree/feature/delay-decoding

ostafen commented 1 year ago

Thank you for the PR, which I just merged in the v2 branch.

> The "simple" way to handle that, of course, is custom reflection or something like structmap.
>
> On the idea of optimizing for what I see as the common case: I experimented with deferring the unmarshalling until it is requested or needed by Get and the like. It is not completely functional, nor suggested for use, but you're welcome to peruse/use it.
>
> https://github.com/duckbrain/clover/tree/feature/delay-decoding

These are interesting ideas, which I think should be handled in a separate issue. This one is solved, so I'm closing it.