wc-duck / datalibrary

Open Source Data Library for data serialization.

Converting data before DL reads it. #65

Closed lundmark closed 6 years ago

lundmark commented 6 years ago

Hey!

So the issue is that we want to be able to specify certain things as strings in our intermediate format (for example, paths to resources or intermediate resource IDs), and have them converted to the IDs that are used at runtime.

So our structs might look like: struct MyOB { uint64_t path; }

Where the path is something we want to specify as a string in the intermediate (json) format: "MyOB" : { "path" : "/my/path/object" }

Now the best suggestion or idea I have is basically tagging them like this:

"path" : #PATH"/my/path/object"

and then when I read the file, before DL parses it, I just go through it all and see if I can find any such strings to convert.

I would love to see a "better" or more proper long-term solution to this, one that is sustainable and usable for DL as well.

wc-duck commented 6 years ago

That is really a tough question, and I'm not really sure I want to add custom packing callbacks per member, as that would spread throughout the codebase.

Also, how would that work with unpacking to text? Now the user would have to be able to provide reverse ops as well.

dlpack.exe will also have to be custom built.

Lastly, how would you specify the data in the tld, given that the text has one type and the binary another?

At home I just specify two formats, one intermediate and one runtime, and have my own compiler in the middle. IMHO that is a better solution, as you can tailor the intermediate format for editing and the runtime format for runtime.

But maybe there is something dl can help out with in the process of im -> runtime?
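As a rough illustration of the two-format setup described above, here is a minimal sketch of the "compiler in the middle" step. Everything in it is an assumption made for the example: the names `MyOB_intermediate`, `path_to_id` and `compile_my_ob` are hypothetical, FNV-1a is only a stand-in for whatever hash a project actually uses, and loading and storing the two structs through DL is left out.

```cpp
#include <cstdint>
#include <string>

// Intermediate (editing) form: the path stays a readable string.
// Hypothetical type, not part of DL.
struct MyOB_intermediate {
    std::string path;
};

// Runtime form, matching the struct from the original post.
struct MyOB {
    uint64_t path;
};

// 64-bit FNV-1a. Assumption: the thread does not name a hash function,
// so this is only a placeholder.
inline uint64_t path_to_id(const std::string& path) {
    uint64_t h = 14695981039346656037ull; // FNV offset basis
    for (unsigned char c : path) {
        h ^= c;
        h *= 1099511628211ull;            // FNV prime
    }
    return h;
}

// The "compiler in the middle": convert one intermediate object
// into its runtime counterpart.
inline MyOB compile_my_ob(const MyOB_intermediate& im) {
    MyOB rt;
    rt.path = path_to_id(im.path);
    return rt;
}
```

With this split the intermediate type can grow editor-friendly fields without touching the runtime layout, at the cost of maintaining two type definitions per object.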

wc-duck commented 6 years ago

String to hash as its own type might be something that can be done but with some caveats.

Hash would be its own type, "hash32" or "hash64", in the tld. It would be specified as a string or int in text and always unpack as an int (as there is no reverse lookup). The hash function used would be decided by dl BUT overridable at compile-time.
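To make the proposal concrete, here is a small sketch of how such a "hash64" member might behave. This is not DL code and DL has no such type. The macro `EXAMPLE_HASH64` and the functions `example_default_hash64`, `example_pack_hash64` and `example_unpack_hash64` are hypothetical names, and FNV-1a is an assumed default.

```cpp
#include <cstdint>
#include <cstdlib>
#include <string>

// Assumed default hash: 64-bit FNV-1a.
inline uint64_t example_default_hash64(const char* str) {
    uint64_t h = 14695981039346656037ull;
    for (; *str != '\0'; ++str) {
        h ^= static_cast<unsigned char>(*str);
        h *= 1099511628211ull;
    }
    return h;
}

// Compile-time override point: define EXAMPLE_HASH64 before this code
// to plug in a different hash function.
#ifndef EXAMPLE_HASH64
#  define EXAMPLE_HASH64(str) example_default_hash64(str)
#endif

// Text -> binary: a quoted token is hashed, anything else is read as a
// decimal integer that is assumed to already be a hash value.
inline uint64_t example_pack_hash64(const std::string& token) {
    if (token.size() >= 2 && token.front() == '"' && token.back() == '"') {
        return EXAMPLE_HASH64(token.substr(1, token.size() - 2).c_str());
    }
    return std::strtoull(token.c_str(), nullptr, 10);
}

// Binary -> text: always an integer, since the hash cannot be reversed.
inline std::string example_unpack_hash64(uint64_t value) {
    return std::to_string(value);
}
```

Note the asymmetry: packing `"/my/path"` and then unpacking yields a number, and packing that number again gives the same binary value, so the round trip is stable even though the original string is lost.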

But I do not know if such a type would be useful?

wc-duck commented 6 years ago

Just a quick thought... That I haven't worked through in my head yet. But maybe one thing that dl could help out with somehow is to be able to define some kind of alias for the basic types.

So that you could define the type "path" or "uuid", specify that in your intermediate format, and detect in your code that this type should be handled as a path?

wc-duck commented 6 years ago

... but it would otherwise just work as a string throughout dl

lundmark commented 6 years ago

Hmmm, I'm not very fond of the hash-function solution as that is extremely rigid.

What I think could be interesting is a way to create types that have a certain text format and a certain binary format, along with some way to convert between the two. It's the conversion that I'm uncertain about, in terms of how it would actually work.

I'll try to formalize something once we have gotten further with DL. For now I'll just make sure to analyze our intermediate format before passing it to DL.

joeldevahl commented 6 years ago

I'm also interested in something like this. Currently I have two types, X and X_intermediate, and I write utils that load the intermediate file (text), hash/preprocess certain parts, and write the real data file (binary). Not optimal at all, but a lot of it could probably be automated.

lundmark commented 6 years ago

I've written a small parser that I use in my compiler which converts intermediate data (text) to runtime data (binary). It just does a pre-pass on the text to parse simple tags: "path": $ID64:"/my/path/to/my/file"

path is of type uint64, of course.
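The header mentioned here was not posted in the thread, so the following is only a guess at what such a pre-pass could look like, not the actual implementation. It scans the text for `$ID64:"..."` tags and replaces each with a decimal integer before the text would be handed to DL. The function names are hypothetical, FNV-1a is an assumed hash, and the scan deliberately ignores escaped quotes and tags that happen to appear inside ordinary strings.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// 64-bit FNV-1a. Assumption: the real tool's hash is not stated.
inline uint64_t hash_id64(const std::string& s) {
    uint64_t h = 14695981039346656037ull;
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ull;
    }
    return h;
}

// Replace every $ID64:"..." tag with the decimal hash of the quoted text.
// Simplification: no handling of escaped quotes inside the tag value.
inline std::string expand_id64_tags(const std::string& text) {
    static const std::string tag = "$ID64:\"";
    std::string out;
    std::size_t pos = 0;
    for (;;) {
        const std::size_t start = text.find(tag, pos);
        if (start == std::string::npos) break;
        const std::size_t value_begin = start + tag.size();
        const std::size_t value_end = text.find('"', value_begin);
        if (value_end == std::string::npos) break; // unterminated tag: leave as-is
        out.append(text, pos, start - pos);        // copy text before the tag
        out += std::to_string(
            hash_id64(text.substr(value_begin, value_end - value_begin)));
        pos = value_end + 1;                       // continue after closing quote
    }
    out.append(text, pos, std::string::npos);      // copy the remainder
    return out;
}
```

Running `expand_id64_tags` on `"path": $ID64:"/my/path/to/my/file"` leaves `"path": ` followed by a plain integer literal, which is what a uint64 member expects in the text format. Other tag kinds could be added by dispatching on the tag name instead of hard-coding `$ID64`.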

This works really well with a number of different things that I use. It's just a simple header file, so if anyone wants it, I'll clean it up for public view and share it.

wc-duck commented 6 years ago

That absolutely sounds like a great tool, but for now I think it makes sense as an extension/lib on top of DL. I would love to see it btw!

And if it is used by more people, I might consider adopting it in the official repo.

lundmark commented 6 years ago

Yes, for sure: external and not a part of DL.

I will try to get some time to clean it up and make it actually useable. Right now it depends on internal types that we use (strings/allocators). I'll try to make an external header for it.

wc-duck commented 6 years ago

I'll close this issue now since I consider it a won't-fix.