Array encoding #6 (mradziwo/MsgPack_LabVIEW)

zoyo-de commented 5 years ago

Hi mradziwo,

I just found the time to push some changes to my repo that I made last year. They are not polished enough for a pull request, but there may be some points worth a look. PronovaPaul recently pushed some really nice changes (new features, no OpenG dependency). Among other things, he worked on the same problem as I did:

1. Array encoding: We both use the static type information, and for fixed-size types the encoded data can be built by interleaving headers and data. This is where our solutions differ: I cast the array to a byte array, reshape it according to the element size, add a header column, and reshape it again (see enca*.vi). In my short test this is about six times faster than Paul's solution with element-wise encoding, so it might be worth considering. But I'm not sure about all the implications, especially for multidimensional arrays, where additional array headers have to be added. In general, though, array reshaping should be quite fast because the array's content is not accessed.
2. Enum encoding: I added an option to switch between encoding enums as integers or as strings. This can sometimes be helpful.
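Zoyo's LabVIEW implementation lives in the enca*.vi files and is not reproduced here. The following is a minimal Python sketch of the same idea for a 1-D float64 array, assuming msgpack `float 64` elements (format byte `0xcb` + 8 big-endian bytes) and ignoring multidimensional arrays; strided `bytearray` slice assignment stands in for LabVIEW's Reshape Array. The function names are hypothetical. It builds the output both element by element and with the header-column trick, so the two can be checked for byte-identical output and timed against each other:

```python
import struct
import time

FLOAT64 = 0xCB  # msgpack "float 64" format byte


def array_header(n: int) -> bytes:
    """msgpack array header for n elements (fixarray, array 16 or array 32)."""
    if n <= 15:
        return bytes([0x90 | n])
    if n <= 0xFFFF:
        return b"\xdc" + struct.pack(">H", n)
    return b"\xdd" + struct.pack(">I", n)


def encode_elementwise(values: list[float]) -> bytes:
    """Element-wise: append a format byte and 8 big-endian bytes per value."""
    out = bytearray(array_header(len(values)))
    for v in values:
        out.append(FLOAT64)
        out += struct.pack(">d", v)
    return bytes(out)


def encode_header_column(values: list[float]) -> bytes:
    """Header column: cast all values to bytes at once, treat the result as an
    n x 8 byte matrix, prepend a constant header column (n x 9), flatten."""
    n = len(values)
    raw = struct.pack(f">{n}d", *values)   # one bulk cast, 8 bytes per element
    out = bytearray(9 * n)
    out[0::9] = bytes([FLOAT64]) * n       # column 0: format byte of each row
    for k in range(8):                     # columns 1..8: data bytes of each row
        out[k + 1::9] = raw[k::8]
    return array_header(n) + bytes(out)


def compare(n: int = 1_000_000) -> None:
    """Check that both encoders agree on n values and print their run times."""
    values = [i * 0.5 for i in range(n)]
    results = []
    for encoder in (encode_elementwise, encode_header_column):
        start = time.perf_counter()
        results.append(encoder(values))
        print(f"{encoder.__name__}: {time.perf_counter() - start:.3f} s")
    assert results[0] == results[1]


if __name__ == "__main__":
    compare()
```

For example, `encode_header_column([1.5, -2.0])` starts with `0x92` (fixarray of two) followed by `0xcb` and the eight bytes of 1.5. Timings in Python will not match the LabVIEW figure; only the byte layout carries over. A 2-D array would additionally need one array header per row, which this sketch does not handle.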

Also, because of the increasing number of options, it might be a good time to switch to a single configuration cluster input instead of many boolean/enum inputs.
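In a text-based language, the single-config-input idea maps onto an options object. A minimal Python sketch, assuming hypothetical option names: `compress` echoes the "compress = False" setting mentioned later in the thread and `enum_as_string` the enum option above; this is not the library's actual API:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EncoderConfig:
    """Hypothetical options bundle, analogous to a single LabVIEW config cluster."""
    compress: bool = False        # choose the smallest msgpack format from the value
    enum_as_string: bool = False  # encode enums by name instead of integer value


DEFAULT_CONFIG = EncoderConfig()


def encode_u8(value: int, config: EncoderConfig = DEFAULT_CONFIG) -> bytes:
    """Encode an unsigned 8-bit integer; only `config.compress` matters here."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value does not fit in u8")
    if config.compress and value <= 0x7F:
        return bytes([value])        # positive fixint: 1 byte
    return bytes([0xCC, value])      # uint 8: format byte + data byte


# Callers override only the options they care about.
compact = replace(DEFAULT_CONFIG, compress=True)
```

Here `encode_u8(5)` yields the two bytes `cc 05`, while `encode_u8(5, compact)` yields the single byte `05`. Adding another option later means adding one field with a default, so existing call sites keep working.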

Well, just some thoughts...

regards, Zoyo

mradziwo commented 5 years ago

Hello Zoyo,

Thank you for the updates!

It sounds like a good idea to write a benchmark test for array encoding. I have not looked at your solution yet; from the description it is clear that it should be faster, though I am not sure whether it fully complies with "the standard". Could I ask you to check whether an array encoded with your method is correctly decoded by other (non-LabVIEW) msgpack decoders? If so, I'd be glad to merge the faster solution into the main branch.
As for enums: a "good old C-style" enum is just a handy way to "hide" numeric values for the developer's convenience, so I strongly favor encoding enums as a fixed-size type by default. Also, LabVIEW's "Format Into String" converts an enum to its string directly if one wishes to export a particular value as a string.

The problem is that if a consumer of a msgpack-encoded message wants to keep using this value as an enum, it needs more information, i.e. the full definition of the enum. You can see this behavior in "Flatten To XML", where an enum type is flattened to a structure that contains all of its elements. Here it would require a language-specific (LabVIEW) implementation, exactly as in XML, where the encoded object is a package of the actual value and the enum definition. Such a construct sits at a higher level than the encoding itself and should not be part of the base encoding library.
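The trade-off can be sketched in Python, with `IntEnum` standing in for a LabVIEW U16 enum. The enum `Mode`, the function names, and the map layout of the "package" variant are hypothetical; the format bytes (`0xcd` uint 16, fixstr `0xa0 | len`, fixarray `0x90 | n`, fixmap `0x80 | n`) follow the msgpack spec. The bare integer is compact but can only be turned back into an enum if the consumer already has the definition; the package carries the definition along, which is the higher-level construct described above:

```python
import struct
from enum import IntEnum


class Mode(IntEnum):
    """Hypothetical enum; like a LabVIEW enum, values run 0, 1, 2, ..."""
    IDLE = 0
    RUNNING = 1
    FAULT = 2


def fixstr(s: str) -> bytes:
    """msgpack fixstr: format byte 0xa0 | length, then UTF-8 bytes (max 31)."""
    data = s.encode("utf-8")
    if len(data) > 31:
        raise ValueError("string too long for fixstr")
    return bytes([0xA0 | len(data)]) + data


def encode_enum_as_int(e: IntEnum) -> bytes:
    """Default: fixed-size uint 16 (0xcd + 2 big-endian bytes)."""
    return b"\xcd" + struct.pack(">H", e.value)


def encode_enum_as_string(e: IntEnum) -> bytes:
    """Optional: the member name as a string."""
    return fixstr(e.name)


def encode_enum_package(e: IntEnum) -> bytes:
    """Higher-level construct: {"value": <uint 16>, "definition": [names]}."""
    names = [member.name for member in type(e)]
    if len(names) > 15:
        raise ValueError("sketch only emits fixarray headers")
    definition = bytes([0x90 | len(names)]) + b"".join(fixstr(n) for n in names)
    return (b"\x82"                                   # fixmap with 2 entries
            + fixstr("value") + encode_enum_as_int(e)
            + fixstr("definition") + definition)


def decode_enum_int(buf: bytes, enum_type: type[IntEnum]) -> IntEnum:
    """Consumer side: rebuilding the enum requires the definition `enum_type`."""
    if buf[0] != 0xCD:
        raise ValueError("expected uint 16")
    (value,) = struct.unpack(">H", buf[1:3])
    return enum_type(value)
```

The package variant relies on the values running 0, 1, 2, ... so that a position in `definition` equals the enum value.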

Best regards, Michał Radziwon

(In reply to zoyo-de's post of Sat, 23 Feb 2019 at 01:56.)

zoyo-de commented 5 years ago

Hello Michal,

1) Our old version of array encoding already doesn't use the compressed encoding scheme of the spec (this is one feature of Paul's version), because it picks the format from the data type of each array element, not from its numerical value. My new version gives the same results as the old one, or as Paul's version with "compress = False". I have been using our old version for 9 months and my new one for 2 months, both with the Python msgpack module, and it works great. But I haven't tried any other msgpack library. From the spec, I can't see a reason why this shouldn't work.
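Why both outputs should decode the same can be illustrated with a Python sketch for a short int32 array: one encoder uses the element type (every value as `int 32`, `0xd2`, as with "compress = False"), the other picks the smallest integer format per value, and a minimal decoder for just these formats returns the same list for both. The function names and the `sample` values are hypothetical, only fixarray headers (up to 15 elements) are handled, and the decoder stands in for a full msgpack library rather than reproducing one:

```python
import struct

# Format byte -> struct format of the payload (msgpack uint/int families).
FORMATS = {0xCC: ">B", 0xCD: ">H", 0xCE: ">I", 0xD0: ">b", 0xD1: ">h", 0xD2: ">i"}


def fixarray_header(n: int) -> bytes:
    if n > 15:
        raise ValueError("sketch only emits fixarray headers")
    return bytes([0x90 | n])


def encode_int32_array_typed(values: list[int]) -> bytes:
    """Type-based: every element as int 32 (0xd2 + 4 big-endian bytes)."""
    return fixarray_header(len(values)) + b"".join(
        b"\xd2" + struct.pack(">i", v) for v in values)


def encode_int32_array_compressed(values: list[int]) -> bytes:
    """Value-based: the smallest msgpack integer format for each element."""
    out = bytearray(fixarray_header(len(values)))
    for v in values:
        if 0 <= v <= 0x7F:
            out.append(v)                            # positive fixint
        elif 0 <= v <= 0xFF:
            out += b"\xcc" + struct.pack(">B", v)    # uint 8
        elif 0 <= v <= 0xFFFF:
            out += b"\xcd" + struct.pack(">H", v)    # uint 16
        elif v >= 0:
            out += b"\xce" + struct.pack(">I", v)    # uint 32
        elif v >= -32:
            out += struct.pack(">b", v)              # negative fixint (0xe0..0xff)
        elif v >= -0x80:
            out += b"\xd0" + struct.pack(">b", v)    # int 8
        elif v >= -0x8000:
            out += b"\xd1" + struct.pack(">h", v)    # int 16
        else:
            out += b"\xd2" + struct.pack(">i", v)    # int 32
    return bytes(out)


def decode_int_array(buf: bytes) -> list[int]:
    """Minimal decoder: a fixarray of the integer formats used above."""
    if buf[0] & 0xF0 != 0x90:
        raise ValueError("expected fixarray")
    pos, result = 1, []
    for _ in range(buf[0] & 0x0F):
        tag = buf[pos]
        if tag <= 0x7F:                              # positive fixint
            result.append(tag)
            pos += 1
        elif tag >= 0xE0:                            # negative fixint
            result.append(tag - 0x100)
            pos += 1
        elif tag in FORMATS:
            size = struct.calcsize(FORMATS[tag])
            result.append(struct.unpack(FORMATS[tag], buf[pos + 1:pos + 1 + size])[0])
            pos += 1 + size
        else:
            raise ValueError(f"unsupported format byte 0x{tag:02x}")
    return result


sample = [0, 1, 127, 128, 300, -1, -32, -33, -200, 70000, -70000]
```

For `sample`, the type-based output is 56 bytes and the compressed one 26 bytes, yet `decode_int_array` returns `sample` for both. The type-based form is simply larger, not invalid.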

For speed tests, please also use large arrays with 1e6 to 1e7 elements (typical image sizes).

2) Enum-to-string encoding comes in handy when using decodeObject.vi with nested compound data types (arrays of clusters of ...), where it gets messy to replace enum elements with their string representations. With fairly elementary data types that you encode "by hand" through their corresponding VIs, it's simple to convert the enums yourself. I'm also aware that this is mostly for quick'n'dirty hacks where you just want to get data into a readable format. :)
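What the option saves on nested data can be sketched in Python: a recursive walk that swaps every enum member for its name before encoding. This is not decodeObject.vi or any LabVIEW code; the function name and the sample `Valve` enum are hypothetical, with lists standing in for arrays and tuples/dicts for clusters:

```python
from enum import Enum, IntEnum
from typing import Any


class Valve(IntEnum):
    """Hypothetical enum used in the nested example data."""
    CLOSED = 0
    OPEN = 1


def enums_to_strings(value: Any) -> Any:
    """Return a copy of `value` with every Enum member replaced by its name.

    Lists stand in for LabVIEW arrays, tuples and dicts for clusters."""
    if isinstance(value, Enum):          # IntEnum members are ints too, so test Enum explicitly
        return value.name
    if isinstance(value, list):
        return [enums_to_strings(v) for v in value]
    if isinstance(value, tuple):
        return tuple(enums_to_strings(v) for v in value)
    if isinstance(value, dict):
        return {k: enums_to_strings(v) for k, v in value.items()}
    return value                         # other scalars and strings pass through


readings = [{"id": 1, "valve": Valve.OPEN}, {"id": 2, "valve": Valve.CLOSED}]
```

The input is left unchanged and the converted copy can go through any encoder. Named tuples come back as plain tuples in this sketch.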

kind regards, Christian (Zoyo)
