PoC: use htoml's parser

pbrisbin commented 8 years ago

This was an attempt to see if using the existing htoml's parser, but translating to our own Toml type would:

Be easy to do

I think yes, it took just a tiny bit of translation code. I had to use undefined for the Inline/Outline attribute because htoml doesn't have this concept. I think that means we could remove it from our type, if it's not needed for querying. If I'm wrong here, that could be the deal breaker.
Pass the BurntSushi decoder tests

We're at 66/12 using this tiny spike. I haven't investigated yet as to what's breaking the 12 remaining, but would be happy to do so if this approach has merit.
Keep the benefits of improved queriability that prompted creating this package in the first place

My assumption is that the querying code will work provided the interface of our Toml type is preserved. This spike didn't change it (modulo the Inline/Outline question) so I think this could be true.

If this approach is sound, my next step would be to drop all the now-unnecessary modules and rebrand this as something like toml-query, to avoid the import and module clashes, then investigate and address the decoder test failures.

@clord what do you think?

Note: FWIW, if I update the QuerySpec to use undefined for the Inline/Outline field in the fixture document, it still passes.

clord commented 8 years ago

I like the idea of having the ability to read from htoml's tree. Might make migration easier for some people. But I think there is room for internal optimizations to the parser too.

pbrisbin commented 8 years ago

So what do you think would be the next steps to deciding on direction? (I don't think we should pursue both). Do you think our parser is far enough along that a benchmark would show if it's actually a performance win?

pbrisbin commented 8 years ago

Using the latest version of htoml got us to 74/4.

Test: table-array-implicit (valid)

Malformed parser output. 'value' should be a JSON array when 'type' indicates 'array', but it is a map[string]interface {}.
-------------------------------------------------------------------------------
Test: table-array-many (valid)

Malformed parser output. 'value' should be a JSON array when 'type' indicates 'array', but it is a map[string]interface {}.
-------------------------------------------------------------------------------
Test: table-array-nest (valid)

Malformed parser output. 'value' should be a JSON array when 'type' indicates 'array', but it is a map[string]interface {}.
-------------------------------------------------------------------------------
Test: table-array-one (valid)

Malformed parser output. 'value' should be a JSON array when 'type' indicates 'array', but it is a map[string]interface {}.

74 passed, 4 failed

These last 4 are probably failing because of a bug in either htoml-toml-parse translation, or my JSON encoder. There seems to be some trickiness with arrays of values vs arrays of tables that's not being handled properly.

pbrisbin commented 8 years ago

OK, this now passes all decoder specs.

It looks like tracking an array of tables vs an array of values is important; the burnt-sushi specs require a different JSON representation for the two cases. To do that, I introduced TTableArray to the TNamable data type. We're effectively a constructor-for-constructor copy of htoml's Node type now, so I really doubt there's any value in having our own representation.

I'd be most interested in re-branding this package as toml-query, providing a reworked version of just that module directly on top of the htoml type, and focusing on making that feature really great.

htoml is about to release 1.0 and I would bet on it becoming defacto. Any performance improvements would be better served as patches to that library rather than re-inventing an additional parser in another competing library, IMHO. And I think this PoC proves that an improved query interface (at least as far as is implemented here) can be accomplished on top of that type.

@clord WDYT?

clord commented 8 years ago

I've decided to pursue this as a full parser. Early on I was open to doing a query layer, but since then we've both invested time in developing the parser further. There is exactly no harm in having some choice in the community over libraries. For instance, there are multiple xml parsers with different tradeoffs available. So the whole 'one true library' argument seems moot to me.

pliosoft / toml-parse

PoC: use htoml's parser #13