kalekundert / byoc

Use external library to parse string keys #25

Closed kalekundert closed 2 years ago

kalekundert commented 3 years ago

Right now, string keys may be dotted to get values from nested data structures. However, I just realized that there are many third-party libraries out there that do the same thing, but with much more sophistication. Obviously any of these libraries can be used via callable keys, but it might be nice to support one of them out of the box.

After a cursory bit of research, JMESPath seems like the best option. Glom doesn't seem to provide a string-based way to index lists, and JSONPath seems like an earlier and less well-defined version of JMESPath. See also: https://levelup.gitconnected.com/json-queries-give-your-users-jmespath-power-ef8ab0d38553
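For context, the dotted-key behavior and the callable-key escape hatch mentioned above can be sketched roughly as follows. This is only an illustration, not the library's actual code: `dotted_lookup` and `config` are hypothetical names, and the sketch assumes that string keys are split on dots and that callable keys are called with the data.

```python
from functools import reduce


def dotted_lookup(data, key):
    """Resolve a key against nested containers.

    Hypothetical helper: a string key is split on dots and applied one
    part at a time; a callable key is called with the data, which is
    where any third-party query library could be plugged in.
    """
    if callable(key):
        return key(data)
    return reduce(lambda node, part: node[part], key.split("."), data)


config = {"server": {"ports": [8000, 8001]}}

# A dotted string key reaches into nested dictionaries...
print(dotted_lookup(config, "server.ports"))

# ...but indexing the list needs a callable key in this sketch.
print(dotted_lookup(config, lambda d: d["server"]["ports"][0]))
```

The second call is the kind of query that a string-based language with list indexing would make unnecessary.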

kalekundert commented 2 years ago

This would slow down the most common case, which is simple string keys:

>>> import timeit
>>> timeit.repeat('d[k]', 'd,k={"a": 1},"a"')
[0.0723611080320552, 0.03749351901933551, 0.029345180955715477, 0.028627041028812528, 0.02867339097429067]
>>> timeit.repeat('search(k, d)', 'from jmespath import search; d,k={"a": 1},"a"')
[7.047965059988201, 6.955558293033391, 7.148821344017051, 7.014664759975858, 7.053621853003278]
>>> timeit.repeat('p.search(d)', 'from jmespath import compile; d = {"a": 1}; p = compile("a")')
[5.188850256963633, 5.067834256915376, 5.091236199019477, 5.052642205962911, 5.080472690984607]
>>> timeit.repeat('lookup(d, k)', 'from appcli import lookup; d = {"a": 1}; k = "a"')
[0.49615325999911875, 0.49428523203823715, 0.5081062640529126, 0.49312764895148575, 0.5119314019102603]
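
The ratios between these timings can be worked out directly. The sketch below copies the numbers from the session above and compares the best run of each case; taking the minimum is my assumption about the fairest comparison, and the labels are just descriptive names.

```python
# Timings copied from the session above. Each number is the total time in
# seconds for one run of 1,000,000 calls (the timeit default).
timings = {
    "d[k]": [0.0723611080320552, 0.03749351901933551, 0.029345180955715477,
             0.028627041028812528, 0.02867339097429067],
    "jmespath.search": [7.047965059988201, 6.955558293033391, 7.148821344017051,
                        7.014664759975858, 7.053621853003278],
    "compiled jmespath": [5.188850256963633, 5.067834256915376, 5.091236199019477,
                          5.052642205962911, 5.080472690984607],
    "appcli.lookup": [0.49615325999911875, 0.49428523203823715, 0.5081062640529126,
                      0.49312764895148575, 0.5119314019102603],
}

# Use the fastest run of each case, which is least affected by noise.
best = {name: min(runs) for name, runs in timings.items()}


def slowdown(slow, fast):
    """Return how many times slower `slow` is than `fast`."""
    return best[slow] / best[fast]


for slow, fast in [
    ("compiled jmespath", "appcli.lookup"),
    ("jmespath.search", "appcli.lookup"),
    ("compiled jmespath", "d[k]"),
    ("appcli.lookup", "d[k]"),
]:
    print(f"{slow} vs {fast}: {slowdown(slow, fast):.1f}x")
```

By this measure, a compiled jmespath expression is roughly 10x slower than the current lookup function and roughly 175x slower than plain dictionary indexing.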

So jmespath is roughly 10x slower than my current function, and roughly 200x slower than just assuming the key is a string (i.e. plain d[k] indexing). Some thoughts:

I'm a bit on the fence, but I think I'm going to use jmespath by default, opting for convenience over performance. I can always revert the change if performance turns out to be an issue.


Actually, I realized that jmespath can't handle keys like --flag. That makes it a non-starter for being enabled by default. But as I said above, it's easy to opt into. I should mention it in the docs.
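
To illustrate the problem: as far as I understand the JMESPath grammar, an unquoted identifier must start with a letter or underscore and contain only letters, digits, and underscores, so --flag would have to be written as a quoted identifier. The sketch below uses only the standard library and does not call jmespath itself; `is_bare_identifier` and `quote_identifier` are hypothetical helpers, and the regular expression is my reading of the grammar rather than something taken from the jmespath package.

```python
import json
import re

# Assumed shape of an unquoted JMESPath identifier: a letter or underscore
# followed by any number of letters, digits, or underscores.
BARE_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_bare_identifier(key):
    """Return True if `key` could be used as-is in a JMESPath expression."""
    return BARE_IDENTIFIER.fullmatch(key) is not None


def quote_identifier(key):
    """Wrap `key` in double quotes using JSON string escaping.

    Assumption: JMESPath quoted identifiers follow JSON string syntax.
    """
    return json.dumps(key)


for key in ["flag", "--flag"]:
    print(key, is_bare_identifier(key), quote_identifier(key))
```

Requiring users to write '"--flag"' instead of '--flag' is the kind of friction that makes this unsuitable as a default.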