tidwall / gjson

Get JSON values quickly - JSON parser for Go
MIT License
14.1k stars 846 forks source link

adding regex support #257

Open migopsdinesh opened 2 years ago

migopsdinesh commented 2 years ago

Thank you very much for the JSON library, and it's useful for us. JSON navigation and JSON query is nice, and we see the regex pattern is missing in the current implementation.

So, we wanted to propose the regex support to this library, and here is the pull request. Requesting you to look into this pull request, and do let us know if any further improvements are required.

tidwall commented 2 years ago

Hi @migopsdinesh,

Wow this is a big PR. 😅 Adding Regex is a significant update. On the surface I like the idea but unfortunately I'm not much of a "regexpert". I'm more of a casual user, so I'll need some time to digest.

I would also like to get some additional feedback from the community, if possible.

migopsdinesh commented 2 years ago

Thanks @tidwall Looking forward for more updates

lammel commented 2 years ago

For flexibility regex matching is a nice addition. Nice work!

The current syntax is a little awkward using the "~" separator, so although I'm quite fluent in regex this is hard to read for me. The mechanism is based on regexp.MatchString() which could be rather slow due to not being precompiled.

Not sure, if an implementation could use precompiled regex or the syntax could be changed to work without "~" (e.g. by using multiple regexes, or a single precompiled regex matching the full json path)

tidwall commented 2 years ago

The current syntax is a little awkward using the "~" separator

I wonder if using forward slashes might be easier to read? I don't think that the forward slash is being used atm, and it would be very rare to see it used as a leading character for json keys. Also there could be the benefit that it's used in the native javascript syntax, so many folks might be accustomed to it.

Not sure, if an implementation could use precompiled regex or the syntax could be changed to work without "~"

If having precompiled regex is need, then perhaps we could do a lightweight LRU cache at the global state that does a JIT compile, on demand. Maybe also with an auto expiration, just so it doesn't stay in memory for ever.

lammel commented 2 years ago

Based on the suggestion using slash (/) instead of ~ tests would look like this:

regexGet("/./", true)
regexGet("loggy./programmers/", true)
regexGet("/last*/./end*/", true)
//Get any 3 word length string from the JSON path
regexGet(`loggy.programmers.2./^(\w{3})$/`, true)

This does look more familiar to my regex eyes. Multiple regexes are allowed for the syntax and also combining with the current syntax which does seem to be very flexible.

Concerning performance some simple benchmarks could tell if precompiling would really help here, as long as the regex is not repeatedly used uncompiled when traversing the tree there should be not too many gains.

migopsdinesh commented 2 years ago

@Iammel, @tidwall

Thank you for your inputs on this. Will try to fix this submitted PR with the suggested changes. I am sorry for the long delay on this, as I was out due to some personal work. Will try to update the new PR as soon as possible.

Thank you again.

bjoerndemeyer commented 2 years ago

@migopsdinesh We could also use this, but currently we use a custom Modifier @match:"regexp" for this. So I am not sure that we need new syntax for this. But I have to admit the syntactic sugar is sweet.