mattgodbolt / zindex

Create an index on a compressed text file
BSD 2-Clause "Simplified" License

Support "LIKE" queries #12

Closed mattgodbolt closed 8 years ago

mattgodbolt commented 9 years ago

This would tie us forevermore to SQL if we're not careful, but it may be worth it (or else we could transform some zq-ish expression into SQL).

hfuller commented 9 years ago

Thanks for submitting this. The companion on the index-creating side would be to apply some transform to the index before storing it; is that a feature you would be interested in incorporating too?

mattgodbolt commented 9 years ago

I'd be happy to support both! The LIKE thing ought to be pretty simple, whereas the transform requires a little more work.

I wonder if a more general approach like "execute this UNIX command for each line and use its output" would be better for all of these? That may be too slow for indexing, but it's certainly more "UNIXy".

Something like:

zindex foo.gz --exec "bash -c 'cut -f1 -d\  | tr a-z A-Z'"

That might also cover the "jq" case referenced in #4 too.
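
To make the per-line idea concrete, here is a rough Python sketch of "run a command for each line and use its output as the key". It is not how zindex works; the function names `transform_line` and `index_keys` are made up for illustration, and it assumes `cut` and `tr` are on the PATH. Spawning one process per line is exactly the cost that could make this too slow for indexing.

```python
import subprocess


def transform_line(line: str, command: str) -> str:
    """Feed one line to a shell command on stdin and return its stdout as the key.

    Hypothetical helper: a new shell process is started for every line.
    """
    result = subprocess.run(
        command,
        shell=True,           # let the shell interpret pipes in the command
        input=line,
        capture_output=True,
        text=True,
        check=True,           # raise if the command exits non-zero
    )
    return result.stdout.rstrip("\n")


def index_keys(lines, command):
    """Compute the key that would be stored in the index for each input line."""
    return [transform_line(line, command) for line in lines]


# Tab-separated sample lines; the key is the upper-cased first field.
lines = ["alpha\tone\n", "beta\ttwo\n"]
keys = index_keys(lines, "cut -f1 | tr a-z A-Z")
print(keys)
```

A single long-lived child process that reads every line and writes one key per line would avoid the per-line spawn cost, at the price of requiring the command to emit exactly one output line per input line.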

mattgodbolt commented 9 years ago

Looking at how LIKE actually runs, it ends up scanning every entry in the index, which is obviously slower than an indexed lookup. A pre-transform should be faster if you know ahead of time what the transform should be. I've opened #14 to track this separate request.
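
The scanning behaviour can be seen directly in SQLite's query planner. The sketch below uses Python's `sqlite3` module with a made-up table (`entries`, `key`, `file_offset`), which is an assumption for illustration and not zindex's actual schema. It compares the plan for a `LIKE` with a leading wildcard against the plan for an exact match.

```python
import sqlite3

# Hypothetical schema: NOT the real zindex index layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (key TEXT, file_offset INTEGER)")
conn.execute("CREATE INDEX entries_key ON entries (key)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?)",
    [(f"key{i}", i * 100) for i in range(1000)],
)


def plan(sql: str) -> list:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # detail is the last column


# A leading wildcard prevents SQLite from using the index to narrow the search.
like_plan = plan("SELECT file_offset FROM entries WHERE key LIKE '%ey42%'")
# An exact match can be answered with an index lookup.
exact_plan = plan("SELECT file_offset FROM entries WHERE key = 'key42'")

print(like_plan)   # plan detail begins with "SCAN"
print(exact_plan)  # plan detail begins with "SEARCH"
```

The exact wording of the plan text varies between SQLite versions, but a full scan is reported as `SCAN` and an indexed lookup as `SEARCH`.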

mattgodbolt commented 9 years ago

In 5aab30b3f7fd46106bad6a09753c6800ced147cc I've started an experiment in which I pipe output to an external command to create the index.

mattgodbolt commented 8 years ago

Raw queries (zq --raw) allow this, and many other types of queries, at the cost of exposing the SQLite innards.