johannesboyne / gofakes3

A simple fake AWS S3 object storage (used for local test-runs against AWS S3 APIs)
MIT License
361 stars 84 forks source link

S3 Select mocking support #35

Open iori-yja opened 5 years ago

iori-yja commented 5 years ago

Hello,

I am wondering if it's possible to deal with a code which uses "S3 Select" feature.

S3 Select is an S3 feature to get contents with SQL query. It is, in general, a tough challenge with a great benefit to test a query like SQL which leads its query complex, and S3 Select is something special in testing difficulty. S3 Select is used with large aws-sdk's interface and its backend is locally unavailable.

My questions:

I am not sure even it is good idea to use other project's SQL parser for this purpose.

shabbyrobe commented 5 years ago

Is it in gofakes3's scope of concern to support select feature?

Yep I think so, but I think it will be difficult to get right. Compliance will be especially tricky - even just the list of different errors that the S3 SQL parser can emit is very long, and that's before we start getting into all the corner cases of the expression parsing. It'll take a long time to replicate the behaviour exactly. Doesn't mean we shouldn't try, but we should be very up front with users that GoFakeS3's implementation will not be able to be accurate for a long time, if ever.

If so, how the implementation supposed to be?

I've got a branch here that shows how I think it should be structured, though this code is in no way ready for release yet and I haven't actually got the SQL part of it up and firing yet, just the structure: https://github.com/johannesboyne/gofakes3/compare/master...shabbyrobe:feature/s3-select?expand=1

It has very minimal impact on the existing code though. S3 Select is quite broad so I think we should focus on shipping a version with support for a small set of use cases and encouraging people to make their own more tailored plugin if they need more.

I am not sure even it is good idea to use other project's SQL parser for this purpose.

I think the plugin-based structure in this PR means that that's probably a good place to start if we can - the core of GoFakeS3 is unaffected, we can move the code that depends on a 3rd-party SQL parser off to the side. I started looking at this one, but it chokes on the select field semi-JMESPath syntax used by S3 Select; SELECT s._1, s._2 FROM S3Object[*].Pants[*] s WHERE s._3 > 100 somewhat unsurprisingly fails parsing with this error: syntax error at position 33 near '['.

shabbyrobe commented 5 years ago

Same with github.com/araddon/qlbridge. I suppose I shouldn't be surprised; S3 Select is a bit outside the standard. I think we'll probably have to either hand-roll a parser or borrow the bulk of one from an existing project; I'd prefer the latter if possible. I'll take a look around inside qlbridge to see if the guts of it can be airlifted in.

iori-yja commented 5 years ago

Nice to see your implementation! That's big.

It'll take a long time to replicate the behaviour exactly

Yes, it must be. For testing purpose, it is not always needed to be exact same error message, though. I believe that handling the working example of let's say, select, will come on the most needed part at first, especially seeing whether where part logic works as intended, for example.

Parsing S3 Select query is quite difficult because it is outside the standard as you pointed out but minio has already implemented [https://github.com/minio/minio/blob/master/pkg/s3select/sql/parser.go]. The approach of parser implementation is quite different with your PEG's one, yet I wonder if it is possible to borrow from them.

shabbyrobe commented 5 years ago

Oh that's a good find, I'll see if I can airlift minio's implementation directly. The PEG was just a "let's get started quickly" experiment. I've got the absolute basics almost working locally, but if minio's works nicely out of the box with the structure I added the other day, it makes no sense to continue with my own.