peak/s5cmd

Parallel S3 and local filesystem execution tool.
MIT License

command: extend `select` api support #611

Closed: denizsurmeli closed this pull request 1 year ago

denizsurmeli commented 1 year ago

This PR extends the support for AWS S3's Select API.

Resolves #494, resolves #357.

denizsurmeli commented 1 year ago

The aws-sdk-go version we use has a bug where it logs the response headers regardless of the log level set in s5cmd. Here is the related issue, so I will bump to the version that has the fix and check whether anything breaks.

denizsurmeli commented 1 year ago

MinIO panics during SELECT queries. I'm open to suggestions on how to test this.

panic: runtime error: index out of range [0] with length 0

goroutine 779 [running]:
github.com/minio/minio/internal/s3select/csv.NewReader.func1({0x5929480?, 0xc0093743f0?})
    github.com/minio/minio/internal/s3select/csv/reader.go:306 +0x425
github.com/minio/minio/internal/s3select/csv.(*Reader).startReaders.func3()
    github.com/minio/minio/internal/s3select/csv/reader.go:244 +0x19c
created by github.com/minio/minio/internal/s3select/csv.(*Reader).startReaders
    github.com/minio/minio/internal/s3select/csv/reader.go:233 +0x705

denizsurmeli commented 1 year ago

MinIO panics only on CSV queries where no delimiter is specified. Oddly, it does not validate the delimiter in the request. JSON queries seemed fine.
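
Since the panic only shows up when no delimiter is sent, one way to sidestep it in tests is to always fill one in on the client side. A minimal sketch, assuming a hypothetical `csvInputOptions` struct and `withDefaults` helper (neither comes from s5cmd or aws-sdk-go), and assuming comma and newline as the defaults:

```go
package main

import "fmt"

// csvInputOptions is a hypothetical, simplified stand-in for the CSV input
// settings sent with an S3 Select request; it is not an s5cmd or SDK type.
type csvInputOptions struct {
	FieldDelimiter  string
	RecordDelimiter string
}

// withDefaults fills in any empty delimiter so the request never carries an
// empty value. The defaults (comma and newline) are assumed conventional CSV
// values, not values taken from s5cmd.
func withDefaults(o csvInputOptions) csvInputOptions {
	if o.FieldDelimiter == "" {
		o.FieldDelimiter = ","
	}
	if o.RecordDelimiter == "" {
		o.RecordDelimiter = "\n"
	}
	return o
}

func main() {
	// Nothing was set on the command line, so both defaults are applied.
	opts := withDefaults(csvInputOptions{})
	fmt.Printf("field=%q record=%q\n", opts.FieldDelimiter, opts.RecordDelimiter)
}
```

Explicitly provided delimiters pass through unchanged; only empty fields are filled.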

igungor commented 1 year ago

This PR also fixes #357, @denizsurmeli.

denizsurmeli commented 1 year ago

Refactored the code and addressed the review comments.

igungor commented 1 year ago

I'd like to share my expectations while playing with this feature:

  1. I expected CSV output when querying CSV objects (the same goes for TSV and other formats), but the results come back as JSON:
    ./s5cmd select csv --delimiter "," -e 'select * from s3object' 's3://bucket/ibrahim/s5cmd-select/prices.csv'
    {"_1":"id","_2":"name","_3":"price"}
    {"_1":"1","_2":"avocado","_3":"3.99"}
    {"_1":"2","_2":"banana","_3":"1.99"}
    {"_1":"3","_2":"cabbage","_3":"0.99"}
  2. How do I query TSV files?
    ./s5cmd select csv --output-format csv --delimiter "\t" -e 'select * from s3object' 's3://bucket/ibrahim/s5cmd-select/prices.tsv'
    ERROR "select csv --delimiter=\\t --query=select * from s3object --output-format=csv s3://bucket/ibrahim/s5cmd-select/prices.tsv": InvalidRequestParameter: The value of parameter FieldDelimiter is invalid. Please check the service documentation and try again. status code: 400, request id: ...

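
The TSV failure is consistent with the shell passing `"\t"` through as two characters (a backslash and `t`), which the error output shows as `--delimiter=\\t`. A minimal sketch of interpreting such escapes on the client side, assuming a hypothetical `parseDelimiter` helper and assuming the service accepts only a single-character field delimiter:

```go
package main

import (
	"fmt"
	"strconv"
	"unicode/utf8"
)

// parseDelimiter is a hypothetical helper that turns a delimiter typed on the
// command line into the literal character to send. A shell passes "\t" in
// double quotes through as a backslash followed by 't', so the escape is
// interpreted here using Go string-literal rules.
func parseDelimiter(raw string) (string, error) {
	unquoted, err := strconv.Unquote(`"` + raw + `"`)
	if err != nil {
		// Not a valid escape sequence (e.g. a bare `"`): use the input as-is.
		unquoted = raw
	}
	// Assumption: only a single-character field delimiter is accepted.
	if utf8.RuneCountInString(unquoted) != 1 {
		return "", fmt.Errorf("delimiter %q must be a single character", raw)
	}
	return unquoted, nil
}

func main() {
	for _, in := range []string{`\t`, ",", ";", "ab"} {
		d, err := parseDelimiter(in)
		fmt.Printf("%q -> %q, err=%v\n", in, d, err)
	}
}
```

Alternatively, in bash or zsh, `--delimiter $'\t'` makes the shell pass a literal tab character, so no unescaping is needed in the client.
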
denizsurmeli commented 1 year ago

For the first issue, you are right; I have implemented it. For the second question, I have added an example query to the help text and extended the flag descriptions.
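
One way to get the behavior expected in the first point is to default the output serialization to the input format when `--output-format` is not given. A minimal sketch, assuming a hypothetical `resolveOutputFormat` helper with lowercase format names. Since S3 Select writes results only as CSV or JSON, other inputs such as Parquet fall back to JSON here (an assumed choice, not necessarily what the PR does):

```go
package main

import (
	"fmt"
	"strings"
)

// resolveOutputFormat is a hypothetical helper that picks the serialization
// for query results.
func resolveOutputFormat(inputFormat, outputFlag string) string {
	if outputFlag != "" {
		// An explicit --output-format value always wins.
		return strings.ToLower(outputFlag)
	}
	switch in := strings.ToLower(inputFormat); in {
	case "csv", "json":
		// Mirror the input, so querying CSV yields CSV rows rather than
		// JSON records such as {"_1":"id",...}.
		return in
	default:
		// Assumed fallback: results cannot be written back as, e.g., Parquet.
		return "json"
	}
}

func main() {
	cases := [][2]string{{"csv", ""}, {"json", ""}, {"parquet", ""}, {"csv", "json"}}
	for _, c := range cases {
		fmt.Printf("input=%s flag=%q -> %s\n", c[0], c[1], resolveOutputFormat(c[0], c[1]))
	}
}
```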

denizsurmeli commented 1 year ago

Applied the suggestions and refactored the code. @igungor