opensearch-project / sql

Query your data using familiar SQL or intuitive Piped Processing Language (PPL)
https://opensearch.org/docs/latest/search-plugins/sql/index/
Apache License 2.0
119 stars 138 forks

[FEATURE] Exclude files in s3 data source #2525

Open kaituo opened 8 months ago

kaituo commented 8 months ago

Is your feature request related to a problem? A user may have a messy S3 file system and would like to exclude certain unstructured log types which are in their S3 bucket. Glue offers a way to exclude, but that is Hive functionality. I suspect we will need to upgrade our SQL grammar to support advanced filtering.

What solution would you like? Possible approaches:

penghuo commented 8 months ago

The file metadata path is an existing Spark SQL feature, so no extra grammar change is required. For example:

SELECT _metadata.file_name, count(*) 
FROM alb_logs 
WHERE _metadata.file_path like '%2023/11/09%' 
GROUP BY _metadata.file_name
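
The query above selects files by path. For the exclusion case raised in the issue, the same hidden `_metadata` column can probably be used with a negated predicate. The following is a minimal sketch, not a confirmed solution from the maintainers: the `alb_logs` table comes from the example above, while the `/unstructured/` prefix and the `.tmp` suffix are hypothetical patterns chosen only for illustration.

```sql
-- Sketch: skip unwanted files instead of selecting wanted ones.
-- Both patterns below are hypothetical; replace them with the
-- prefixes or suffixes that identify the files to exclude.
SELECT _metadata.file_name, count(*)
FROM alb_logs
WHERE _metadata.file_path NOT LIKE '%/unstructured/%'  -- assumed prefix to exclude
  AND _metadata.file_name NOT LIKE '%.tmp'             -- assumed suffix to exclude
GROUP BY _metadata.file_name
```

Whether a filter like this reduces the number of files actually read, rather than only filtering rows after the scan, depends on the Spark version and the data source, so it is worth checking the query plan before relying on it.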