Open LantaoJin opened 2 months ago
+1 on (Preferred) Upload CSV files to external URL. One concern is how to prevent users from accessing any data in the local filesystem, as this poses a security risk.
I agree with @penghuo that this is a possible security concern. I would propose a different approach: use the dashboard to load a CSV file into an index, then use that index for the lookup.
I hate to be that guy, but I know of those in the community who would want to load the CSV into their index as well as those who want to load the CSV into cloud storage. From a priority perspective, index should be the first as it is the easiest (assuming the analyst has write access to the cluster). Dealing with cloud storage introduces permissions friction.
> +1 on (Preferred) Upload CSV files to external URL. One concern is how to prevent users from accessing any data in the local filesystem, as this poses a security risk.

A straightforward solution is to allow only `s3://` scheme URLs in the product.
> From a priority perspective, index should be the first as it is the easiest

Yes, I understand the priorities. We have opened the `lookup` issue https://github.com/opensearch-project/opensearch-spark/issues/620. This issue covers the requirement of loading data from a CSV (similar to the `inputlookup` command in Splunk).
Support the functionality of loading data from a CSV file.
file location
There are two options for where to store the CSV file:
- Upload CSV files to a local directory configured by the `SPARK_LOCAL_DIRS` environment variable or the config `spark.local.dir`, for example `$SPARK_LOCAL_DIRS/<some_identities>/lookups/test.csv`. But uploading to a local dir could introduce potential security issues, especially if the Spark application runs on a cloud service.
- (Preferred) Upload CSV files to an external URL, for example `s3://<bucket>/foo/bar/test.csv` or `file:///foo/bar/test.csv`.
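To illustrate the concern about local filesystem access, here is a minimal sketch of a scheme allowlist check that could run before any CSV location is read. It is not part of the proposal: the function name `is_allowed_csv_location` and the `ALLOWED_SCHEMES` set are hypothetical, and the choice to permit only `s3` follows the suggestion made in the comments rather than any decided behavior.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: the discussion suggests permitting only s3:// URLs.
ALLOWED_SCHEMES = frozenset({"s3"})


def is_allowed_csv_location(url: str, allowed_schemes=ALLOWED_SCHEMES) -> bool:
    """Return True only if `url` has an allowlisted scheme, a bucket/host, and a path."""
    parsed = urlparse(url)
    return (
        parsed.scheme in allowed_schemes   # rejects file:// and bare local paths
        and bool(parsed.netloc)            # requires a bucket (or host) component
        and parsed.path not in ("", "/")   # requires an object key / file path
    )
```

With this check, `s3://my-bucket/foo/bar/test.csv` would be accepted, while `file:///foo/bar/test.csv` and a bare path such as `/etc/passwd` would be rejected. Widening the allowlist (for example to include `file` in a trusted deployment) would be a configuration decision.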
PPL syntax
There are also two options to support this feature:
A. Introduce a new command `inputlookup` or `input`:
Usage:
The `FlightDelay > 500` filter only works when `flights.csv` contains a CSV header.
B. Modify the current `search` command to support a file:
Usage:
PS: the current `search` command syntax is
Both options A and B could be used in a sub-search: