Closed: ajoga closed this issue 1 year ago.
Hey @ajoga, can you please share some examples of `paths` you've tried in your `csv.spc` and what Steampipe returned once you entered `steampipe query`? For troubleshooting around `paths`, we've found that sharing the config files and outcomes is helpful due to the number of possibilities.
For some more background on how we use the URLs passed into `paths`: we pass these URLs into the `GetSourceFiles` function from the Steampipe Plugin SDK, which eventually uses the hashicorp/go-getter library.
From their README, we've followed their examples, e.g., https://github.com/hashicorp/go-getter#s3-bucket-examples. They don't explicitly mention how they resolve regions, but looking at their code, it does look like they try to get the region by parsing the URL, and then use the standard AWS Go SDK to list objects.
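The `s3::` prefix in these examples is go-getter's forced-getter syntax, documented in its README: the part before `::` selects the getter, and the rest is the URL that getter downloads. That is why the plugin log messages in this thread quote the URL without the prefix. A minimal sketch of that split, assuming a simple regexp; `splitForcedGetter` is a hypothetical helper for illustration, not go-getter's API:

```go
package main

import (
	"fmt"
	"regexp"
)

// forcedGetterRe matches the "getter::url" syntax from the go-getter README.
// Assumption: this pattern is written for illustration and is not copied
// from go-getter's source.
var forcedGetterRe = regexp.MustCompile(`^([A-Za-z0-9]+)::(.+)$`)

// splitForcedGetter (hypothetical) returns the forced getter name and the
// URL that follows it. Without a prefix, the getter is "" and src is
// returned unchanged.
func splitForcedGetter(src string) (getter, rest string) {
	if m := forcedGetterRe.FindStringSubmatch(src); m != nil {
		return m[1], m[2]
	}
	return "", src
}

func main() {
	g, u := splitForcedGetter("s3::https://s3-eu-west-1.amazonaws.com/my-bucket/file.csv")
	fmt.Println(g) // s3
	fmt.Println(u) // https://s3-eu-west-1.amazonaws.com/my-bucket/file.csv
}
```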
Hey @cbruno10, sure!
I tried this config line:
`paths = [ "s3::https://XXXXXXX1-XXXXXXX.s3.eu-west-1.amazonaws.com/XXXXXXX.csv?aws_region=eu-west-1&aws_profile=aa" ]`
I'd rather not publish the hostname, but in case it matters, it includes [a-z], a digit and a dash.
Interestingly, starting `steampipe query` no longer outputs any error, for some reason. However, in `/.steampipe/logs/plugin-2023-04-05.log` I see:
2023-04-05 16:35:25.247 UTC [ERROR] steampipe-plugin-csv.plugin: [ERROR] csv.csvList: failed to fetch absolute path="failed to get directory specified by the source s3::https://XXXXXXX1-XXXXXXX.s3.eu-west-1.amazonaws.com/XXXXXXX.csv?aws_profile=aa&aws_region=eu-west-1: error downloading 'https://XXXXXXX1-XXXXXXX.s3.eu-west-1.amazonaws.com/XXXXXXX.csv?aws_profile=aa&aws_region=eu-west-1': MissingRegion: could not find region configuration" path="s3::https://XXXXXXX1-XXXXXXX.s3.eu-west-1.amazonaws.com/XXXXXXX.csv?aws_region=eu-west-1&aws_profile=aa"
2023-04-05 16:35:25.261 UTC [WARN] failed to set connection config: failed to get directory specified by the source s3::https://XXXXXXX1-XXXXXXX.s3.eu-west-1.amazonaws.com/XXXXXXX.csv?aws_profile=aa&aws_region=eu-west-1: error downloading 'https://XXXXXXX1-XXXXXXX.s3.eu-west-1.amazonaws.com/XXXXXXX.csv?aws_profile=aa&aws_region=eu-west-1': MissingRegion: could not find region configuration
With a `csv.spc` containing `paths = [ "s3::https://XXXXXXX1-XXXXXXX.s3.eu-west-1.amazonaws.com/XXXXXXX.csv?region=eu-west-1&aws_profile=aa" ]`, the errors are the same.
Note that the `csv.spc` has no other paths specified, and all config keys are left at their defaults (i.e., commented out).
There are also no other plugins enabled, as I reproduced this issue in a new VM.
> which eventually uses the hashicorp/go-getter library.
@cbruno10, this issue in their project looks related: https://github.com/hashicorp/go-getter/issues/393 ...
I can't read the Go code, but if the URL parsing indeed only goes through the `S3Path` method, that could be the source of my main concern: having to specify a region explicitly because I used a virtual-hosted-style URL.
So I tried the path-style config, with no success.
`paths = [ "s3::https://s3.eu-west-1.amazonaws.com/XXXX1-XXXXX/XXXXX.csv"]` with the region set in the credentials file makes `steampipe query` return: Warning: failed to start plugin 'hub.steampipe.io/plugins/turbot/csv@latest': failed to get directory specified by the source s3::https://s3.eu-west-1.amazonaws.com/XXXXXXXXXXXXXXXX-XXXXXXXXXXXXXXXX/XXXXXXXXXXXXXXXX.csv: error downloading 'https://s3.eu-west-1.amazonaws.com/XXXXXXXXXXXXXXXX-XXXXXXXXXXXXXXXX/XXXXXXXXXXXXXXXX.csv': InvalidBucketName: The specified bucket is not valid. status code: 400, request id: XXXXXXXXXXXXXXXX, host id: XXXXXXXXXXXXXXXX=
Replacing the dot with a dash after `https://s3`, as in `paths = [ "s3::https://s3-eu-west-1.amazonaws.com/XXXXXXXXXXXXXXXX-XXXXXXXXXXXXXXXX/XXXXXXXXXXXXXXXX.csv?aws_profile=aa"]`, with the region set in the credentials file, works!
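The working form above can be derived mechanically from the virtual-hosted-style object URL. A minimal sketch, assuming the input is exactly `https://BUCKET.s3.REGION.amazonaws.com/KEY`; `toDashPathStyle` is a hypothetical helper. Note that AWS only offers the dash (`s3-REGION`) endpoint form for some older regions, so this rewrite will not work for every bucket.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// toDashPathStyle (hypothetical) rewrites https://BUCKET.s3.REGION.amazonaws.com/KEY
// into s3::https://s3-REGION.amazonaws.com/BUCKET/KEY, keeping any query
// string. Any other input shape returns an error.
func toDashPathStyle(objectURL string) (string, error) {
	u, err := url.Parse(objectURL)
	if err != nil {
		return "", err
	}
	// Expected host labels: [bucket..., "s3", region, "amazonaws", "com"].
	labels := strings.Split(u.Host, ".")
	n := len(labels)
	if n < 5 || labels[n-2] != "amazonaws" || labels[n-1] != "com" || labels[n-4] != "s3" {
		return "", fmt.Errorf("not a BUCKET.s3.REGION.amazonaws.com host: %q", u.Host)
	}
	bucket := strings.Join(labels[:n-4], ".") // bucket names may contain dots
	region := labels[n-3]
	out := fmt.Sprintf("s3::https://s3-%s.amazonaws.com/%s%s", region, bucket, u.EscapedPath())
	if u.RawQuery != "" {
		out += "?" + u.RawQuery
	}
	return out, nil
}

func main() {
	out, err := toDashPathStyle("https://my-bucket.s3.eu-west-1.amazonaws.com/data/file.csv?aws_profile=aa")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out) // s3::https://s3-eu-west-1.amazonaws.com/my-bucket/data/file.csv?aws_profile=aa
}
```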
However, `paths = [ "s3::https://s3-eu-west-1.amazonaws.com/XXXXXXXXXXXXXXXX-XXXXXXXXXXXXXXXX/XXXXXXXXXXXXXXXX.csv?aws_profile=aa&region=eu-west-1"]` with no region in the credentials file yields this error on `steampipe query` invocation: Warning: failed to start plugin 'hub.steampipe.io/plugins/turbot/csv@latest': failed to get directory specified by the source s3::https://s3-eu-west-1.amazonaws.com/XXXXXXXXXXXXXXXX-XXXXXXXXXXXXXXXX/XXXXXXXXXXXXXXXX.csv?aws_profile=aa&region=eu-west-1: error downloading 'https://s3-eu-west-1.amazonaws.com/XXXXXXXXXXXXXXXX-XXXXXXXXXXXXXXXX/XXXXXXXXXXXXXXXX.csv?aws_profile=aa&region=eu-west-1': MissingRegion: could not find region configuration
@ajoga I don't think I need the specific hostname, so what you sent over is sufficient.
Looking at the combination of query parameters in https://github.com/hashicorp/go-getter#s3-s3 and the sections below, it seems like `aws_profile` and `region` are not intended to be used together, and `region` should only be used with the `aws_access_key_id` and `aws_access_key_secret` params.
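To make that reading concrete, here is a minimal sketch of a pre-flight check for a `paths` entry. Assumptions: the parameter list is taken from the S3 section of the go-getter README and may differ in the go-getter version the plugin bundles, and `checkS3Query` is a hypothetical helper, not part of go-getter or Steampipe. It flags the undocumented `aws_region` parameter and `region` combined with `aws_profile`.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
	"strings"
)

// knownS3Params is the set of S3 query parameters listed in the go-getter
// README (assumption: check it against the go-getter version in use).
var knownS3Params = map[string]bool{
	"aws_access_key_id":     true,
	"aws_access_key_secret": true,
	"aws_access_token":      true,
	"aws_profile":           true,
	"region":                true,
	"version":               true,
}

// checkS3Query (hypothetical) returns warnings for a paths entry: query
// parameters outside knownS3Params, and "region" combined with
// "aws_profile" when no static access key is given.
func checkS3Query(source string) ([]string, error) {
	// Drop a forced-getter prefix such as "s3::" before parsing.
	if i := strings.Index(source, "::"); i >= 0 {
		source = source[i+2:]
	}
	u, err := url.Parse(source)
	if err != nil {
		return nil, err
	}
	q := u.Query()
	var warnings []string
	for name := range q {
		if !knownS3Params[name] {
			warnings = append(warnings, "unrecognized parameter: "+name)
		}
	}
	sort.Strings(warnings) // map iteration order is random; keep output stable
	if q.Has("region") && q.Has("aws_profile") && !q.Has("aws_access_key_id") {
		warnings = append(warnings, "region used with aws_profile instead of static keys")
	}
	return warnings, nil
}

func main() {
	warnings, err := checkS3Query("s3::https://my-bucket.s3.eu-west-1.amazonaws.com/file.csv?aws_region=eu-west-1&aws_profile=aa")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, w := range warnings {
		fmt.Println(w) // unrecognized parameter: aws_region
	}
}
```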
I also found this issue which has some working and non-working examples according to the issue author, https://github.com/hashicorp/go-getter/issues/387.
Do any of the examples in the issue above work for your use case?
Hi @cbruno10
| `paths` value | `plugin-*.log` content upon `steampipe query` call | Note |
|---|---|---|
| `paths = [ "xxx123-xxxxx.s3.amazonaws.com/AAAaaaa.csv?aws_profile=aa"]` | `2023-04-06 07:22:41.292 UTC [ERROR] steampipe-plugin-csv.plugin: [ERROR] csv.csvList: failed to fetch absolute path="failed to get directory specified by the source xxx123-xxxxx.s3.amazonaws.com/AAAaaaa.csv?aws_profile=aa: error downloading 'https://s3.amazonaws.com/xxx123-xxxxx/AAAaaaa.csv?aws_profile=aa': MissingRegion: could not find region configuration" path=xxx123-xxxxx.s3.amazonaws.com/AAAaaaa.csv?aws_profile=aa \n 2023-04-06 07:22:41.302 UTC [WARN] failed to set connection config: failed to get directory specified by the source xxx123-xxxxx.s3.amazonaws.com/AAAaaaa.csv?aws_profile=aa: error downloading 'https://s3.amazonaws.com/xxx123-xxxxx/AAAaaaa.csv?aws_profile=aa': MissingRegion: could not find region configuration` | Makes some sense, but this path style should resolve to us-east-1 |
| `paths = [ "xxx123-xxxxx.eu-west-1.s3.amazonaws.com/AAAaaaa.csv?aws_profile=aa"]` | `2023-04-06 07:30:18.327 UTC [ERROR] steampipe-plugin-csv.plugin: [ERROR] csv.csvList: failed to fetch absolute path="failed to get directory specified by the source xxx123-xxxxx.eu-west-1.s3.amazonaws.com/AAAaaaa.csv?aws_profile=aa: URL is not a valid S3 URL" path=xxx123-xxxxx.eu-west-1.s3.amazonaws.com/AAAaaaa.csv?aws_profile=aa \n 2023-04-06 07:30:18.340 UTC [WARN] failed to set connection config: failed to get directory specified by the source xxx123-xxxxx.eu-west-1.s3.amazonaws.com/AAAaaaa.csv?aws_profile=aa: URL is not a valid S3 URL` | ??? |
| `paths = [ "s3.eu-west-1.amazonaws.com/xxx123-xxxxx/AAAaaaa.csv?aws_profile=aa"]` | `2023-04-06 07:48:52.432 UTC [ERROR] steampipe-plugin-csv.plugin: [ERROR] csv.csvList: failed to fetch absolute path="failed to get directory specified by the source s3.eu-west-1.amazonaws.com/xxx123-xxxxx/AAAaaaa.csv?aws_profile=aa: error downloading 'https://eu-west-1.amazonaws.com/s3/xxx123-xxxxx/AAAaaaa.csv?aws_profile=aa': MissingRegion: could not find region configuration" path=s3.eu-west-1.amazonaws.com/xxx123-xxxxx/AAAaaaa.csv?aws_profile=aa \n 2023-04-06 07:48:52.445 UTC [WARN] failed to set connection config: failed to get directory specified by the source s3.eu-west-1.amazonaws.com/xxx123-xxxxx/AAAaaaa.csv?aws_profile=aa: error downloading 'https://eu-west-1.amazonaws.com/s3/xxx123-xxxxx/AAAaaaa.csv?aws_profile=aa': MissingRegion: could not find region configuration` | ??? |
| `paths = [ "s3://s3.amazonaws.com/xxx123-xxxxx/AAAaaaa.csv?aws_profile=aa"]` | `2023-04-06 07:41:43.493 UTC [ERROR] steampipe-plugin-csv.plugin: [ERROR] csv.csvList: failed to fetch absolute path="failed to get directory specified by the source s3://s3.amazonaws.com/xxx123-xxxxx/AAAaaaa.csv?aws_profile=aa: error downloading 's3://s3.amazonaws.com/xxx123-xxxxx/AAAaaaa.csv?aws_profile=aa': MissingRegion: could not find region configuration" path=s3://s3.amazonaws.com/xxx123-xxxxx/AAAaaaa.csv?aws_profile=aa \n 2023-04-06 07:41:43.523 UTC [WARN] failed to set connection config: failed to get directory specified by the source s3://s3.amazonaws.com/xxx123-xxxxx/AAAaaaa.csv?aws_profile=aa: error downloading 's3://s3.amazonaws.com/xxx123-xxxxx/AAAaaaa.csv?aws_profile=aa': MissingRegion: could not find region configuration` | Makes some sense, but this path style should resolve to us-east-1 |
| `paths = [ "s3://s3-eu-west-1.amazonaws.com/xxx123-xxxxx/AAAaaaa.csv?aws_profile=aa"]` | `2023-04-06 07:38:50.159 UTC [ERROR] steampipe-plugin-csv.plugin: [ERROR] csv.csvList: failed to fetch absolute path="failed to get directory specified by the source s3://s3-eu-west-1.amazonaws.com/xxx123-xxxxx/AAAaaaa.csv?aws_profile=aa: error downloading 's3://s3-eu-west-1.amazonaws.com/xxx123-xxxxx/AAAaaaa.csv?aws_profile=aa': MissingRegion: could not find region configuration" path=s3://s3-eu-west-1.amazonaws.com/xxx123-xxxxx/AAAaaaa.csv?aws_profile=aa \n 2023-04-06 07:38:50.196 UTC [WARN] failed to set connection config: failed to get directory specified by the source s3://s3-eu-west-1.amazonaws.com/xxx123-xxxxx/AAAaaaa.csv?aws_profile=aa: error downloading 's3://s3-eu-west-1.amazonaws.com/xxx123-xxxxx/AAAaaaa.csv?aws_profile=aa': MissingRegion: could not find region configuration` | ??? |
| `paths = [ "s3::https://s3.amazonaws.com/xxx123-xxxxx/AAAaaaa.csv?aws_profile=aa"]` | `2023-04-06 07:43:58.893 UTC [ERROR] steampipe-plugin-csv.plugin: [ERROR] csv.csvList: failed to fetch absolute path="failed to get directory specified by the source s3::https://s3.amazonaws.com/xxx123-xxxxx/AAAaaaa.csv?aws_profile=aa: error downloading 'https://s3.amazonaws.com/xxx123-xxxxx/AAAaaaa.csv?aws_profile=aa': MissingRegion: could not find region configuration" path=s3::https://s3.amazonaws.com/xxx123-xxxxx/AAAaaaa.csv?aws_profile=aa \n 2023-04-06 07:43:58.920 UTC [WARN] failed to set connection config: failed to get directory specified by the source s3::https://s3.amazonaws.com/xxx123-xxxxx/AAAaaaa.csv?aws_profile=aa: error downloading 'https://s3.amazonaws.com/xxx123-xxxxx/AAAaaaa.csv?aws_profile=aa': MissingRegion: could not find region configuration` | Makes some sense, but this path style should resolve to us-east-1 |
| `paths = [ "s3::https://s3-eu-west-1.amazonaws.com/xxx123-xxxxx/AAAaaaa.csv?aws_profile=aa"]` | `2023-04-06 07:47:21.064 UTC [ERROR] steampipe-plugin-csv.plugin: [ERROR] csv.csvList: failed to fetch absolute path="failed to get directory specified by the source s3::https://s3-eu-west-1.amazonaws.com/xxx123-xxxxx/AAAaaaa.csv?aws_profile=aa: error downloading 'https://s3-eu-west-1.amazonaws.com/xxx123-xxxxx/AAAaaaa.csv?aws_profile=aa': MissingRegion: could not find region configuration" path=s3::https://s3-eu-west-1.amazonaws.com/xxx123-xxxxx/AAAaaaa.csv?aws_profile=aa \n 2023-04-06 07:47:21.081 UTC [WARN] failed to set connection config: failed to get directory specified by the source s3::https://s3-eu-west-1.amazonaws.com/xxx123-xxxxx/AAAaaaa.csv?aws_profile=aa: error downloading 'https://s3-eu-west-1.amazonaws.com/xxx123-xxxxx/AAAaaaa.csv?aws_profile=aa': MissingRegion: could not find region configuration` | ??? |
The CRLF-to-`\n` replacements are mine, to allow formatting in the table.
From these tests, it feels like if `aws_profile` is specified, the URL-parsing logic that determines the region is not used at all.
But overall, from an end-user perspective, all these URI schemes seem odd: for an S3 object, the AWS console gives two URI schemes to access it, and neither is among the go-getter library's examples:
[Screenshot: the two URIs the AWS console shows for an S3 object, with the bucket name highlighted in green and the object name in red]
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.
Hey @ajoga , sorry for the long response time!
I took another dive into this issue and into which formats go-getter supports, but I walked away even less confident than when I started.
Most of our examples and testing in https://hub.steampipe.io/plugins/turbot/csv#accessing-a-private-bucket were based on https://github.com/hashicorp/go-getter#s3-bucket-examples, but we admittedly ran into some questions/issues along the way:
We didn't find their documentation very helpful, and it seems like other users of that package also had questions around what format to use, e.g., https://github.com/hashicorp/go-getter/issues/387.
In our SDK, we do some S3 path handling, but I'm not sure it causes these errors: in the plugin log error messages you've included above, the URLs look like they match what you have in your `paths` config argument.
If our doc is missing any examples or key information based on your tests, can you please raise a PR adding it (which we could then push to other plugin docs that support retrieving files with go-getter)?
I think performing exhaustive tests is a bit of a black hole, given the lack of guidance from the go-getter package and the large number of possible URL and query param combinations, so it may be better to provide examples in our docs that work consistently.
If you have any other questions or thoughts, please let us know!
@ajoga, as a follow-up to @cbruno10's message above, you can also refer to our unit tests, which define a few different path formats you could try. Thanks!
@Subhajit97 From the unit tests you linked, are any of those worth adding into our docs?
@cbruno10 IIRC, in our docs we prefer a consistent path format that works with both private and public buckets. The unit tests above mostly target a public S3 bucket so that we can test them, and go-getter fails for some of those formats if the bucket is private.
We could add a few of them to show formats that go-getter supports, for example:
"s3::https://demo-integrated-2022.s3.ap-southeast-1.amazonaws.com/ghost/Dockerfile"
"demo-integrated-2022.s3-ap-southeast-1.amazonaws.com/ghost//Dockerfile"
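The second format appears to use go-getter's subdirectory convention, described in its README: everything before the double slash `//` is the source to fetch, and everything after it is a path within that source. A minimal sketch of that split, assuming the first `//` after any `scheme://` is the separator and keeping the query string on the source side; `splitSubdir` is hypothetical and is not the Steampipe SDK's or go-getter's code:

```go
package main

import (
	"fmt"
	"strings"
)

// splitSubdir (hypothetical) splits src on the first "//" that is not part
// of a "scheme://" separator. Any query string stays with the source part.
func splitSubdir(src string) (source, subdir string) {
	query := ""
	if q := strings.Index(src, "?"); q >= 0 {
		src, query = src[:q], src[q:]
	}
	// Skip past "scheme://" (including "s3::https://") so its double slash
	// is not mistaken for the subdirectory separator.
	start := 0
	if i := strings.Index(src, "://"); i >= 0 {
		start = i + 3
	}
	if j := strings.Index(src[start:], "//"); j >= 0 {
		k := start + j
		return src[:k] + query, src[k+2:]
	}
	return src + query, ""
}

func main() {
	s, d := splitSubdir("demo-integrated-2022.s3-ap-southeast-1.amazonaws.com/ghost//Dockerfile")
	fmt.Println(s) // demo-integrated-2022.s3-ap-southeast-1.amazonaws.com/ghost
	fmt.Println(d) // Dockerfile
}
```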
@Subhajit97 Do those 2 formats work with private S3 buckets? If so, how would I pass in authentication information, e.g., the profile name?
@cbruno10, both formats mentioned above work with private S3 buckets. For authentication, the profile name can be specified in the paths, as described in the plugin docs.
For example:
```hcl
connection "csv" {
  plugin = "csv"
  # aws_profile selects a named profile from the local AWS config/credentials files
  paths = [
    "deletebucket12092023.s3-us-east-1.amazonaws.com/CSVs//*.csv?aws_profile=default"
  ]
}
```
Hey @ajoga , thanks again for doing some extensive testing earlier.
For now, we'd recommend using one of the formats that @Subhajit97 had mentioned in https://github.com/turbot/steampipe-plugin-csv/issues/57#issuecomment-1705249483 along with AWS profile credentials.
In terms of figuring out which S3 URL formats go-getter accepts, we found it difficult due to the lack of documentation and examples in the go-getter repository and related docs. Exhaustively testing them all (as you've done) is time-consuming and produces some unexpected results and errors, so in general we don't rely on exhaustive testing, but instead use the known working formats (which usually include the region in the URL).
If these formats do not work for you though, please let us know and we can dig into these specifically.
Thanks!
Is your feature request related to a problem? Please describe.
The documentation states:
I'm confused as to why this is needed, since the region is already in the hostname of the S3 path to the file or folder.
The documentation suggests using AWS profiles, so if one had CSV files in two regions, they'd have to configure two different AWS profiles for Steampipe, which is at odds with most other tooling that uses AWS credentials.
The documentation also suggests passing the `region` parameter, but it's not in the example and I can't get it to work: passing `region` or `aws_region` as a parameter is not recognized.
Setting the region corresponding to the bucket location in `~/.aws/credentials` (`region=eu-west-1`) works.
Setting an incorrect region in `~/.aws/credentials` yields this error upon Steampipe invocation: `BucketRegionError: incorrect region, the bucket is not in 'eu-central-1' region`
Describe the solution you'd like
I think this feature should not require a region at all; in the worst case, the region can be parsed from the hostname: https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-bucket-intro.html
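Parsing the region from the hostname, as proposed above, could look roughly like the sketch below. It covers only the S3 hostname styles AWS documents for bucket access (dot and legacy dash region forms, virtual-hosted or path-style). `regionFromS3Host` is hypothetical and is not how go-getter or the Steampipe SDK resolves regions. It reports `ok=false` for the legacy global endpoint (`s3.amazonaws.com`), leaving the caller to pick a default such as us-east-1, and it does not handle dual-stack, access point, or website endpoints.

```go
package main

import (
	"fmt"
	"strings"
)

// regionFromS3Host (hypothetical) extracts the region from an S3 endpoint
// host name, reading labels from the right:
//   ...s3.REGION.amazonaws.com  (dot style, virtual-hosted or path-style)
//   ...s3-REGION.amazonaws.com  (legacy dash style)
//   ...s3.amazonaws.com         (legacy global endpoint: no region, ok=false)
func regionFromS3Host(host string) (region string, ok bool) {
	const suffix = ".amazonaws.com"
	if !strings.HasSuffix(host, suffix) {
		return "", false
	}
	labels := strings.Split(strings.TrimSuffix(host, suffix), ".")
	n := len(labels)
	last := labels[n-1]
	switch {
	case last == "s3":
		return "", false // global endpoint; the caller must choose a default
	case strings.HasPrefix(last, "s3-"):
		return strings.TrimPrefix(last, "s3-"), true
	case n >= 2 && labels[n-2] == "s3":
		return last, true
	}
	return "", false
}

func main() {
	for _, h := range []string{
		"my-bucket.s3.eu-west-1.amazonaws.com",
		"my-bucket.s3-eu-west-1.amazonaws.com",
		"s3.eu-west-1.amazonaws.com",
		"s3.amazonaws.com",
	} {
		region, ok := regionFromS3Host(h)
		fmt.Printf("%-40s region=%q ok=%v\n", h, region, ok)
	}
}
```

Returning `ok=false` instead of silently defaulting keeps a bucket in another region, reached through the global endpoint, from being queried with the wrong region.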
Versions: plugin csv 0.7.0, Steampipe v0.19.3