Open bhtucker opened 4 years ago
Just for testing, I created a config directory inside the arthur-redshift-etl directory. Then I built a minimal set of config files.
Here's a PR that makes this easier: #241
mkdir config
export DATA_WAREHOUSE_CONFIG=`pwd`/config
cp etc/aws_template.yaml config/aws.yaml
cp etc/warehouse_template.yaml config/warehouse.yaml
cp etc/credentials.sh.template config/credentials.sh
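If you expect to rerun the setup, the copy steps above can be wrapped so they never overwrite a file you have already edited. This is only a sketch: `init_config` is a hypothetical helper, not part of Arthur, and it assumes the three template names shown above live under `etc/` in the repo.

```shell
#!/usr/bin/env bash
# Sketch: copy the three templates into a config directory without
# overwriting files that already exist there.
# init_config is a hypothetical helper name, not an Arthur command.
init_config() {
    local repo_dir="${1:-.}"
    local config_dir="${2:-$repo_dir/config}"
    local pair src dst

    mkdir -p "$config_dir" || return 1
    for pair in \
        "aws_template.yaml:aws.yaml" \
        "warehouse_template.yaml:warehouse.yaml" \
        "credentials.sh.template:credentials.sh"
    do
        src="$repo_dir/etc/${pair%%:*}"
        dst="$config_dir/${pair##*:}"
        if [ ! -f "$src" ]; then
            echo "Missing template: $src" >&2
            return 1
        fi
        # Keep any file that was already copied (and possibly edited).
        [ -e "$dst" ] || cp "$src" "$dst" || return 1
    done
    # Print the absolute path so the caller can export it.
    (cd "$config_dir" && pwd)
}
```

From the repo root this could be used as `export DATA_WAREHOUSE_CONFIG="$(init_config .)"`.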
Take a look at the templates. The aws.yaml config file needs to be updated based on outputs from the CloudFormation stack. Looking at the config file posted, it just might be easier than you remember. We've made some improvements.
Starting Arthur now:
bin/run_arthur.sh
This will show some settings. Take a look at the rest:
arthur.py settings
And now make sure that S3 has the ETL code:
upload_env.sh
Without changes to the template this fails, of course, but you'll update aws.yaml.
Finding bucket name and prefix in configuration...
An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied
Check whether the bucket "object-store" exists and you have access to it!
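To tell a wrong bucket name apart from a permissions problem before rerunning the upload, a quick preflight check with the AWS CLI can help. This is a sketch, assuming the AWS CLI is installed and configured. `check_bucket` is a hypothetical helper, and the bucket name is passed in by hand rather than read from aws.yaml.

```shell
#!/usr/bin/env bash
# Sketch: verify that a bucket exists and is accessible with the
# current AWS credentials. check_bucket is a hypothetical helper.
check_bucket() {
    local bucket="$1"
    if [ -z "$bucket" ]; then
        echo "usage: check_bucket BUCKET_NAME" >&2
        return 2
    fi
    # head-bucket exits non-zero if the bucket is missing or access is denied.
    if aws s3api head-bucket --bucket "$bucket" >/dev/null 2>&1; then
        echo "Bucket \"$bucket\" is reachable."
    else
        echo "Cannot access bucket \"$bucket\"; check aws.yaml and your AWS credentials." >&2
        return 1
    fi
}
```

For example, `check_bucket object-store` should fail for the placeholder name from the template, while the bucket created by your CloudFormation stack should pass.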
Then create a table design file:
arthur.py bootstrap_sources webapp
This also fails because you need a credentials file with the connections, see prompts in config/credentials.sh.
After you've set up the credentials (with connection strings), don't forget to copy the file to S3.
Once you have a design file, upload the local schemas to S3:
arthur.py sync --deploy-config
And now run one of:
arthur.py extract
install_extraction_pipeline.sh
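Since each step above depends on the previous one succeeding, it can be handy to chain them so the run stops at the first failure. A minimal sketch, where `run_steps` is a hypothetical helper and each step is assumed to be a simple command line without quoting or globbing:

```shell
#!/usr/bin/env bash
# Sketch: run command lines in order and stop at the first failure.
# run_steps is a hypothetical helper, not part of Arthur.
run_steps() {
    local step
    for step in "$@"; do
        echo "==> $step"
        # Unquoted on purpose: split the step into command and arguments.
        $step || { echo "Step failed: $step" >&2; return 1; }
    done
}
```

Inside the Arthur environment this might be called as `run_steps "arthur.py bootstrap_sources webapp" "arthur.py sync --deploy-config" "arthur.py extract"`.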
Let me know which hurdles you encounter and I'll try to get them resolved.
Thank you for the guidance!
This worked great. The one hiccup was: credentials.sh seems to be uploaded out-of-band, right? Neither upload_env nor arthur.py sync ends up copying it?
Yes, unfortunately. You'll have to create and upload the credentials.sh file manually.
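The manual upload could look roughly like the sketch below. It assumes the AWS CLI is installed. `upload_credentials` is a hypothetical helper, and the destination is left entirely to the caller because the exact key under your prefix where Arthur expects the file is not spelled out in this thread.

```shell
#!/usr/bin/env bash
# Sketch: copy a local credentials file to a caller-supplied S3 URI.
# upload_credentials is a hypothetical helper; the destination layout
# is an assumption you need to confirm for your own prefix.
upload_credentials() {
    local src="$1"
    local dest="$2"
    if [ ! -f "$src" ]; then
        echo "No such file: $src" >&2
        return 1
    fi
    case "$dest" in
        s3://?*) ;;
        *) echo "Destination must be an s3:// URI: $dest" >&2; return 1 ;;
    esac
    aws s3 cp "$src" "$dest"
}
```

Usage would be along the lines of `upload_credentials config/credentials.sh s3://<your-bucket>/<your-prefix>/...`, with the remainder of the key filled in for your setup.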
Summary
I'm trying to set up a fresh project and wonder if there are any templates for the 'sibling' repo. (I have the fortunate position of vaguely remembering how this should work, and still I'm stuck!)
By banging my head against the validator, I eventually came up with a dummy warehouse config (uselessly passes the validator):
Now I need to set up my prefix, with e.g. bootstrapping scripts as well as sync output. I guess this is upload_env.sh?
Anyway, if I'm missing existing assets, I'd love to use them -- and if not, it would be good to know, so I can write down what I do!
Details
At the moment I'm just trying to use extract.
Labels
Please set the label on the issue so that
I don't think I have 'edit' rights on the labels