jblakley closed this issue 6 years ago.
Yes, this is already possible!
Check out this old PR which describes how to use it: https://github.com/scanner-research/scanner/pull/93
Hi @jblakley, was this issue resolved for you?
Alex, no, not yet. I'm still working on it. I'm getting this error:
Connecting to Scanner database...
Running Scanner job...
Traceback (most recent call last):
File "myexample.py", line 70, in
It does seem to be accessing AWS: it is creating db_metadata.bin in bucket/scanner_db, but the file is empty. I suspect a permissions error.
Here's my config.toml:
[storage]
type = "s3"
bucket = "s3-scanner-utilities-1"
db_path = "scanner_db"
region = "us-east-1"
endpoint = "s3.us-east-1.amazonaws.com"
Any suggestions appreciated.
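One way to test the permissions theory, and the empty db_metadata.bin, outside Scanner is to write a small object under the same prefix, read it back, and delete it. This is a sketch, not part of Scanner: the probe key name is hypothetical, and `s3` is expected to be a boto3 S3 client (any object with the same three methods works, which keeps it testable). Comparing the bytes read back catches a write that "succeeds" but stores nothing.

```python
def probe_s3_prefix(s3, bucket: str, prefix: str) -> str:
    """Write, read back and delete a small test object under `prefix`.

    Returns "ok", or a short description of the step that failed.
    """
    key = f"{prefix.rstrip('/')}/_permission_probe.txt"  # hypothetical key name
    payload = b"scanner permission probe"
    try:
        s3.put_object(Bucket=bucket, Key=key, Body=payload)
    except Exception as exc:  # boto3 raises botocore ClientError on AccessDenied
        return f"put_object failed: {exc}"
    try:
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    except Exception as exc:
        return f"get_object failed: {exc}"
    if body != payload:
        return f"read back {len(body)} bytes, expected {len(payload)}"
    try:
        s3.delete_object(Bucket=bucket, Key=key)
    except Exception as exc:
        return f"delete_object failed: {exc}"
    return "ok"


if __name__ == "__main__":
    import boto3  # needs AWS credentials configured in the usual places

    client = boto3.client("s3", region_name="us-east-1")
    print(probe_s3_prefix(client, "s3-scanner-utilities-1", "scanner_db"))
```

Running it with the same credentials the Scanner client uses is the point; a probe that passes from a different machine or IAM role says little.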
BTW, the other folders and files that are present on GCP (jobs/, table_megafile.bin, tables/) do not exist in the S3 version.
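To compare the two buckets quickly, a small helper can report which of those top-level entries are missing from a listing of object keys. The expected layout below is simply what the comment above saw in the working GCP bucket; treating it as Scanner's complete database layout is an assumption, and the helper itself is hypothetical.

```python
# Entries seen under scanner_db/ in the working GCP bucket (per this thread).
# Assumption: this is the full set a healthy database should contain.
EXPECTED_ENTRIES = ("db_metadata.bin", "jobs/", "table_megafile.bin", "tables/")


def missing_db_entries(keys: list[str], db_path: str = "scanner_db") -> list[str]:
    """Return expected top-level entries with no object under `db_path`.

    `keys` are full object keys, e.g. the "Key" fields of an S3 listing.
    """
    prefix = db_path.rstrip("/") + "/"
    present = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        # "jobs/0/..." counts as the directory "jobs/"; plain files count as-is.
        head, sep, _ = key[len(prefix):].partition("/")
        present.add(head + sep)
    return [entry for entry in EXPECTED_ENTRIES if entry not in present]
```

With boto3, the keys could come from `list_objects_v2(Bucket=..., Prefix="scanner_db/")`, bearing in mind that each call returns at most 1000 keys, so large databases need the paginator.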
Alex, I finally have this working. After much mucking around with .toml and .yaml files, keys, scanner_db and buckets, I finally have everything moved over to AWS. For reference, there were 4 things that had to change:
Once I'd done that, most of the remaining trouble was a database conflict between what was in my client's scanner_db, the S3 bucket, the GCP bucket and the k8s nodes. I eventually solved it by copying the GCP scanner_db to S3, but I think a simpler way would be to start with a new client container, a clean k8s Scanner service and an empty S3 scanner_db.
BTW, another hint for others trying to get the k8s example going in AWS. This may be obvious to k8s and scanner experts but for a newbie like me:
You have to run the k8s application from a Scanner client, and that client has to have access to the VPC the k8s cluster is in. In my case, I created an Ubuntu client instance in the VPC on AWS, brought up a Scanner container there and ran all the apps from there. It's probably easiest to do this from a local Ubuntu client, but I didn't have one handy.
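If, as here, the client instance sits inside the cluster's VPC, a quick sanity check is whether the client's address falls within the VPC's CIDR block. This is only a sketch: the CIDR value is a hypothetical placeholder, and it does not cover clients that reach the VPC through peering or a VPN, where the address will legitimately be outside the block.

```python
import ipaddress
import socket


def local_ip() -> str:
    """Best-effort guess at this machine's primary IPv4 address.

    Connecting a UDP socket sends no packets; it only makes the OS choose
    the outgoing interface, whose address is then read back.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect(("10.255.255.255", 1))
        return sock.getsockname()[0]


def in_vpc(ip: str, vpc_cidr: str) -> bool:
    """True if `ip` falls inside the VPC's CIDR block."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(vpc_cidr)


if __name__ == "__main__":
    vpc_cidr = "172.20.0.0/16"  # hypothetical: use your cluster VPC's CIDR
    ip = local_ip()
    print(ip, "is inside" if in_vpc(ip, vpc_cidr) else "is outside", vpc_cidr)
```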
Jim, that's great! Apologies that I couldn't be more help; I hadn't run Scanner on AWS Kubernetes until a few days ago (I just started helping some folks internally at FB set up a Scanner deployment on AWS using Kubernetes). Thanks for doing some pathfinding.
You're right that our Kubernetes support is a little ad hoc right now, and so:
I'm writing a script now for deploying on AWS that will hopefully deal with all the issues you experienced in getting setup. It would be great if you'd take a look once I've pushed it!
One question: I have been using the cloud-specific tools for setting up Kubernetes clusters (a collection of CloudFormation scripts for AWS, GKE for GCP), but it seems like kops is an alternative to those. What's your experience been with kops? Would you recommend it?
Alex, I’m happy to share my messy, custom, hardcoded bash scripts for starting k8s using kops and for building and starting the Scanner service in that cluster.
WRT kops: as a k8s newbie, I tried several things to get a cluster up with Scanner:
In the end, kops was the best for what I was trying to do. It gave better transparency and control in starting up clusters and was simpler to use than the AWS alternatives. I could start up and delete clusters pretty easily with an understandable bash script. The main limitation was that it was mostly command-line driven rather than config-file driven. There has been some work on a config-file approach for kops, but I couldn’t get it to work. It also has an option to specify the VPC on startup that didn’t seem to work. I’m also not sure how well it would scale to more complex clusters.
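The create/update/delete cycle described above can be scripted in a few lines. This is a sketch in Python rather than the bash scripts mentioned in the thread; the cluster name, zone, instance sizes and state-store bucket are hypothetical placeholders. The flags used are standard `kops create cluster` / `update` / `delete` options, but newer kops releases have renamed some of them (check `kops create cluster --help` for your version).

```python
import shlex


def kops_create(name: str, state: str, zones: str, node_count: int,
                node_size: str, master_size: str) -> list[str]:
    # Writes the cluster spec to the state store; nothing is built yet.
    return ["kops", "create", "cluster",
            f"--name={name}", f"--state={state}", f"--zones={zones}",
            f"--node-count={node_count}", f"--node-size={node_size}",
            f"--master-size={master_size}"]


def kops_update(name: str, state: str) -> list[str]:
    # --yes makes kops actually create the cloud resources.
    return ["kops", "update", "cluster", f"--name={name}", f"--state={state}", "--yes"]


def kops_delete(name: str, state: str) -> list[str]:
    return ["kops", "delete", "cluster", f"--name={name}", f"--state={state}", "--yes"]


if __name__ == "__main__":
    name, state = "scanner.k8s.local", "s3://my-kops-state-store"  # hypothetical
    for cmd in (kops_create(name, state, "us-east-1a", 2, "t2.medium", "t2.medium"),
                kops_update(name, state)):
        # Prints the commands; pass cmd to subprocess.run(cmd, check=True) to run.
        print(shlex.join(cmd))
```

Keeping the commands as argument lists avoids shell-quoting issues and makes a teardown script just `kops_delete` with the same name and state store.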
For the others, the Heptio cluster didn’t give a lot of transparency: you can get in and tune the CloudFormation files, but that was more complicated than I wanted for simple clusters. I got a cluster running with a simple service but never got Scanner running before I abandoned it for kops.
When EKS launched, I expected to shift to it, but it took me down a rathole of IAM permissions, so I gave up and went back to kops.
Minikube was fine, but it runs locally and I really didn’t need a local cluster.
BTW, I also needed to make a change in the k8s example.py in Scanner. That example uses the node's ExternalIP; on AWS, I needed to use the InternalIP instead when finding the master IP.
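The thread doesn't show the relevant lines of example.py, so the following is a hypothetical helper, not the example's actual code. It illustrates the change: each Kubernetes node object (as returned by `kubectl get nodes -o json`) lists its addresses under `status.addresses` with a `type` such as `InternalIP` or `ExternalIP`, and the lookup can prefer one type and fall back to the other.

```python
import json
import subprocess


def node_address(node: dict, preferred: str = "InternalIP",
                 fallback: str = "ExternalIP") -> str:
    """Return the node's first address of type `preferred`, else `fallback`."""
    for kind in (preferred, fallback):
        for entry in node["status"]["addresses"]:
            if entry["type"] == kind:
                return entry["address"]
    raise LookupError(f"node has neither {preferred} nor {fallback} address")


def get_nodes() -> list[dict]:
    """Fetch all node objects via kubectl (must be configured for the cluster)."""
    out = subprocess.run(["kubectl", "get", "nodes", "-o", "json"],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)["items"]


if __name__ == "__main__":
    for node in get_nodes():
        print(node["metadata"]["name"], node_address(node))
```

Preferring InternalIP only works when the client can reach the cluster's private network, which matches the advice above about running the client inside the VPC.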
Is it possible to store scanner_db in AWS rather than in GCP, i.e., via the config.toml file in the Kubernetes example? If not, it would be nice if it were.