googleapis / google-cloud-python

Google Cloud Client Library for Python
https://googleapis.github.io/google-cloud-python/
Apache License 2.0

Build out some integration with Cloud Bigtable #872

Closed: jgeewax closed this issue 9 years ago

jgeewax commented 9 years ago

We want code written for HappyBase to "just work" with Cloud Bigtable, so... some ideas:

Option 1: A wrapped import module

from gcloud.bigtable import happybase

Option 2: Some sort of monkey patching?

from gcloud import bigtable
import happybase
happybase = bigtable.monkey_patch(happybase)
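Option 2 could be sketched roughly as follows, assuming the patch only needs to replace HappyBase's Connection class. monkey_patch and BigtableConnection are hypothetical names (not an existing gcloud API), and a stand-in module object replaces the real happybase package so the sketch runs on its own.

```python
import types


class BigtableConnection:
    """Hypothetical Bigtable-backed stand-in for happybase.Connection."""

    def __init__(self, host=None, **kwargs):
        # A real implementation would create a Bigtable client here;
        # this sketch only records the arguments it was given.
        self.host = host
        self.kwargs = kwargs

    def tables(self):
        # Placeholder: a real version would call the Table Admin API.
        return []


def monkey_patch(module):
    """Hypothetical helper: point module.Connection at BigtableConnection.

    Returns the same module object so callers can write
    happybase = bigtable.monkey_patch(happybase).
    """
    # Keep the original class around in case someone wants to undo the patch.
    module._original_Connection = getattr(module, "Connection", None)
    module.Connection = BigtableConnection
    return module


# Stand-in for the real happybase package so the sketch is self-contained.
happybase = types.ModuleType("happybase")
happybase.Connection = object

happybase = monkey_patch(happybase)
conn = happybase.Connection(host="localhost")
print(type(conn).__name__)  # BigtableConnection
```

Returning the module keeps the call shape proposed above. Option 1 would instead ship a gcloud.bigtable.happybase module exposing the same names, which avoids mutating a third-party package at runtime.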

tbetbetbe commented 9 years ago

@nathanielmanistaatgoogle will be back from holiday next week and will follow up on my comments then.

At the moment, the Python auth example is missing; there's a PR to get that fixed. In the meantime, please see how auth and secure access are done in the Python interop test client.

In particular, see

Notes

dhermes commented 9 years ago

Thanks @tbetbetbe

dhermes commented 9 years ago

FYI, prod certificates require the SSL_CERT_FILE environment variable to be set (which isn't done by default on Linux AFAICT), so I'm not sure how that works for the typical user. I adapted a Ruby checker for SSL_CERT_FILE and ran it on my Ubuntu 14.04 install and on a default GCE instance. Both found the certificate file at /etc/ssl/certs/ca-certificates.crt.
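A rough Python counterpart to that Ruby check, using only the standard library. find_ca_bundle is a hypothetical helper: it prefers SSL_CERT_FILE, then the CA file OpenSSL reports through ssl.get_default_verify_paths(), and finally the Debian/Ubuntu bundle path found above. Treating that last path as a universal fallback is an assumption; other distributions keep their bundle elsewhere.

```python
import os
import ssl

# Bundle location reported in this thread for Ubuntu 14.04 and a default GCE image.
DEBIAN_CA_BUNDLE = "/etc/ssl/certs/ca-certificates.crt"


def find_ca_bundle():
    """Return (source, path) for the first CA bundle found, or (None, None)."""
    env_path = os.environ.get("SSL_CERT_FILE")
    if env_path and os.path.isfile(env_path):
        return "SSL_CERT_FILE", env_path
    # cafile is None when OpenSSL's default CA file does not exist.
    default_paths = ssl.get_default_verify_paths()
    if default_paths.cafile and os.path.isfile(default_paths.cafile):
        return "openssl default", default_paths.cafile
    if os.path.isfile(DEBIAN_CA_BUNDLE):
        return "debian default", DEBIAN_CA_BUNDLE
    return None, None


print(find_ca_bundle())
```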

dhermes commented 9 years ago

@tbetbetbe I just got a chance to implement the auth stuff (in my gist) and am still getting a request timeout.

@nathanielmanistaatgoogle Can we get together this week?

dhermes commented 9 years ago

Status update: Meeting with Nathaniel today. Hopefully we can iron out issues with request failures.

dhermes commented 9 years ago

I declare victory! Just leaving GOOG SFO after spending 3.5 hours with @nathanielmanistaatgoogle

We were banging our heads against the wall for a while and, before giving up, fired up the Go client (docs there could be better).

Eventually, we realized two things:

1. I was trying to do the simplest thing, ListZones with the cluster API, but it was not possible since the API is private (though the protos are released publicly).

2. The Go error messages (from the server) were much more useful than those from gRPC in Python (not sure why).

I am headed to the gym right now but look forward to getting this working soon.

lesv commented 9 years ago

The cluster admin API is public -- but it needs to be enabled separately.

dhermes commented 9 years ago

Sorry, I should have said that I couldn't enable it from the APIs console from a public account. Beyond that I don't know.

@lesv how does one enable the API for a project?

lesv commented 9 years ago

Go to cloud.google.com/console, select your project, then select APIs & Auth > APIs and type in 'bigtable'. You should see two APIs listed; you need to enable both of them.

dhermes commented 9 years ago

Yes, I know; they are the Bigtable API (just for data within an existing table or tables) and the Table Admin API (for creating and deleting tables).

There is a 3rd API, the Cluster Admin API (I said "ListZones with the cluster API" above). This is the one I said isn't public.

UPDATE: Sorry for previous typos, was on my phone

lesv commented 9 years ago

Odd. I'm able to use gcloud alpha bigtable clusters list and gcloud alpha bigtable clusters create from my personal account, with only the two APIs enabled on a project. Both of them work, and gcloud is written in Python.

You can view the source on your machine at google-cloud-sdk/lib/googlecloudsdk/bigtable/commands/clusters/*; it's Python code.

bendemaree commented 9 years ago

My understanding of the alpha Cloud Platform APIs is that a project has to be whitelisted to access (or even enable) them. For example, alpha commands in the Compute Engine API are not available to me, and I don't have an option to enable them.

Edit: my "understanding" is from here.

lesv commented 9 years ago

The APIs are ALPHA and public/published. Try running gcloud components update followed by gcloud alpha bigtable clusters list --project <projectID>; if it works, you'll have your answer.

I did this on my personal account and got good results. That understanding doesn't appear to be accurate here, but you may need to run gcloud components list and gcloud components update alpha first.

dhermes commented 9 years ago

@lesv I was already in the process of doing this with --log-http. ListClusters works, and it is part of the Cluster Admin API, though that doesn't guarantee ListZones will work. I also noticed that ?alt=json is appended to the URI.

It'd be great if someone would document the API surface, because right now I'm feeling around in the dark. I had to go through the source for the scopes and the service endpoints.

There are clearly 3 services:

and the two APIs that can be enabled seem to correspond to only two of those three (to an outsider, which is all users)

screen_shot_004

Another distinction may be a user account vs. a service account.

Yet another may be the scopes used. Checking the access token (which I got from --log-http) shows that none of the Bigtable scopes are used:

https://www.googleapis.com/auth/appengine.admin
https://www.googleapis.com/auth/bigquery
https://www.googleapis.com/auth/compute
https://www.googleapis.com/auth/devstorage.full_control
https://www.googleapis.com/auth/userinfo.email
https://www.googleapis.com/auth/ndev.cloudman
https://www.googleapis.com/auth/cloud-platform
https://www.googleapis.com/auth/sqlservice.admin
https://www.googleapis.com/auth/prediction
https://www.googleapis.com/auth/projecthosting
https://www.googleapis.com/auth/plus.me

but the catch-all https://www.googleapis.com/auth/cloud-platform is.
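The reasoning here (no Bigtable-specific scope, but the catch-all is present) can be written as a simple set check. This is a sketch under assumptions: covers_bigtable is a hypothetical helper, cloud-platform is treated as a catch-all per the comments in this thread, and the Bigtable scope strings are the ones quoted later in this thread, which do not agree on the exact spelling.

```python
CLOUD_PLATFORM = "https://www.googleapis.com/auth/cloud-platform"

# Assumed Bigtable-specific scopes, copied from names quoted in this thread.
BIGTABLE_SCOPES = {
    "https://www.googleapis.com/auth/cloud-bigtable.admin",
    "https://www.googleapis.com/auth/cloud-bigtable.admin.cluster",
    "https://www.googleapis.com/auth/bigtable.admin.cluster",
}


def covers_bigtable(granted_scopes):
    """True if the token has the catch-all scope or any Bigtable scope."""
    granted = set(granted_scopes)
    return CLOUD_PLATFORM in granted or bool(granted & BIGTABLE_SCOPES)


# A subset of the scopes seen on the gcloud token above.
token_scopes = [
    "https://www.googleapis.com/auth/userinfo.email",
    "https://www.googleapis.com/auth/cloud-platform",
]
print(covers_bigtable(token_scopes))  # True
```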


I'll report back with what I find about adding ?alt=json and similar

lesv commented 9 years ago

I sent a note to engineering earlier when you said it doesn't work. I seem to recall that both admin APIs are enabled by a single flag for simplicity, but I'm confirming it. The Bigtable scopes work, but cloud-platform is preferred -- we wanted to simplify things. But it also makes things asymmetric for a while, which is confusing.

bendemaree commented 9 years ago

@lesv You were correct about the clusters list request; thanks!

dhermes commented 9 years ago

@lesv TL;DR: It seems the issue is using a service account rather than a user account.


I set some environment variables for config

PROJECT_ID="some-id-from-console"
BEARER_TOKEN="oauth2-token"  # Can get user token via --log-http output

With the same project ID I made two different cluster admin requests (ListClusters and ListZones) using three different tokens


To send the requests, I did as follows (using the env. vars from above):

$ # ListClusters
$ curl \
> --header "Authorization: Bearer ${BEARER_TOKEN}" \
> --header "accept: application/json" \
> --header "accept-encoding: gzip, deflate" \
> --header "content-length: 0" \
> --header "user-agent: dhermes-curl-bigtable-test" \
> https://bigtableclusteradmin.googleapis.com/v1/projects/${PROJECT_ID}/aggregated/clusters?alt=json
$
$ # ListZones
$ curl \
> --header "Authorization: Bearer ${BEARER_TOKEN}" \
> --header "accept: application/json" \
> --header "accept-encoding: gzip, deflate" \
> --header "content-length: 0" \
> --header "user-agent: dhermes-curl-bigtable-test" \
> https://bigtableclusteradmin.googleapis.com/v1/projects/${PROJECT_ID}/zones?alt=json
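For anyone who prefers Python to curl, the same two calls can be sketched with the standard library's urllib.request. This is not part of any gcloud library: build_request and fetch_json are hypothetical helper names, PROJECT_ID and BEARER_TOKEN are read from the environment variables set above, and the accept-encoding: gzip header is dropped because urllib does not decompress responses on its own.

```python
import json
import os
import urllib.request

BASE_URL = "https://bigtableclusteradmin.googleapis.com/v1/projects/{project}"


def build_request(path, project_id, bearer_token):
    """Build (but do not send) a GET request mirroring the curl calls above."""
    url = BASE_URL.format(project=project_id) + path + "?alt=json"
    headers = {
        "Authorization": "Bearer " + bearer_token,
        "Accept": "application/json",
        "User-Agent": "dhermes-curl-bigtable-test",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_json(path):
    """Send the request using PROJECT_ID and BEARER_TOKEN from the environment."""
    req = build_request(path, os.environ["PROJECT_ID"], os.environ["BEARER_TOKEN"])
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: fetch_json("/aggregated/clusters") for ListClusters and fetch_json("/zones") for ListZones. A 403 like the one shown below surfaces as urllib.error.HTTPError, whose body can still be read with its .read() method.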

Using the user account the response for ListClusters was

{}

and for ListZones

{
  "zones": [
    {
      "name": "projects/REDACTED/zones/asia-east1-b",
      "displayName": "asia-east1-b",
      "status": "OK"
    },
    {
      "name": "projects/REDACTED/zones/europe-west1-c",
      "displayName": "europe-west1-c",
      "status": "OK"
    },
    {
      "name": "projects/REDACTED/zones/us-central1-c",
      "displayName": "us-central1-c",
      "status": "OK"
    },
    {
      "name": "projects/REDACTED/zones/us-central1-b",
      "displayName": "us-central1-b",
      "status": "OK"
    }
  ]
}
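Pulling the zone names out of a response like this takes only a few lines with the json module. zone_names is a hypothetical helper, and treating a missing "zones" key (or an empty {} body, as ListClusters returned) as an empty list is an assumption based on proto3's JSON mapping omitting empty repeated fields.

```python
import json

SAMPLE = """
{"zones": [
  {"name": "projects/REDACTED/zones/asia-east1-b",
   "displayName": "asia-east1-b", "status": "OK"},
  {"name": "projects/REDACTED/zones/us-central1-b",
   "displayName": "us-central1-b", "status": "OK"}
]}
"""


def zone_names(list_zones_body, status="OK"):
    """Return the displayName of every zone whose status matches."""
    data = json.loads(list_zones_body)
    return [
        zone["displayName"]
        for zone in data.get("zones", [])
        if zone.get("status") == status
    ]


print(zone_names(SAMPLE))  # ['asia-east1-b', 'us-central1-b']
print(zone_names("{}"))    # []
```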

On the other hand, both methods failed for the service account (no matter which scopes were used) with the message

{
  "error": {
    "code": 403,
    "message": "Project has not enabled the API. Please use Google Developers Console to activate the API for your project.",
    "status": "PERMISSION_DENIED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.Help",
        "links": [
          {
            "description": "Google developer console API activation",
            "url": "https://console.developers.google.com/project/REDACTED/apiui/api"
          }
        ]
      }
    ]
  }
}
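To detect this failure in code rather than by eye, one option is to read the status and the google.rpc.Help links out of the error body. A minimal sketch with the json module; help_links is a hypothetical helper and the sample body is an abbreviated copy of the response above.

```python
import json

SAMPLE_ERROR = """
{"error": {"code": 403,
  "message": "Project has not enabled the API.",
  "status": "PERMISSION_DENIED",
  "details": [{"@type": "type.googleapis.com/google.rpc.Help",
    "links": [{"description": "Google developer console API activation",
      "url": "https://console.developers.google.com/project/REDACTED/apiui/api"}]}]}}
"""

HELP_TYPE = "type.googleapis.com/google.rpc.Help"


def help_links(error_body):
    """Return (status, [help URLs]) from a JSON error body like the one above."""
    error = json.loads(error_body).get("error", {})
    urls = []
    for detail in error.get("details", []):
        if detail.get("@type") == HELP_TYPE:
            urls.extend(link["url"] for link in detail.get("links", []))
    return error.get("status"), urls


status, urls = help_links(SAMPLE_ERROR)
print(status)  # PERMISSION_DENIED
print(urls)    # ['https://console.developers.google.com/project/REDACTED/apiui/api']
```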

We have to encourage a service account to be used for deployed applications. That is the point of a service account. So these endpoints need to work for service accounts.

lesv commented 9 years ago

Agree - it shouldn't work the way you describe. I'm not even sure where to send this, but I'll mention it to the eng team. @jgeewax may have some insight.

jgeewax commented 9 years ago

@carterpage @coryoconnor -- Any ideas on why service accounts are no good?

dhermes commented 9 years ago

@jgeewax Small distinction: service accounts work (i.e. they are "good") for the Table Admin API and Data API, just not the Cluster Admin API (here is where they are "no good").

coryoconnor commented 9 years ago

@dmmcerlean @sduskis any ideas?

lesv commented 9 years ago

Try with one of the following scopes:

https://www.googleapis.com/auth/cloud-bigtable.admin
https://www.googleapis.com/auth/cloud-bigtable.admin.cluster

dhermes commented 9 years ago

Trying only one at a time as well as both at once, it still fails with the message

Project has not enabled the API. Please use Google Developers Console to activate the API for your project.


If you like, I've made an all-inclusive gist that makes this easy to run and tweak: https://gist.github.com/dhermes/d27070c90a9862213a3b

The script main_with_cluster_admin.go uses bigtable.ClusterAdminScope. This value (defined in doc.go) is slightly different from the one you suggested:

https://www.googleapis.com/auth/bigtable.admin.cluster        # In GOLANG
https://www.googleapis.com/auth/cloud-bigtable.admin.cluster  # Your suggestion

You can run it with

make list_clusters

for a service account or

make list_clusters USE_APP_DEFAULT=True

for using the user creds from gcloud login.

dhermes commented 9 years ago

Here is a well-polished jumping off point for Python: https://gist.github.com/dhermes/2edb97d9581b5ec471eb

Will merge this upstream into https://github.com/dhermes/gcloud-python-bigtable

lesv commented 9 years ago

@AngusDavis -- Please take a look.

lesv commented 9 years ago

I've talked with engineering -- it is probably best not to provide the cluster APIs for now.

jgeewax commented 9 years ago

That doesn't quite make sense. Why not?

lesv commented 9 years ago

gcloud, when acquiring OAuth tokens, uses the client ID and client secret (from the installed application OAuth flow). As a result, gcloud is able to use the API, but tokens associated with service accounts are not.

When the Cluster API is fully enabled, it will be usable programmatically. Until then, the recommendation is to use gcloud or cloud console for the infrequent case of creating a cluster and the data and table APIs for the more frequent cases.

So basically, the API is visible to some internal projects and not to everyone else. We're trying to get it exposed to service accounts; it currently isn't... hang tight.

dhermes commented 9 years ago

Hanging tight; from the outside perspective, service accounts should be prioritized over user accounts since service accounts are used in a deployed application.

dhermes commented 9 years ago

OK, here is a new one: I can't even create a cluster through the cloud console UI anymore:

screen_shot_006

Note the "Table Admin API" service is enabled on the project

screen_shot_007

and as mentioned before there is no Cluster Admin API as a choice to enable:

screen_shot_008

lesv commented 9 years ago

I've reported it to engineering.

dhermes commented 9 years ago

Thanks. I think it's just my personal (@gmail.com) account. @jgeewax was able to create one through the UI with a @google.com account.

lesv commented 9 years ago

Yep -- it fails on my personal gmail account when I'm not on the corporate network as well.

lesv commented 9 years ago

Fix is rolling out now, should be done in the next few hours.

dhermes commented 9 years ago

Fix for UI or fix for supporting service accounts with Cluster Admin API?

lesv commented 9 years ago

At this point, it should be just like things were last week. We rolled back some changes.

dhermes commented 9 years ago

Gotcha. Thanks.

dhermes commented 9 years ago

@lesv ISTM that it became possible to use service accounts for the Cluster Admin API yesterday. Can you confirm? (Though the API still doesn't show up in the list of APIs to enable on cloud console.)

lesv commented 9 years ago

That might have happened as part of the fix for the problem you found recently -- the engineer I need to talk w/ is out this week -- so, don't take it as a given that it will stay this way. Eventually, it will be correct and you can use them -- but I'm not sure it has happened officially yet.

lesv commented 9 years ago

@dhermes @jgeewax The Cluster Admin API is definitely not supposed to be enabled yet. The cluster proto was open sourced for some testing reasons, but wasn't supposed to be made available externally yet. One reason is that there are some naming issues that have come up, and the proto may have to be changed because of this. That conversation won't take place until next week and we probably won't see any public results for at least a week after that.

dhermes commented 9 years ago

@lesv Thanks. I have the entire Cluster Admin API implemented, including


Though it's not supposed to be enabled yet, any changes will be quickly picked up by the 3 suites mentioned above.

Thanks for the heads up though.


PS: ISTM that supporting gcloud alpha bigtable clusters * would be impossible without enabling the API.

dhermes commented 9 years ago

I have everything implemented except for the stream requests (ReadRows and SampleRowKeys), which I'm not sure how to handle with gRPC Python.

Golang handles it by using a cancellable stream.
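The cancel-after-enough-rows pattern for ReadRows or SampleRowKeys can be sketched in plain Python: consume a bounded number of items from the response stream, then cancel the call. This assumes the stream is iterable and exposes cancel(), which is how response iterators from streaming calls behave in current gRPC Python (the 2015 API may have differed). read_first_rows and FakeStream are hypothetical names; FakeStream only lets the sketch run without a server.

```python
import itertools


def read_first_rows(stream, limit):
    """Consume at most `limit` items from a streaming call, then cancel it."""
    try:
        return list(itertools.islice(stream, limit))
    finally:
        # Tell the server to stop sending; always runs, even on errors.
        stream.cancel()


class FakeStream:
    """Hypothetical stand-in for a gRPC streaming response iterator."""

    def __init__(self, items):
        self._items = iter(items)
        self.cancelled = False

    def __iter__(self):
        return self

    def __next__(self):
        # After cancellation this fake simply ends the iteration
        # (real gRPC would raise an RpcError with status CANCELLED).
        if self.cancelled:
            raise StopIteration
        return next(self._items)

    def cancel(self):
        self.cancelled = True
        return True


stream = FakeStream(f"row-{i}" for i in range(1000))
print(read_first_rows(stream, 3))  # ['row-0', 'row-1', 'row-2']
print(stream.cancelled)            # True
```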

nathanielmanistaatgoogle commented 9 years ago

@dhermes: what's unclear? gRPC Python supports cancellation. (Ping me; I have some time open this afternoon if you'd like to videochat.)

dhermes commented 9 years ago

Nothing unclear (that I know of), I just don't know how to do it. Let's chat.

jgeewax commented 9 years ago

@dhermes: Looks like we're "done enough" with the Bigtable stuff. I'm going to close this out and open a new issue focused on "what exactly do we need to do to make gcloud_bigtable become gcloud.bigtable".

Sound reasonable? (Re-open if not)