lyft / cartography

Cartography is a Python tool that consolidates infrastructure assets and the relationships between them in an intuitive graph view powered by a Neo4j database.
https://lyft.github.io/cartography/
Apache License 2.0

Error while enabling GSuite integration #245

Open marco-lancini opened 4 years ago

marco-lancini commented 4 years ago

Issue: The documentation for setting up the GSuite integration is incomplete, and following it leads to a 400 error.

Description: Following the README, I:

  1. Enabled Google API access (instructions)
  2. Created a new G Suite user account and accepted the Terms of Service (this account is used for domain-wide delegated access).
  3. Performed G Suite Domain-Wide Delegation of Authority, as explained here
  4. Downloaded the service account's credentials
  5. Set up env vars for cartography:
    GSUITE_GOOGLE_APPLICATION_CREDENTIALS - location of the credentials file.
    GSUITE_DELEGATED_ADMIN - email address that you created in step 2
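
Before running a sync, the two env vars from step 5 can be sanity-checked locally. The following is a minimal sketch using only the standard library; the function name `check_gsuite_env` is hypothetical and not part of Cartography, and it assumes the credentials file is a standard GCP service account key in JSON format. It only catches basic setup mistakes and says nothing about whether delegation itself is configured correctly.

```python
import json
import os
import re


def check_gsuite_env(env=None):
    """Return a list of problems found; an empty list means the basics look sane.

    Hypothetical helper, not part of Cartography.
    """
    env = os.environ if env is None else env
    problems = []

    creds_path = env.get("GSUITE_GOOGLE_APPLICATION_CREDENTIALS")
    if not creds_path:
        problems.append("GSUITE_GOOGLE_APPLICATION_CREDENTIALS is not set")
    else:
        try:
            with open(creds_path) as fh:
                key = json.load(fh)
        except (OSError, ValueError) as exc:
            problems.append(f"cannot read credentials file: {exc}")
        else:
            # Assumption: a service account key file is a JSON object with these fields.
            required = ("client_email", "client_id", "private_key")
            if not isinstance(key, dict) or key.get("type") != "service_account":
                problems.append("credentials file is not a service account key")
            elif not all(key.get(field) for field in required):
                problems.append("credentials file is missing expected fields")

    # Loose shape check only; this does not verify that the account exists.
    admin = env.get("GSUITE_DELEGATED_ADMIN", "")
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", admin):
        problems.append("GSUITE_DELEGATED_ADMIN does not look like an email address")

    return problems


if __name__ == "__main__":
    for problem in check_gsuite_env():
        print(problem)
```

An empty result means both variables are present and well-formed, nothing more.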

After this, Cartography crashes with the following output:

cartography --neo4j-uri bolt://localhost:7687 --neo4j-user neo4j --neo4j-password-env-var NEO4J_PASSWORD_ENV_VAR
INFO:cartography.sync:Starting sync with update tag '1580915718'
INFO:cartography.sync:Starting sync stage 'create-indexes'
INFO:cartography.intel.create_indexes:Creating indexes for cartography node types.
INFO:cartography.sync:Finishing sync stage 'create-indexes'
INFO:cartography.sync:Starting sync stage 'aws'
...
INFO:cartography.sync:Finishing sync stage 'aws'
INFO:cartography.sync:Starting sync stage 'gcp'
...
INFO:cartography.sync:Finishing sync stage 'gcp'
INFO:cartography.sync:Starting sync stage 'gsuite'
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
ERROR:cartography.sync:Unhandled exception during sync stage 'gsuite'
Traceback (most recent call last):
  File "/app/cartography/sync.py", line 69, in run
    stage_func(neo4j_session, config)
  File "/app/cartography/intel/gsuite/__init__.py", line 79, in start_gsuite_ingestion
    api.sync_gsuite_users(session, resources.admin, config.update_tag, common_job_parameters)
  File "/app/cartography/intel/gsuite/api.py", line 230, in sync_gsuite_users
    resp_objs = get_all_users(admin)
  File "/app/cartography/intel/gsuite/api.py", line 109, in get_all_users
    resp = request.execute(num_retries=GOOGLE_API_NUM_RETRIES)
  File "/usr/local/lib/python3.7/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/googleapiclient/http.py", line 856, in execute
    raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 400 when requesting https://www.googleapis.com/admin/directory/v1/users?customer=my_customer&maxResults=500&orderBy=email&alt=json returned "Invalid Input">
Traceback (most recent call last):
  File "/usr/local/bin/cartography", line 11, in <module>
    load_entry_point('cartography', 'console_scripts', 'cartography')()
  File "/app/cartography/cli.py", line 240, in main
    return CLI(default_sync, prog='cartography').main(argv)
  File "/app/cartography/cli.py", line 220, in main
    return cartography.sync.run_with_config(self.sync, config)
  File "/app/cartography/sync.py", line 135, in run_with_config
    return sync.run(neo4j_driver, config)
  File "/app/cartography/sync.py", line 69, in run
    stage_func(neo4j_session, config)
  File "/app/cartography/intel/gsuite/__init__.py", line 79, in start_gsuite_ingestion
    api.sync_gsuite_users(session, resources.admin, config.update_tag, common_job_parameters)
  File "/app/cartography/intel/gsuite/api.py", line 230, in sync_gsuite_users
    resp_objs = get_all_users(admin)
  File "/app/cartography/intel/gsuite/api.py", line 109, in get_all_users
    resp = request.execute(num_retries=GOOGLE_API_NUM_RETRIES)
  File "/usr/local/lib/python3.7/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/googleapiclient/http.py", line 856, in execute
    raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 400 when requesting https://www.googleapis.com/admin/directory/v1/users?customer=my_customer&maxResults=500&orderBy=email&alt=json returned "Invalid Input">

While troubleshooting, I found that the string my_customer is hardcoded in cartography/intel/gsuite/api.py:

request = admin.users().list(customer='my_customer', maxResults=500, orderBy='email')
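
To make the effect of that argument concrete, here is a small sketch that rebuilds the request URL seen in the log above using only the standard library. The helper name `directory_users_url` is hypothetical; the endpoint and query parameters are copied from the failing request. It does not call the API, it only shows that `customer` is the one value that changes between the two attempts.

```python
from urllib.parse import urlencode

# Endpoint copied from the failing request in the log output.
DIRECTORY_USERS_ENDPOINT = "https://www.googleapis.com/admin/directory/v1/users"


def directory_users_url(customer="my_customer", max_results=500, order_by="email"):
    """Build the users.list URL for a given customer value (hypothetical helper)."""
    params = {
        "customer": customer,
        "maxResults": max_results,
        "orderBy": order_by,
        "alt": "json",
    }
    return f"{DIRECTORY_USERS_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(directory_users_url())              # the hardcoded default
    print(directory_users_url("C0123abcd"))   # placeholder customerId, not a real one
```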

I decided to replace it with the customerId of my GSuite org, and then I faced a 403 - Not Authorized error:

cartography --neo4j-uri bolt://localhost:7687 --neo4j-user neo4j --neo4j-password-env-var NEO4J_PASSWORD_ENV_VAR
INFO:cartography.sync:Starting sync with update tag '1580916464'
INFO:cartography.sync:Starting sync stage 'create-indexes'
INFO:cartography.intel.create_indexes:Creating indexes for cartography node types.
INFO:cartography.sync:Finishing sync stage 'create-indexes'
INFO:cartography.sync:Starting sync stage 'aws'
...
INFO:cartography.sync:Finishing sync stage 'aws'
INFO:cartography.sync:Starting sync stage 'gcp'
...
INFO:cartography.sync:Finishing sync stage 'gcp'
INFO:cartography.sync:Starting sync stage 'gsuite'
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
WARNING:googleapiclient.http:Encountered 403 Forbidden with reason "forbidden"
ERROR:cartography.sync:Unhandled exception during sync stage 'gsuite'
Traceback (most recent call last):
  File "/app/cartography/sync.py", line 69, in run
    stage_func(neo4j_session, config)
  File "/app/cartography/intel/gsuite/__init__.py", line 79, in start_gsuite_ingestion
    api.sync_gsuite_users(session, resources.admin, config.update_tag, common_job_parameters)
  File "/app/cartography/intel/gsuite/api.py", line 230, in sync_gsuite_users
    resp_objs = get_all_users(admin)
  File "/app/cartography/intel/gsuite/api.py", line 109, in get_all_users
    resp = request.execute(num_retries=GOOGLE_API_NUM_RETRIES)
  File "/usr/local/lib/python3.7/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/googleapiclient/http.py", line 856, in execute
    raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 403 when requesting https://www.googleapis.com/admin/directory/v1/users?customer=<REDACTED>&maxResults=500&orderBy=email&alt=json returned "Not Authorized to access this resource/api">
Traceback (most recent call last):
  File "/usr/local/bin/cartography", line 11, in <module>
    load_entry_point('cartography', 'console_scripts', 'cartography')()
  File "/app/cartography/cli.py", line 240, in main
    return CLI(default_sync, prog='cartography').main(argv)
  File "/app/cartography/cli.py", line 220, in main
    return cartography.sync.run_with_config(self.sync, config)
  File "/app/cartography/sync.py", line 135, in run_with_config
    return sync.run(neo4j_driver, config)
  File "/app/cartography/sync.py", line 69, in run
    stage_func(neo4j_session, config)
  File "/app/cartography/intel/gsuite/__init__.py", line 79, in start_gsuite_ingestion
    api.sync_gsuite_users(session, resources.admin, config.update_tag, common_job_parameters)
  File "/app/cartography/intel/gsuite/api.py", line 230, in sync_gsuite_users
    resp_objs = get_all_users(admin)
  File "/app/cartography/intel/gsuite/api.py", line 109, in get_all_users
    resp = request.execute(num_retries=GOOGLE_API_NUM_RETRIES)
  File "/usr/local/lib/python3.7/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/googleapiclient/http.py", line 856, in execute
    raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 403 when requesting https://www.googleapis.com/admin/directory/v1/users?customer=<REDACTED>&maxResults=500&orderBy=email&alt=json returned "Not Authorized to access this resource/api">

I tried to add more scopes to the service account (listed below), but I still get Not Authorized:

  1. https://www.googleapis.com/auth/admin.directory.domain.readonly
  2. https://www.googleapis.com/auth/admin.directory.group.member.readonly
  3. https://www.googleapis.com/auth/admin.directory.user.readonly
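
One thing worth ruling out at this point is a mismatch between the scopes the code requests and the scopes authorized for the service account's client ID in the Admin console. As far as I understand domain-wide delegation, a requested scope that is not authorized is enough to make the token request fail. The sketch below just compares two lists. `AUTHORIZED` holds the three scopes listed above; the `requested` list is a hypothetical placeholder and not Cartography's actual scope list, so replace it with whatever the code really asks for.

```python
def missing_scopes(requested, authorized):
    """Return the requested scopes that are not in the authorized list, in order."""
    allowed = {scope.strip() for scope in authorized}
    return [scope for scope in requested if scope.strip() not in allowed]


# The three scopes listed in this comment.
AUTHORIZED = [
    "https://www.googleapis.com/auth/admin.directory.domain.readonly",
    "https://www.googleapis.com/auth/admin.directory.group.member.readonly",
    "https://www.googleapis.com/auth/admin.directory.user.readonly",
]

# Hypothetical example input for illustration only.
requested = [
    "https://www.googleapis.com/auth/admin.directory.user.readonly",
    "https://www.googleapis.com/auth/admin.directory.group.readonly",
]

if __name__ == "__main__":
    for scope in missing_scopes(requested, AUTHORIZED):
        print("requested but not authorized:", scope)
```

With the placeholder input, the second scope is reported because it does not appear in the authorized list.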

achantavy commented 4 years ago

@skiptomyliu Can you take a look at this GSuite issue when you get a moment?

skiptomyliu commented 4 years ago

my_customer refers to the current customer/org that the application belongs to (see https://developers.google.com/admin-sdk/directory/v1/guides/manage-customers), but replacing it with your customerId should also work.

I would still double-check that everything in your step 3 ("G Suite Domain-Wide Delegation of Authority") has been completed.

You should have two accounts:

  1. A service account that is created in GCP
  2. An e-mail account that is created in GSuite that the service account you created in GCP will delegate to.

marco-lancini commented 4 years ago

Hi @skiptomyliu, thanks for replying.

I've made sure to have:

  1. A service account in GCP
  2. An email account in GSuite for delegation
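
For reference, the two accounts end up in different fields of the token request. In Google's service account OAuth flow, the signed JWT names the service account as the issuer (`iss`) and the GSuite user being impersonated as the subject (`sub`). The sketch below builds only the unsigned claim set with the standard library. Signing with the private key from the credentials file is deliberately left out, the function name and example addresses are hypothetical, and the token URI is an assumption (the key file's own `token_uri` field is the value that would actually be used).

```python
import time

# Assumed token endpoint; a real client would read token_uri from the key file.
TOKEN_URI = "https://oauth2.googleapis.com/token"


def delegated_claims(service_account_email, delegated_admin, scopes, now=None, lifetime=3600):
    """Build the unsigned JWT claim set for a delegated token request (hypothetical helper)."""
    issued_at = int(time.time()) if now is None else int(now)
    return {
        "iss": service_account_email,   # account 1: the GCP service account
        "sub": delegated_admin,         # account 2: the GSuite user to impersonate
        "scope": " ".join(scopes),      # scopes are space-delimited in the claim
        "aud": TOKEN_URI,
        "iat": issued_at,
        "exp": issued_at + lifetime,    # Google caps this at one hour after iat
    }


if __name__ == "__main__":
    claims = delegated_claims(
        "cartography@example-project.iam.gserviceaccount.com",  # placeholder
        "delegated-admin@example.com",                          # placeholder
        ["https://www.googleapis.com/auth/admin.directory.user.readonly"],
    )
    print(claims)
```

If `sub` is missing, or names an account that is not a user in the domain, the request is no longer a delegated one, which is one way a setup with both accounts present can still fail.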

We also tried applying these scopes:

The error we are facing now is the following:

INFO:cartography.sync:Starting sync stage 'gsuite'
INFO:googleapiclient.discovery:URL being requested: GET https://www.googleapis.com/discovery/v1/apis/admin/directory_v1/rest
DEBUG:cartography.intel.gsuite.api:Syncing GSuite Users
INFO:googleapiclient.discovery:URL being requested: GET https://www.googleapis.com/admin/directory/v1/users?customer=my_customer&maxResults=500&orderBy=email&alt=json
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:oauth2client.client:Failed to retrieve access token: {
  "error": "unauthorized_client",
  "error_description": "Client is unauthorized to retrieve access tokens using this method, or client not authorized for any of the scopes requested."
}
ERROR:cartography.sync:Unhandled exception during sync stage 'gsuite'
Traceback (most recent call last):
  File "/app/cartography/sync.py", line 69, in run
    stage_func(neo4j_session, config)
  File "/app/cartography/intel/gsuite/__init__.py", line 68, in start_gsuite_ingestion
    api.sync_gsuite_users(session, resources.admin, config.update_tag, common_job_parameters)
  File "/app/cartography/intel/gsuite/api.py", line 229, in sync_gsuite_users
    resp_objs = get_all_users(admin)
  File "/app/cartography/intel/gsuite/api.py", line 108, in get_all_users
    resp = request.execute(num_retries=GOOGLE_API_NUM_RETRIES)
  File "/usr/local/lib/python3.7/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/googleapiclient/http.py", line 851, in execute
    method=str(self.method), body=self.body, headers=self.headers)
  File "/usr/local/lib/python3.7/site-packages/googleapiclient/http.py", line 165, in _retry_request
    resp, content = http.request(uri, method, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/oauth2client/transport.py", line 159, in new_request
    credentials._refresh(orig_request_method)
  File "/usr/local/lib/python3.7/site-packages/oauth2client/client.py", line 749, in _refresh
    self._do_refresh_request(http)
  File "/usr/local/lib/python3.7/site-packages/oauth2client/client.py", line 819, in _do_refresh_request
    raise HttpAccessTokenRefreshError(error_msg, status=resp.status)
oauth2client.client.HttpAccessTokenRefreshError: unauthorized_client: Client is unauthorized to retrieve access tokens using this method, or client not authorized for any of the scopes requested.
Traceback (most recent call last):
  File "/usr/local/bin/cartography", line 11, in <module>
    load_entry_point('cartography', 'console_scripts', 'cartography')()
  File "/app/cartography/cli.py", line 241, in main
    return CLI(default_sync, prog='cartography').main(argv)
  File "/app/cartography/cli.py", line 221, in main
    return cartography.sync.run_with_config(self.sync, config)
  File "/app/cartography/sync.py", line 135, in run_with_config
    return sync.run(neo4j_driver, config)
  File "/app/cartography/sync.py", line 69, in run
    stage_func(neo4j_session, config)
  File "/app/cartography/intel/gsuite/__init__.py", line 68, in start_gsuite_ingestion
    api.sync_gsuite_users(session, resources.admin, config.update_tag, common_job_parameters)
  File "/app/cartography/intel/gsuite/api.py", line 229, in sync_gsuite_users
    resp_objs = get_all_users(admin)
  File "/app/cartography/intel/gsuite/api.py", line 108, in get_all_users
    resp = request.execute(num_retries=GOOGLE_API_NUM_RETRIES)
  File "/usr/local/lib/python3.7/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/googleapiclient/http.py", line 851, in execute
    method=str(self.method), body=self.body, headers=self.headers)
  File "/usr/local/lib/python3.7/site-packages/googleapiclient/http.py", line 165, in _retry_request
    resp, content = http.request(uri, method, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/oauth2client/transport.py", line 159, in new_request
    credentials._refresh(orig_request_method)
  File "/usr/local/lib/python3.7/site-packages/oauth2client/client.py", line 749, in _refresh
    self._do_refresh_request(http)
  File "/usr/local/lib/python3.7/site-packages/oauth2client/client.py", line 819, in _do_refresh_request
    raise HttpAccessTokenRefreshError(error_msg, status=resp.status)
oauth2client.client.HttpAccessTokenRefreshError: unauthorized_client: Client is unauthorized to retrieve access tokens using this method, or client not authorized for any of the scopes requested.

skiptomyliu commented 4 years ago

Hrmm, I wonder if we mixed up the steps as it appears ordering may be required...

https://stackoverflow.com/a/59067603/914941

"Delegating domain-wide authority to the service account" MUST be enabled before you add service account and its scopes on "Manage API client access" page in G Suite Admin. Otherwise it will fail with "Client is unauthorized to retrieve access tokens using this method, or client not authorized for any of the scopes requested." error and require removing the API client and adding it again.

marco-lancini commented 4 years ago

Hi @skiptomyliu, just to be super sure we tried both ways and we are still facing the same issue
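
The tracebacks posted so far fail at two different stages, which may help narrow down where to look. `HttpAccessTokenRefreshError` is raised by oauth2client while obtaining a token, before any Directory API call is made, whereas `HttpError` is raised by googleapiclient after a token was obtained and the API itself rejected the request. A small sketch that sorts the final exception line of a traceback into those two buckets (the function name and the labels are hypothetical):

```python
def failure_stage(exception_line):
    """Classify the last line of a traceback as a token-stage or API-stage failure."""
    if "HttpAccessTokenRefreshError" in exception_line or "unauthorized_client" in exception_line:
        # Raised while exchanging credentials for an access token.
        return "token"
    if "HttpError" in exception_line:
        # A token was obtained, but the Directory API rejected the call.
        return "api"
    return "unknown"


if __name__ == "__main__":
    # Shortened versions of the final lines from the tracebacks in this thread.
    samples = [
        'googleapiclient.errors.HttpError: <HttpError 400 ... returned "Invalid Input">',
        'googleapiclient.errors.HttpError: <HttpError 403 ... returned "Not Authorized to access this resource/api">',
        "oauth2client.client.HttpAccessTokenRefreshError: unauthorized_client: Client is unauthorized ...",
    ]
    for line in samples:
        print(failure_stage(line), "<-", line)
```

By this split, the 400 and 403 errors from the first comment are API-stage failures, while the `unauthorized_client` error is a token-stage failure and points at the delegation or scope configuration rather than at the `users().list` call.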

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

marco-lancini commented 4 years ago

We are still blocked by this issue

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

marco-lancini commented 4 years ago

Hi stale bot, this is still an ongoing issue :)

marco-lancini commented 4 years ago

Hi, some updates on this:

So it really seems this issue is related to some calls made by Cartography, rather than to a faulty setup?

zamirTo1 commented 4 years ago

Hi, has anyone managed to get past the HTTP 403 error "Not Authorized to access this resource/api"?

marco-lancini commented 4 years ago

Unfortunately no, I had to put the GSuite integration on hold for my use cases 😟

achantavy commented 4 years ago

Ah man, I wonder what secret sauce we've got going on because this just works™️ on our deployment.

@zamirTo1 do you get the same error messages as @marco-lancini? Can you add more details?

zamirTo1 commented 4 years ago

Hi @achantavy, I do get the same error as @marco-lancini. I also tried to set up the GSuite section myself following this guide: https://developers.google.com/admin-sdk/directory/v1/guides/delegation, with the same result :(

marco-lancini commented 4 years ago

Yeah can confirm I followed that process as well, and got it working for RBACSync. This leads me to think it might be something related to Cartography code itself 🤔

marco-lancini commented 4 years ago

I've documented the approach I took, step-by-step, here: https://www.marcolancini.it/2020/blog-gsuite-domain-delegation/ Hope this helps!

jychp commented 1 year ago

https://github.com/lyft/cartography/pull/1071 does not solve this issue, but it allows using another auth method (OAuth), which can be considered a workaround.