@nolanmar511 can you take a look?
@JustinBeckwith -- Most of `@google-cloud/profiler`'s authentication is handled through `@google-cloud/common` (and then through `google-auth-library`). The `keyFilename` field of `@google-cloud/profiler`'s options comes from `GoogleAuthOptions`. Do you know what could be happening?
(Possibly interestingly, `GoogleAuthOptions` has both a `keyFile` and a `keyFilename` field, which surprised me a bit.)
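As an aside, a minimal sketch of what that duality looks like from the caller's side, assuming the two fields are interchangeable aliases (which is how they read in the library); the path here is a placeholder:

```ts
// GoogleAuthOptions appears to treat keyFile and keyFilename as aliases
// for the same thing: a path to a service account key on disk.
import {GoogleAuth} from 'google-auth-library';

const auth = new GoogleAuth({
  keyFilename: '/etc/secrets/sd-profiler-agent-key.json', // placeholder path
  scopes: ['https://www.googleapis.com/auth/cloud-platform'],
});
```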
👋 @montmanu can you provide a code snippet of how you're trying to use the profiler?
Sure thing!
Here is an overview of how we went about integrating the profiler:

```sh
gcloud services enable cloudprofiler.googleapis.com
npm install --save @google-cloud/profiler
```
```ts
// ...
import * as profilerAgent from '@google-cloud/profiler';
// ...
/**
 * keyFilename == "/etc/secrets/sd-profiler-agent-key.json"
 * logLevel == 4
 * projectId == "my-project-id"
 * serviceContext.service == "renderer"
 * serviceContext.version == "bffadda8ab32b1a236bfeb9456fa43c5308a2597"
 */
import profilerAgentOptions from './config/observability/profiler-agent';
// ...
profilerAgent.start(profilerAgentOptions);
// ...
```
// ...
Regarding the host environment, the container is built using `node:8-alpine@sha256:8e9987a6d91d783c56980f1bd4b23b4c05f9f6076d513d6350fef8fe09ed01fd` as the base image. That base image is extended with the following utilities:

```dockerfile
# ...
RUN \
  apk add --update --no-cache bind-tools curl alpine-sdk
# ...
```
Here is the relevant `npm install` log output:

```
> pprof@0.2.0 install /dist/node_modules/pprof
> node-pre-gyp install --fallback-to-build
node-pre-gyp WARN Using request for node-pre-gyp https download
[pprof] Success: "/dist/node_modules/pprof/build/node-v57-linux-x64-musl/pprof.node" is installed via remote
> protobufjs@6.8.8 postinstall /dist/node_modules/protobufjs
> node scripts/postinstall
```
The Service Account's key is stored in a Kubernetes secret and mounted into the container as a volume. Here is a selection from a sample Pod configuration:
```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: renderer
    cluster: us-east1
    env: stg
    namespace: default
    project: hybrid
    revision: bffadda
  name: renderer-65c9b59b55-wnwk6
  namespace: default
spec:
  containers:
    - env:
        # ...
        - name: CLOUD_PROFILER_KEY_FILE
          value: /etc/secrets/sd-profiler-agent-key.json
        # ...
      name: renderer
      # ...
      volumeMounts:
        - mountPath: /etc/secrets
          name: secrets
          readOnly: true
  # ...
  volumes:
    # ...
    - name: secrets
      secret:
        defaultMode: 420
        secretName: renderer-secrets
    # ...
```
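For context, a hypothetical reconstruction of what the `./config/observability/profiler-agent` module might look like, assuming it simply wires the `CLOUD_PROFILER_KEY_FILE` env var from the Pod spec above into the agent options (the module's actual contents weren't shown):

```ts
// Hypothetical sketch of ./config/observability/profiler-agent.
// The values mirror the comment block in the earlier snippet.
export default {
  keyFilename: process.env.CLOUD_PROFILER_KEY_FILE, // "/etc/secrets/sd-profiler-agent-key.json"
  logLevel: 4,
  projectId: 'my-project-id',
  serviceContext: {
    service: 'renderer',
    version: 'bffadda8ab32b1a236bfeb9456fa43c5308a2597',
  },
};
```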
I have validated the contents of `/etc/secrets/sd-profiler-agent-key.json` on the file system within a running container.
The Service Account has the following IAM Roles applied:
Let me know if you need any additional information.
Not sure if it is relevant, but we are using a few other APM agents:
```ts
/**
 * This module should be loaded at the application's entry point.
 * Order matters here ...
 *
 * 1. trace agent
 * 2. profiler agent
 * 3. error reporting agent
 * 4. debug agent
 */
import * as traceAgent from '@google-cloud/trace-agent';
import * as profilerAgent from '@google-cloud/profiler';
import { ErrorReporting } from '@google-cloud/error-reporting';
import * as debugAgent from '@google-cloud/debug-agent';
// ...
traceAgent.start(traceAgentOptions);
// ...
if (true) {
  profilerAgent.start(profilerAgentOptions);
}
// ...
if (true) {
  errorReportingAgent = new ErrorReporting(errorReportingOptions);
}
// ...
if (true) {
  debugAgent.start(debugAgentOptions);
}
// ...
```
This looks like it should work. @nolanmar511 I can't seem to `npm install` on OSX, so it's very hard for me to test this :/
For OSX, a few additional dependencies are required, but the profiling agent should still work. https://github.com/nodejs/node-gyp#installation
Starting to experiment with this.
To test, I had two projects (I'll call them A and B). I created a key for project A to use Stackdriver Profiler's agents. I then created a GCE VM in project B and ran some Node.js with the profiling agent.
So, snippet for starting the profiling agent:
```js
require('@google-cloud/profiler').start({
  keyFilename: "sd-profiler-key-for-project-A.json",
  projectId: "project-id-for-profile-a",
  serviceContext: { service: "service" },
  logLevel: 4,
});
```

With this, I was able to collect and upload profiles from project B's GCE VM into project A. So, `keyFilename` does work with profiler.
I'm a bit puzzled. `@google-cloud/profiler`, `@google-cloud/trace`, and `@google-cloud/debug` all use `@google-cloud/common` in the same way for authentication (and the latest versions of `@google-cloud/profiler` and `@google-cloud/trace` both depend on `@google-cloud/common` version 0.31.X). So, this would have to be a GKE/profiler-specific problem, and I don't quite see how that would happen.
Next step is to try this on GKE.
Thanks for digging in. I started to try out using `google-auth-library` directly with a limited test case; something like the following, after `kubectl exec`-ing into a running container:
```js
/** @see https://github.com/googleapis/google-auth-library-nodejs/blob/master/samples/keyfile.js */
const {auth} = require('google-auth-library');

/**
 * Acquire a client, then make a request to an API that the
 * Service Account has permissions to access.
 */
(function () {
  async function main(keyFile) {
    const client = await auth.getClient({
      keyFile: keyFile,
      scopes: 'https://www.googleapis.com/auth/monitoring',
    });
    const projectId = await auth.getProjectId();
    const url = `https://cloudprofiler.googleapis.com/v2/profiles`;
    const res = await client.request({url});
    console.log('Profiler Info:');
    console.log(res.data);
  }
  main(process.env.CLOUD_PROFILER_KEY_FILE).catch(console.error);
})();
```
I'm somewhat guessing with respect to the actual Profiler API request details; I only had time to track down the `baseUrl` value for the API. I tried sending a few GET requests with several variations in the path / params / etc., but was unable to successfully list any profiles; all responses were 4xx.
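For what it's worth, one plausible reason for the 4xx responses is that the Profiler v2 agent API appears to expose profile creation as a POST (`projects.profiles.create`) rather than any list/GET method. A hedged sketch of that call shape, assuming the v2 REST surface; the deployment target and profile types are placeholders:

```ts
// Hedged sketch: exercise the endpoint the agent itself would use.
// Note the server may hold this call open until it actually wants a
// profile collected, so a long wait here is not necessarily an error.
import {auth} from 'google-auth-library';

async function createProfile(keyFile: string, projectId: string) {
  const client = await auth.getClient({
    keyFile,
    scopes: 'https://www.googleapis.com/auth/monitoring.write', // assumed sufficient scope
  });
  const res = await client.request({
    url: `https://cloudprofiler.googleapis.com/v2/projects/${projectId}/profiles`,
    method: 'POST',
    data: {
      deployment: {projectId, target: 'renderer'}, // placeholder target
      profileType: ['CPU', 'WALL'],                // placeholder types
    },
  });
  console.log(res.data);
}

createProfile(process.env.CLOUD_PROFILER_KEY_FILE!, 'my-project-id').catch(console.error);
```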
I haven't been able to reproduce this on GKE, either when specifying `keyFilename` and trying to upload to the same project or when specifying `keyFilename` and trying to upload to a different project.

I have reproduced the error message (`Error: The caller does not have permission`) when the key file isn't right (for example, when I tried to use a key created with the role of Stackdriver Profiler User instead of the role Stackdriver Profiler Agent, or when I tried to use a key made for project A to upload to project B).

It's possible I just haven't figured out how to reproduce this, but I'd like to rule out other potential issues.

Is it possible the key file isn't associated with the project you're trying to upload profiles to, or that the service account doesn't have the Stackdriver Profiler Agent role?
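One way to verify the role bindings mechanically, as a hedged sketch: fetch the project's IAM policy over the Resource Manager v1 REST API and list the roles bound to the agent's service account, expecting `roles/cloudprofiler.agent` among them. Note that `getIamPolicy` itself requires a credential with permission to read the policy, so this should run with an admin key (the key path and IDs below are placeholders):

```ts
import {auth} from 'google-auth-library';

async function rolesFor(keyFile: string, projectId: string, member: string) {
  const client = await auth.getClient({
    keyFile,
    scopes: 'https://www.googleapis.com/auth/cloud-platform',
  });
  const res = await client.request<{bindings: {role: string; members: string[]}[]}>({
    url: `https://cloudresourcemanager.googleapis.com/v1/projects/${projectId}:getIamPolicy`,
    method: 'POST',
  });
  // Keep only the bindings that mention the service account.
  return res.data.bindings.filter((b) => b.members.includes(member)).map((b) => b.role);
}

rolesFor(
  '/path/to/admin-key.json', // placeholder: an admin credential, not the agent's key
  'my-project-id',
  'serviceAccount:sd-profiler-agent@my-project-id.iam.gserviceaccount.com'
).then(console.log, console.error);
```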
Thanks again!

Unfortunately, the key file appears to be correct, and the service account appears to have the Stackdriver Profiler Agent role applied :/
```sh
kubectl exec -it renderer-5d74495b6f-pchkg -c renderer -- cat /etc/secrets/sd-profiler-agent-key.json
```

```json
{
  "type": "service_account",
  "project_id": "my-project-id",
  "private_key_id": "SNIP",
  "private_key": "-----BEGIN PRIVATE KEY-----\nSNIP\n-----END PRIVATE KEY-----\n",
  "client_email": "sd-profiler-agent@my-project-id.iam.gserviceaccount.com",
  "client_id": "SNIP",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/sd-profiler-agent%40my-project-id.iam.gserviceaccount.com"
}
```
Agree with you, though; it's strange that the other APM libraries authenticate successfully using the metadata service defaults while this library has problems, especially if they share the same underlying auth dependencies.

This particular project may have been an EAP participant for the Profiler product. Is it possible that something about that EAP participation is affecting the use of the GA API?

FWIW, this same configuration is working as expected in a different project (different clusters, different Service Account, and therefore a different key, but the same IAM policy and APM integration details). The project where this is working would likely not have been an EAP participant, which may give a bit more weight to that EAP-related hypothesis.
@aalexand -- Could a project having been part of EAP impact authentication?
@nolanmar511 I can't think of how it could.
@montmanu -- you indicated only the Compute Engine default service account appears to be interacting with the API. Was it unexpected that the Compute Engine default service account interacted with the API? Could something else be using that token?
Based on my experiments, my assumption is that the profiling agent is using the file specified by `keyFilename`, but that that key file doesn't grant the necessary permissions.
Would it be possible to delete and re-create the key file?
Another possible guess:

Based on your comments above, it looks like you already specify the `projectId` in the configuration. But, just in case that's not so, specifying the `projectId` in the configuration (and, to be overly specific, ensuring that that project ID matches the project ID in the key file) might help.

Similarly, specifying the exact project ID in your example using google-auth-library, rather than using `await auth.getProjectId()`, could be helpful.

I mention this because if, somehow, the project ID specified in the configuration and the project ID in the key file don't match, the "The caller does not have permission" error definitely appears.
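To make that check mechanical, a minimal sketch that compares the configured `projectId` against the `project_id` baked into the key file before starting the agent (the env var follows the Pod spec earlier in the thread; the expected ID is a placeholder):

```ts
// Sanity check: the profiler options' projectId should match the
// project_id inside the service account key file.
import {readFileSync} from 'fs';

const keyFile = process.env.CLOUD_PROFILER_KEY_FILE!;
const configuredProjectId = 'my-project-id'; // placeholder

const key = JSON.parse(readFileSync(keyFile, 'utf8'));
if (key.project_id !== configuredProjectId) {
  throw new Error(
    `projectId mismatch: config says ${configuredProjectId}, key file says ${key.project_id}`
  );
}
```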
OK, thanks. Yes, I can definitely delete / re-create that key and re-test. I will also confirm that the `projectId` is correct and will follow up once that is complete.
So I have not yet had a chance to delete / re-create the SA key, but I have noticed new error messages related to auth in a couple of the other APM agents being used:
```
@google-cloud/debug-agent Failed to re-register debuggee 163243153602: Error: Unexpected error determining execution environment: request to http://metadata.google.internal./computeMetadata/v1/instance failed, reason: getaddrinfo EAI_AGAIN metadata.google.internal.:80
ERROR:@google-cloud/error-reporting: Unable to find credential information on instance. This library will be unable to communicate with the Stackdriver API to save errors. Message: Unexpected error determining execution environment: request to http://metadata.google.internal/computeMetadata/v1/instance/ failed, reason: getaddrinfo EAI_AGAIN metadata.google.internal:80
```
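Those `EAI_AGAIN` failures point at DNS resolution of the metadata host rather than at the credentials themselves. A hedged probe like the following, run inside the container, would confirm whether `metadata.google.internal` resolves and responds (the header and path are the standard GCE metadata conventions):

```ts
// Probe the GCE metadata server from inside the Pod. EAI_AGAIN from
// getaddrinfo means the hostname lookup itself failed (a DNS problem in
// the cluster), not that credentials were rejected.
import * as http from 'http';

const req = http.get(
  {
    host: 'metadata.google.internal',
    path: '/computeMetadata/v1/instance/',
    headers: {'Metadata-Flavor': 'Google'},
    timeout: 3000,
  },
  (res) => {
    console.log('metadata server responded:', res.statusCode);
    res.resume();
  }
);
req.on('error', (err) => console.error('metadata probe failed:', err.message));
req.on('timeout', () => req.destroy());
```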
This cluster has node auto-updates enabled, so the cluster details have changed slightly from when this issue was created:
```json
{
  "currentMasterVersion": "1.12.7-gke.7",
  "currentNodeVersion": "1.12.7-gke.7",
  "initialClusterVersion": "1.8.4-gke.0",
  "location": "us-east1-b",
  "locations": [
    "us-east1-b",
    "us-east1-c",
    "us-east1-d"
  ],
  "loggingService": "logging.googleapis.com/kubernetes",
  "monitoringService": "monitoring.googleapis.com/kubernetes",
  "nodeConfig": {
    "oauthScopes": [
      "https://www.googleapis.com/auth/bigquery",
      "https://www.googleapis.com/auth/cloud-platform",
      "https://www.googleapis.com/auth/cloud.useraccounts",
      "https://www.googleapis.com/auth/cloud.useraccounts.readonly",
      "https://www.googleapis.com/auth/cloud_debugger",
      "https://www.googleapis.com/auth/compute",
      "https://www.googleapis.com/auth/compute.readonly",
      "https://www.googleapis.com/auth/datastore",
      "https://www.googleapis.com/auth/devstorage.full_control",
      "https://www.googleapis.com/auth/devstorage.read_only",
      "https://www.googleapis.com/auth/devstorage.read_write",
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
      "https://www.googleapis.com/auth/monitoring.write",
      "https://www.googleapis.com/auth/pubsub",
      "https://www.googleapis.com/auth/service.management.readonly",
      "https://www.googleapis.com/auth/servicecontrol",
      "https://www.googleapis.com/auth/source.full_control",
      "https://www.googleapis.com/auth/source.read_only",
      "https://www.googleapis.com/auth/sqlservice",
      "https://www.googleapis.com/auth/sqlservice.admin",
      "https://www.googleapis.com/auth/taskqueue",
      "https://www.googleapis.com/auth/trace.append",
      "https://www.googleapis.com/auth/userinfo.email"
    ]
  }
}
```
Namely, `currentMasterVersion` and `currentNodeVersion` have both changed to `1.12.7-gke.7`.
@montmanu -- Have you had a chance to re-create the SA key? Also, should this be moved to google-cloud/common, or google-auth-library if authentication is impacting multiple agents?
At this point, I'm closing this issue.
I don't think it's actionable for profiler without further information, and it sounds like the problem may not be profiler-specific.
Feel free to re-open with additional context.
It appears that the `keyFilename` configuration option is ignored when attempting to authenticate to the Profiler API on GKE. When viewing the API usage by credential, only the Compute Engine default service account appears to be interacting with the API. The agent is logging error messages similar to the following:

More details: