philbooth closed this 5 years ago
So I've deployed the current version to GCP, but based on some console.logs I added it looks like the function gets invoked with individual events, which I wasn't expecting. I was hoping they'd come in batches which we could pass on to Amplitude in batches too.
I've deleted the cloud function for now so it's not spinning overnight, and will come back to this again in the morning.
Oh man, that's annoying. I wonder why it doesn't support any batching...
> it looks like the function gets invoked with individual events

> I wonder why it doesn't support any batching...
So, looking at the docs I think maybe it does, but we have to configure that explicitly ourselves.
It's not completely clear, but see for example:
Okay, this script is in a working state now, insofar as it:
The events also show up in Amplitude itself, here's a chart showing the new OAuth activity events landing:
However, the logs also show there are a number of invocations that time out:
```
D fxaAmplitudeImport 536316489295563 2019-05-01 08:43:16.186 Function execution took 60002 ms, finished with status: 'timeout'
E fxaAmplitudeImport 2019-05-01 08:43:16.187 Error: function execution attempt timed out. Function invocation was interrupted.
```
These occur without tripping any of the error/debug logging that I've set up, so I don't know where or why they're failing yet. It could be throttling at the Amplitude end, although I'd expect to see error responses for that.
Anyway, I've deleted the cloud function again for now to prevent it spinning pointlessly. The next thing we need to do is look into whether we can sort out batching on the pub/sub end (and maybe that will have an impact on the timeouts too, who knows).
@jbuck I'm on PTO for a few days now, but if you have a chance to look into batching on the pub/sub end (see comment above) at some point, I'd be interested to know what you find out. And if we can't batch at that end, I guess we need to think about what the best approach is instead.
I looked into setting up batching somehow, but it’s not possible to change batch settings for PubSub Push (1 message only) or GCS export (all messages over 1 hour, partitioned by $something).
I talked to :jason and he recommended switching to PubSub Pull, where you can manually control PubSub batch settings by message count or byte size.
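For illustration, here's a rough sketch of what count/size-capped batching amounts to. The real PubSub client exposes these knobs through subscriber options; the function and option names below are illustrative only, not the client's actual API:

```javascript
// Hypothetical sketch of count/size-capped batching, mirroring the
// PubSub Pull batch settings mentioned above (batch by message count
// or byte size). Names here are illustrative, not the real client API.
function batchMessages (messages, { maxMessages = 10, maxBytes = 1024 } = {}) {
  const batches = []
  let current = []
  let currentBytes = 0

  for (const message of messages) {
    const size = Buffer.byteLength(JSON.stringify(message))
    // Close the current batch if adding this message would breach either cap
    if (current.length >= maxMessages || (current.length > 0 && currentBytes + size > maxBytes)) {
      batches.push(current)
      current = []
      currentBytes = 0
    }
    current.push(message)
    currentBytes += size
  }

  if (current.length > 0) {
    batches.push(current)
  }

  return batches
}
```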
:pb and I chatted in the Operations meeting and he'll change this PR to use PubSub Pull instead. I will setup something to run this on (probably k8s)
> :pb and I chatted in the Operations meeting and he'll change this PR to use PubSub Pull instead.
Done!
Note that we still receive individual events when pulling from pubsub, so I've retained the dependency on `async.cargo` to group them into batches of (up to) ten. There is a `WORKER_COUNT` environment variable to control how much concurrency we do that with; if you want to do all the concurrency at the container level, set that to 1.
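If you're curious how that grouping behaves, here's a minimal stand-in for the `async.cargo` semantics we rely on (a sketch, not the real library): tasks pushed while the worker is busy accumulate and are handed over later in groups of up to `payload`.

```javascript
// Minimal stand-in for async.cargo semantics: tasks pushed while the
// worker is busy queue up and are delivered in groups of up to `payload`.
function makeCargo (worker, payload = 10) {
  const queue = []
  let running = false

  async function drain () {
    if (running) {
      return
    }
    running = true
    while (queue.length > 0) {
      // Hand the worker up to `payload` queued tasks at a time
      await worker(queue.splice(0, payload))
    }
    running = false
  }

  return {
    push (task) {
      queue.push(task)
      // Defer draining so pushes in the same tick accumulate into one batch
      setImmediate(drain)
    }
  }
}
```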
The other environment variable that might not be obvious is `IGNORED_EVENTS`. Based on info from @shane-tomlinson, I believe that should be:
```json
{"fxa_activity - access_token_checked":[{"event_properties":{"oauth_client_id":"1b1a3e44c54fbb58"}},{"event_properties":{"oauth_client_id":"5882386c6d801776"}},{"event_properties":{"oauth_client_id":"3332a18d142636cb"}}],"fxa_activity - access_token_created":[{"event_properties":{"oauth_client_id":"1b1a3e44c54fbb58"}},{"event_properties":{"oauth_client_id":"5882386c6d801776"}},{"event_properties":{"oauth_client_id":"3332a18d142636cb"}}]}
```
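For context, a hypothetical matcher showing how a config shaped like that could be applied (illustrative only, not necessarily the script's actual logic): an event is dropped if its type has a rule whose properties all match.

```javascript
// Illustrative sketch: drop an event when IGNORED_EVENTS has an entry
// for its event_type whose event_properties all match the event's.
// Not necessarily the script's actual implementation.
const IGNORED_EVENTS = JSON.parse(process.env.IGNORED_EVENTS || '{}')

function isIgnored (event, ignored = IGNORED_EVENTS) {
  const rules = ignored[event.event_type]
  if (! rules) {
    return false
  }
  return rules.some(rule =>
    Object.entries(rule.event_properties).every(
      ([ key, value ]) => event.event_properties && event.event_properties[key] === value
    )
  )
}
```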
When you run the script successfully the output should look something like this:
```
~/c/fxa-amplitude-send (pb/pubsub) $ node pubsub
2019-05-07T19:00:50.278Z success, count = 10
2019-05-07T19:00:51.241Z success, count = 2
2019-05-07T19:00:51.426Z success, count = 10
2019-05-07T19:00:52.106Z success, count = 10
2019-05-07T19:00:52.283Z success, count = 10
2019-05-07T19:00:52.519Z success, count = 10
2019-05-07T19:00:52.668Z success, count = 6
```
I was running against the stage project in Amplitude, and you can see those events arriving here:
https://analytics.amplitude.com/mozilla-corp/chart/ixwqt13
I'm out at the cricket tomorrow, but hopefully that's enough info to deploy this in my absence. If there are any problems, let me know and I can dig into them on Thursday.
(Note, I just force-pushed that last commit again; the `message.ack()` / `message.nack()` logic had a stupid bug that I hadn't noticed.)
@jbuck, it occurred to me that this would be much easier to deploy if I kept the pubsub stuff in a separate directory with its own `package.json`, so you don't get bogged down with building that nightmare `node-parquet` dependency inside the container. So I just pushed a commit containing that change, fyi.
@philbooth @jbuck is this the last blocker to getting the oauth events into amplitude? it would be good to have this ASAP because it may affect some trailhead-related metrics
> is this the last blocker to getting the oauth events into amplitude?
Afaik, yep.
> it would be good to have this ASAP because it may affect some trailhead-related metrics
:+1: we spoke about it in the Ops meeting just now, it's still being worked on so 🤞
This is deployed in stage now, getting the following errors:
```
2019-05-24T22:14:36.748Z https://api.amplitude.com/identify StatusCodeError: 429 - "too_many_requests"
    at new StatusCodeError (/home/node/fxa-amplitude-send/node_modules/request-promise-core/lib/errors.js:32:15)
    at Request.plumbing.callback (/home/node/fxa-amplitude-send/node_modules/request-promise-core/lib/plumbing.js:104:33)
    at Request.RP$callback [as _callback] (/home/node/fxa-amplitude-send/node_modules/request-promise-core/lib/plumbing.js:46:31)
    at Request.self.callback (/home/node/fxa-amplitude-send/node_modules/request/request.js:185:22)
    at Request.emit (events.js:196:13)
    at Request.<anonymous> (/home/node/fxa-amplitude-send/node_modules/request/request.js:1161:10)
    at Request.emit (events.js:196:13)
    at IncomingMessage.<anonymous> (/home/node/fxa-amplitude-send/node_modules/request/request.js:1083:12)
    at Object.onceWrapper (events.js:284:20)
    at IncomingMessage.emit (events.js:201:15)
    at endReadableNT (_stream_readable.js:1130:12)
    at processTicksAndRejections (internal/process/task_queues.js:84:9)
```
followed by:
```
2019-05-24T22:14:51.542Z https://api.amplitude.com/httpapi TypeError: Cannot destructure property `message` of 'undefined' or 'null'.
    at /home/node/fxa-amplitude-send/index.js:268:11
    at Array.forEach (<anonymous>)
    at clearMessages (/home/node/fxa-amplitude-send/index.js:265:11)
    at /home/node/fxa-amplitude-send/index.js:116:7
```
but it does appear to be sending!
Hm, after that last error it appeared to stop sending, though. I changed `WORKER_COUNT` to 1; now there are no rate-limiting errors and it's chewing through everything quite nicely.
@jbuck I've set the default worker count to 1 in the script and added a condition so that it handles those 429 errors gracefully if they do occur. Is that what you wanted, or were you hoping for a smarter division of work based on device id so that we could use a greater number of workers?
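The graceful handling amounts to something like this sketch (illustrative only; `send` here is a stand-in for the request-promise call, whose errors carry a `statusCode` property):

```javascript
// Illustrative sketch: retry a send with exponential backoff when
// Amplitude responds 429, instead of crashing the worker. `send` is a
// hypothetical function returning a promise that rejects with an error
// carrying a `statusCode` property (as request-promise errors do).
async function sendWithBackoff (send, events, { retries = 5, baseDelay = 1000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await send(events)
    } catch (error) {
      if (error.statusCode !== 429 || attempt >= retries) {
        throw error
      }
      // Wait baseDelay, 2 * baseDelay, 4 * baseDelay, ... before retrying
      const delay = baseDelay * 2 ** attempt
      await new Promise(resolve => setTimeout(resolve, delay))
    }
  }
}
```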
Fixes #89. Fixes mozilla/fxa#866.
Ports the existing main import script to node and slaps a pubsub-friendly interface on it.
Still in progress/untested. Doesn't yet include @shane-tomlinson's logic for ignoring configured client ids.