openaddresses / batch

OpenAddresses/Machine based AWS Batch based ETL Processing
https://batch.openaddresses.io/
MIT License

Can't see task log #322

Closed bertday closed 1 year ago

bertday commented 1 year ago

Describe the bug: I'm not able to see the task log for any jobs. For instance:

https://batch.openaddresses.io/job/314761/log

To Reproduce: Steps to reproduce the behavior:

  1. Go to https://batch.openaddresses.io/job/314761/log

I've tried much older jobs and gotten the same behavior:

https://batch.openaddresses.io/job/25884/log

Expected behavior: Expecting to see logging statements.

Screenshots: [image]

Additional context

I'm generally on a VPN. I've tried going off the VPN and am getting the same result.

bertday commented 1 year ago

A few bookmarks as I'm looking into this...


If I make a log request like this:

curl -i https://batch.openaddresses.io/api/job/317083/log

I'm getting back a 200 response with an empty JSON array of []. That's the same as what I'm seeing in the browser.

The error handling in both the route and Job.log looks like it should be returning an HTTP error if anything goes wrong, including the log not existing. So this all seems to be following a happy path.

@iandees would it be possible to check the log for a sample job (say 317083) and see if it's actually empty in AWS? That should help rule out if there's an issue I'm not seeing in the log-fetching code, or if it's an upstream issue of the logs not being written properly.

iandees commented 1 year ago

I don't see an AWS Batch execution for that job. There is a 1-week log retention, but some other tasks from around the same time are still in there, so it seems like it failed before the Batch job was submitted.

bertday commented 1 year ago

Hmm, thanks for taking a look @iandees. What's an example of a job that did execute? Wondering if the task log will show up if you plug one of those job IDs into this route:

https://batch.openaddresses.io/job/:id

Although it's still odd there's no Batch execution for it 🤔 I can see the data OK:

https://batch.openaddresses.io/job/317083

bertday commented 1 year ago

Curious observation:

Around the same time #332 was merged I randomly tried grabbing the task log for a recent job (321818) and it worked 🎉 About a day later I'm trying to see that same log and it's back to not showing up... however another recent job is showing logs (321863).

I'm wondering if there's something that would cause logs to get purged after a day or so... or anything else that would make this a timing issue? When I opened this ticket it seemed like logs were never showing up, so wondering if #332 may have caused an incremental fix.

iandees commented 1 year ago

The AWS Batch job behind job 321818 still exists and the logs for it are still present in our account. I noticed when trying to view the logs from the AWS console that its default time filter was preventing it from finding anything, so maybe the OpenAddresses Batch system is using some incorrect default for fetching logs from AWS? I will check after work.

iandees commented 1 year ago

It looks like a single call to the GetLogEvents endpoint (as we do here) doesn't return log lines when the job is quite old. Looking at the AWS CloudWatch Logs console, they're doing a bunch of API calls forward and back with the paging tokens until they find events. I'm not sure how they know to stop, though.

I'm still looking for a decent way to find all events for a log stream.

iandees commented 1 year ago

It looks like if I set the startFromHead option to true (it's false by default), then I can get the logs more reliably.
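For anyone landing here later, a minimal sketch of reading a whole log stream with GetLogEvents. This is not the code in this repository; it is written in Python for illustration, and `iter_log_events`, `max_pages`, and the `client` argument are hypothetical names. It assumes `client` behaves like a boto3 CloudWatch Logs client. It reads from the head of the stream and keeps following `nextForwardToken`; the GetLogEvents documentation says the call returns the same token you passed in once the end of the stream is reached, which gives a stop condition. An empty page on its own is not treated as the end.

```python
from typing import Any, Dict, Iterator, Optional


def iter_log_events(client: Any, log_group: str, log_stream: str,
                    max_pages: int = 1000) -> Iterator[Dict[str, Any]]:
    """Yield the events of one CloudWatch Logs stream, oldest first.

    `client` is assumed to expose get_log_events() the way a boto3
    "logs" client does. `max_pages` is a safety cap (an assumption of
    this sketch), not something the API requires.
    """
    token: Optional[str] = None
    for _ in range(max_pages):
        kwargs: Dict[str, Any] = {
            "logGroupName": log_group,
            "logStreamName": log_stream,
            # Read from the oldest event; the default (False) starts at the tail.
            "startFromHead": True,
        }
        if token is not None:
            kwargs["nextToken"] = token

        page = client.get_log_events(**kwargs)
        # A page may be empty even when later pages hold events, so an
        # empty page alone does not end the loop.
        yield from page.get("events", [])

        next_token = page.get("nextForwardToken")
        # Getting back the token we just sent means the end of the stream.
        if next_token is None or next_token == token:
            break
        token = next_token
```

With boto3 installed and credentials configured, this would be called as `list(iter_log_events(boto3.client("logs"), group_name, stream_name))`, where the group and stream names are whatever the Batch job wrote to.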

iandees commented 1 year ago

Ok, viewing logs should be much more reliable now (when we have the logs). I bumped the retention up to 18 months, so the logs for more recent jobs should be visible more reliably.
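As a rough illustration of the retention change (the thread does not say how it was actually applied, so this is only a sketch): CloudWatch Logs accepts a fixed set of retention values, and 545 days is the one corresponding to 18 months. The helper name `set_log_retention` and the `client` argument are hypothetical; `client` is assumed to behave like a boto3 CloudWatch Logs client.

```python
from typing import Any

# 545 days is the CloudWatch Logs retention value that corresponds to 18 months.
EIGHTEEN_MONTHS_IN_DAYS = 545


def set_log_retention(client: Any, log_group: str,
                      days: int = EIGHTEEN_MONTHS_IN_DAYS) -> None:
    """Set how long events in `log_group` are kept before they expire.

    `client` is assumed to expose put_retention_policy() the way a
    boto3 "logs" client does.
    """
    client.put_retention_policy(logGroupName=log_group, retentionInDays=days)
```

Logs that had already expired under the earlier 1-week retention are not brought back by raising the value, which matches the "when we have the logs" caveat above.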

bertday commented 1 year ago

Awesome! So glad to hear the logs are back and better than ever 💯 Many thanks @iandees