taskcluster / taskcluster-rfcs

Taskcluster team planning
Mozilla Public License 2.0

Private Task Logs #159

Closed djmitche closed 4 years ago

djmitche commented 4 years ago

I'd also love feedback from @escapewindow and @armenzg.

Note the link to https://bugzilla.mozilla.org/show_bug.cgi?id=1598689 which has a bunch of additional background.

djmitche commented 4 years ago

Also @ajvb. We should think through the security implications of allowing LiveLog artifacts to be overwritten.

escapewindow commented 4 years ago

It looks like you're getting rid of live_backing.log, correct?

Some thoughts:

Overall, 👍

djmitche commented 4 years ago

> It sounds like we should support `task.extra.logs.taskLogArtifact` in scriptworker, for consistency.

Producing logs using this new convention is probably a good idea. I'm not sure if scriptworker will need to make changes in its verification process to use this new approach.
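A log consumer following this convention might resolve the main log along these lines (a sketch only; the helper name and the fallback to the historical `live_backing.log` default are my assumptions, not part of the RFC):

```python
# Assumed historical default, used when task.extra.logs.taskLogArtifact is absent.
DEFAULT_LOG_ARTIFACT = "public/logs/live_backing.log"

def resolve_task_log_artifact(task):
    """Return the artifact name of the task's main log, falling back to
    the assumed historical default when the task doesn't use the new
    task.extra.logs.taskLogArtifact convention."""
    return (
        task.get("extra", {})
            .get("logs", {})
            .get("taskLogArtifact", DEFAULT_LOG_ARTIFACT)
    )
```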

> I think we'll want to check for any artifact path conflicts (e.g., if the taskLogArtifact and a non-log artifact want to publish to the same path) and throw a malformed-payload or the like.

Those won't be known until the end of the task in many cases (e.g., when uploading a directory with the artifact prefix public/, the conflict only occurs if a file named logs/live.log exists when the task completes), so malformed-payload may not be the right exception reason, but we should detect this case and handle it with a clear error message.
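A worker-side check might look roughly like this (function name and error string are illustrative only). Because directory artifacts are only expanded to concrete paths at task completion, the check has to run at the end of the task rather than at payload-validation time:

```python
def check_log_artifact_conflict(task_log_artifact, other_artifact_paths):
    """Return an error message if another artifact wants to publish to the
    same path as the task log artifact, else None.  Run at task completion,
    once directory artifacts have been expanded to concrete file paths."""
    if task_log_artifact in set(other_artifact_paths):
        return (
            "artifact path conflict: task log artifact %r collides with "
            "another artifact at the same path" % task_log_artifact
        )
    return None
```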

> We chose not to support live logs in scriptworker because of security implications. Have we resolved those? If so, we may want to revisit that decision. If not, we'll probably continue uploading a single taskLogArtifact at the end of the task.

I don't know what those security implications were, but this RFC doesn't change them. That seems a fine approach.

> If we implement this alongside #158, we can choose to upload the S3 artifact taskLogArtifact with checksums. If we do so, we need to stop writing to the log when we calculate the checksum, and we may lose information about any following steps, including uploads. This artifact would then satisfy the requirement of a verifiable log, and we could eliminate certified.log.

I think the "final" log would need to be uploaded after all other artifacts are uploaded. I don't think that's a new issue introduced by this RFC, but it bears mentioning.
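The ordering constraint can be made concrete with a sketch: the log must be finalized (closed) before its digest is computed, so anything that happens afterward -- including uploads of other artifacts -- cannot appear in a checksummed log. (The function is illustrative; #158's actual checksum mechanism may differ.)

```python
import hashlib

def finalize_and_checksum(log_file, log_path):
    """Close the log, then hash it.  Any bytes written after the digest is
    computed would invalidate the checksum, which is why a checksummed log
    cannot record subsequent steps such as artifact uploads."""
    log_file.close()
    digest = hashlib.sha256()
    with open(log_path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```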

djmitche commented 4 years ago

@ajvb and I talked for a bit about (a) live logs still being insecure (an obscure URL is not sufficient) and (b) the default value for the artifact name.

For (a), I'll update the proposal to disable live-logging when a non-public log artifact is used. This spares anyone from having to make a careful decision about whether they trust the obscure URL -- the answer is simply "no": you can't use live logs if you want private logs.

We can later figure out a way to authenticate live logging -- whether that be by the workers accepting Hawk credentials, websocktunnel accepting Hawk credentials, or some other method -- and once that's in place we can allow live-logging with private logs.
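The rule from (a) reduces to a single predicate, relying on Taskcluster's convention that only artifacts under the public/ prefix are readable without scopes (a sketch; the function name is mine, not a worker API):

```python
def livelog_allowed(log_artifact_name):
    """Live-logging is only permitted for public log artifacts.  Private
    logs are uploaded once, at task completion, until some authenticated
    live-log channel (e.g. Hawk credentials via websocktunnel) exists."""
    return log_artifact_name.startswith("public/")
```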

For (b), we went a little further afield.

@ajvb noted that it's weird that we use the name of the artifact to determine visibility, rather than having a "public" property of each artifact. Indeed, that is weird.

@ricky26 suggested in chat last week that a good way to get to private-by-default may be to create a new role, maybe called any-request, that is implicitly assumed on every request (including for unauthenticated users) and then update every currently-unauthenticated API method to require scopes. Deployments that wish to remain mostly-public, like Firefox-CI and Community, can then add those new scopes to the new role and things will continue to work as before.

Putting those two together, we could eventually get to a point where a deployment can be "locked down" so that, specifically, logs are private by default, simply by not adding queue:get-artifact:public/* to the any-request role.
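To make the mechanism concrete, here is a simplified model of Taskcluster scope satisfaction (role expansion is omitted, so this is a sketch rather than the real algorithm): a held scope ending in `*` satisfies any required scope that begins with the prefix before the `*`. A deployment stays mostly public by granting the blanket scope on the hypothetical any-request role, and becomes private-by-default by omitting it:

```python
def satisfies(held_scopes, required):
    """Simplified scope check: a held scope satisfies a required scope when
    they are equal, or when it ends in '*' and the required scope starts
    with the prefix before the '*'.  (Real checks also expand roles.)"""
    return any(
        s == required or (s.endswith("*") and required.startswith(s[:-1]))
        for s in held_scopes
    )

# A mostly-public deployment grants the blanket artifact scope on the
# hypothetical any-request role ...
open_deployment = ["queue:get-artifact:public/*"]
# ... while a locked-down deployment simply omits it.
locked_down = []
```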

Until then, private logs are a bit of a "boutique feature" and need to be handled with caution.

All in all, this lets us keep this RFC very narrowly scoped without blocking making things private by default in the future and without risking disclosure of live logs.

djmitche commented 4 years ago

And just to set expectations -- once this is decided it will be a while until we can get development time on it. But it does divide nicely into smaller pieces and some of those pieces might be practical tasks for folks outside the team.

djmitche commented 4 years ago

The motivation for this RFC is to have task logs that are private, that UIs and log consumers can identify as the main log. So it sounds like this particular RFC doesn't address the use-case to which you're referring, but doesn't preclude it, either.

We certainly don't have bandwidth on the team to address the larger scope.

So I guess the question is, given that this is of minimal scope and seemingly not useful for fuzzing, is it useful at all?

petemoore commented 4 years ago

> So I guess the question is, given that this is of minimal scope and seemingly not useful for fuzzing, is it useful at all?

@jschwartzentruber what do you think?

I think the ideal approach would be to have a streaming artifact type. Then there would be no reason for us to limit this to a single artifact, or to a log file. The task inspector could preview, say, up to three streaming artifacts, vertically split, or give you the option to choose which streaming artifacts you wish to preview. Streaming artifacts could also serve as a pipe between tasks (see https://github.com/taskcluster/taskcluster-rfcs/issues/121#issue-331961067 for a little background on this idea -- section "Example future use case").

I think this is a bigger win than implementing something that assumes tasks have a single streaming artifact that happens to be a log, and that all users should see that one streaming artifact by default. But of course, that is a bigger piece of work.

So my preference is that we don't add a private log feature you can't stream: that feature already exists by redirecting output to a file, and adding another way to do the same thing would complicate matters.

djmitche commented 4 years ago

Thanks for the comments. So, I think the useful forms of this idea are out of reach for the team and its current workload -- so far out of reach that it's not a great use of time for us to design a solution at this point.

jschwartzentruber commented 4 years ago

Sorry for the late reply, but I agree both that private logs should be streaming to be useful, and that a solution isn't necessary if it's too much work. It would be nice to have this in TC, but our tasks are logging to Stackdriver in GCP, which wasn't difficult to set up.