tukaani-project / xz

XZ Utils
https://tukaani.org/xz/
Other
503 stars 40 forks source link

Response to backdoor incident #103

Closed thesamesam closed 1 month ago

thesamesam commented 3 months ago

A backdoor was discovered in xz-5.6.0 and xz-5.6.1 by Andres Freund.

Updates are being posted at https://tukaani.org/xz-backdoor/ and the links at the bottom of the page.

Please be patient while everything is cleaned up. It will take time. In particular, please avoid speculation here which would be better placed in other venues.

gbraad commented 3 months ago

thanks was added in https://github.com/tukaani-project/xz/commit/db4dd74a344580e0b81436598d9741a3454245b0 and removal/reponse in https://github.com/tukaani-project/xz/commit/e93e13c8b3bec925c56e0c0b675d8000a0f7f754

Paladynee commented 3 months ago

Sam "thesamesam" James: FAQ on the xz-utils backdoor (CVE-2024-3094)

cybik commented 3 months ago

Copying my comment over on the commit: https://github.com/tukaani-project/xz/commit/e93e13c8b3bec925c56e0c0b675d8000a0f7f754#commitcomment-140893111

For the record, while I applaud everyone keeping a cool head and responding to this issue promptly,
the fact of the matter is that these files still "exist" within the history of this repository.

At the risk of breaking build automata everywhere, may I suggest abusing git filter-branch to purge
the backdoor payloads from this repository permanently?

The backdoor files are harmless without inserting a trigger code when a release tarball is created.

@Larhzu except someone could fork this repository, add a new feature on top of xz, package it as "something else entirely", get it added to distributions with enough of a refined social attack, and cut a version with the backdoor resurrected. At that point, the attack vector would be back into play.

Or someone could just copy the attack vector into a completely different package altogether and adapt it somehow.

I get what you're saying, it's probably overkill, I'm probably paranoid. At the same time, is it really paranoia if I can already imagine ways that this attack vector could be resurrected without much effort? If I, a random user of the package, can just ramble about these, I'm unfortunately assuming the original attackers can already think of far worse.

Larhzu commented 3 months ago

So someone could fork the repository, re-apply the reverted backdoor commits, and add a trigger back too? Yes. But one could just add fresh backdoor files too, or copy the old ones from somewhere else. The other copies won't disappear even if 200 commits are rebased here.

It's not paranoia if they really are after you. Or after the project you maintain. ;-)

A more real reason to do the rebase is to avoid the bad files causing trouble with antivirus software. I don't know how big problem that is though.

cybik commented 3 months ago

Fair enough!

At the very least your repository would have less chances of being pointed at :P

anarazel commented 3 months ago

So someone could fork the repository, re-apply the reverted backdoor commits, and add a trigger back too? Yes. But one could just add fresh backdoor files too, or copy the old ones from somewhere else. The other copies won't disappear even if 200 commits are rebased here.

I agree that rebasing/filtering doesn't seem worth the cost. However, it may be worth to remove the backdoored release tags and add them back under a different name, to avoid tooling from automatically using them. I'd go for something like backdoored-vX.Y.Z, so they're clearly in a different namespace.

borestad commented 3 months ago

Just my 5 cents ....but AFAIK tools like bfg does use a mix of old sha's and new sha's.

This makes it possible to delete/clean only the effected commits and keep all original sha's

https://stackoverflow.com/a/49866781

The BFG avoids all this by rewriting the original references without keeping backups in refs/original/ (and is of course much faster and more convenient than git filter-branch). Nonetheless, it's still effectively copying all the original commits to new ones. Your copied repository is, in effect, a mostly-new repository, which should never be mixed with the old one.

eli-schwartz commented 3 months ago

Just my 5 cents ....but AFAIK tools like bfg does use a mix of old sha's and new sha's.

This makes it possible to delete/clean only the effected commits and keep all original sha's

That is not how BFG works, and even your quote is actually saying that BFG avoids making your repository twice as big, because it deletes the old history instead of keeping it around as a backup branch.

git is an immutable append-only tree. A newer commit cannot keep the same sha1 unless it is based on the original commit containing the backdoor. This is an extremely important core security guarantee of git, with the unfortunate side effect that you cannot forge a fake commit that claims to be a newer commit but is actually a rebased one, even if you're the legal owner of the repository and you really have a good reason to want to do so.

mathstuf commented 2 months ago

If history is rewritten[^rewrite], one can use git-replace to map old hashes to new hashes.

[^rewrite]: if my vote counts for anything, I don't think it should be; removing all of this from the public record makes it harder to discuss as a historical incident in the future.

xodiumluma commented 2 months ago

I noticed that there were no code reviews. Jia Tan just committed directly. Is this a weakness...? 🤔

thesamesam commented 2 months ago

Reviews can be (and were) done in places other than github PRs.

xodiumluma commented 2 months ago

Thanks @thesamesam. Most of the other projects have GitHub-visible PRs with practices like signed commits. I think it's nice at least from a visibility perspective. Is this something we can move towards? Or is there somewhere else where we already have this in place?

Larhzu commented 2 months ago

Until 2024, I reviewed majority of the commits that affected files under "src". A lot of such review was interactively done over Signal or IRC (that is, text-based chat). Code was pushed to a development branch on GH and, after a few rounds of feedback and force-pushes to the same branch, it was merged to master.

Some of the test program commits (tests/.c, tests/.sh) I didn't review thoroughly back in the day. The files under ".github" I didn't review at all; I haven't studied how the CI stuff works. My understanding was and is that those files won't affect anything else in the Git repository or release packages.

In 2024 I felt that Jia was finally competent enough to not require so much oversight (with 5.4.x releases I did release tarball comparisons as Jia had some trouble getting them right at first). The malicious test files (tests/files/*) all came in 2024. The five odd test files were added right before 5.6.0 release when I wasn't home. 5.6.0 was the first release I permitted Jia to do alone. I was supposed to focus on XZ for Java after XZ Utils 5.6.0.

I'm a chat person. Mailing lists and GH posts take more time to achieve the same thing, that is, chat is more efficient to me.

I plan to explain more about the story later. The priority right now is to review the Git repository with the mindset "is there anything malicious hiding". That is a natural thing to do when reviewing patches from random contributors but patches from an active co-maintainer don't get the "is this intentionally malicious" treatment after a while. Or perhaps I'm weird and others remaing suspicious of their co-maintainers even after two years of co-operation (even if the personal relationship isn't the best).

Anyway, the misinformed complaints how Jia supposedly had a free hand in the repository aren't helping the project. Yes, in 2024 he had a lot more freedom but 2023 wasn't like that. Yes, many things could have been done and could still be done better but lots of advice tend to appear a little late; I'm sure that only a few months earlier, many were happy how much Jia had been helping to make the project progress.

xodiumluma commented 2 months ago

Thank you so much @Larhzu - really appreciate your leadership and reply.

eli-schwartz commented 2 months ago

Thanks @thesamesam. Most of the other projects have GitHub-visible PRs with practices like signed commits. I think it's nice at least from a visibility perspective. Is this something we can move towards? Or is there somewhere else where we already have this in place?

Note that signed commits are largely incompatible with GitHub PRs, since GitHub PRs imply merging the PRs on the GitHub web UI and that in turn means that you cannot sign the commits you merged. It's possible as a project maintainer to submit a PR and then merge it via fast-forward on the command line, in which case the same commit sha1 is present both in the PR and in the repository's master branch, and GitHub will also mark the PR as merged.

https://github.com/orgs/community/discussions/6414

mathstuf commented 2 months ago

You can also merge locally, sign the merge commit, and push that. The PR's HEAD commit being reachable from its target branch will mark it as merged (squash merging is out-of-luck though).

eli-schwartz commented 2 months ago

Yeah, that's a subcategory of the case I mentioned. Unfortunately that does require using merge commits which not everyone wants to be limited to... The critical bit here is you're constrained to merging in the exact commit sha1 from the PR branch.

Many people, myself included, prefer to rebase merge (or cherry-pick merge, if you prefer to think of it that way) and consider squash merging to be a horrifying sin. :p

Maybe someday GitHub will implement the feedback in question, though I wouldn't hold my breath for it.

mathstuf commented 2 months ago

GitLab has a way to force "allow maintainer to push updates" to allow for rebase-merge to be done without PR author participation. Not sure if Github does (unlikely given the general disdain for force-push in the general Github workflow).

FWIW, I also agree with you on squash-merge being a sin, but some projects just cannot seem to have good commit hygiene from contributors and/or are unwilling to require rebasing fixes into introducing commits and just throw in the towel with squash-merges-by-default. Rebase-ff-merge is…OK, but I like having the merge commit as a place for topic-level metadata such as "who merged it", an easy handle to revert as a unit, and places for Reviewed-by-like trailers, PR information, etc.

coderhisham commented 2 months ago

Please Remove the releases by Jia Tan. and create a new release

JohnDoe1001 commented 2 months ago

Have you considered reverting all changes yet? If so, then ty for your effort. I hope that the Linux community become stronger and more secure, especially against doomsday vulnerabilities like this.

Btw could I become a document maintainer?

Larhzu commented 2 months ago

Rebasing

The repository will not be rebased. The malicious files have been reverted. Even if one checks them out, they are harmless without the trigger code that was never committed to the Git repository. Thus things like git bisect are safe to use.

Merges

I don't do merges via GitHub UI. I do merges locally as then the same process works no matter where the Git repository is hosted. The commit 97f0ee0f1f903f4e7c4ea23e9b89d687025d2992 I merged locally without rebasing, thus it shows GitHub as the committer. I might try to avoid this in the future.

If I understand it correctly, signing commits without merge commits doesn't sound impractical: Rebase the new commits locally first. Adding Closes: <URL-to-PR> to a commit message will close the PR and, even better, this leaves a link to the commit log that points to the PR discussion. Pushing to the submitter's branch (if allowed) before merge could make sense too but I don't know how much it matters if GitHub doesn't show the PR as merged. The discussion should make things clear anyway.

Force pushes

@ItzSwirlz You commented in PR #95 about force pushes.

I manually mirror to git.tukaani.org where I have never force pushed. Apart from the very early days on GitHub, when the primary repository was still on git.tukaani.org and the repository on GitHub wasn't officially announced anywhere yet, there were no force pushes to the master or v5.x branches on GitHub, or at least I didn't detect any. If they happened, they must have happened so that I never fetched the thrown-away commits.

To the xz-java repository on GitHub I had to force push after my account was reinstated because I didn't have a local copy of Jia's last commits (that created SECURITY.md). I had already pushed new commits to git.tukaani.org while the repository on GitHub was unavailable to me.

Spam problem

@JohnDoe1001: This kind of behavior isn't acceptable and may earn you a ban instead of maintainer status.

The amount of spam on GitHub is distracting from useful activities. There are many places where people can make humor-only posts about the recent events. Everyone, please keep those posts out of GitHub (assuming that you aren't in the enemy team). Thanks!

ItzSwirlz commented 2 months ago

@Larhzu Just a protip: You can merge the commit and then add Co-authored-by which will show more than one person committed something on GitHub. So if a commit is made, say 'Merge branch "foo" from "source"', edit it to say in the last line:

Co-authored-by: Lasse Collin <emailaddress>

(link to GitHub documentation)

And I may be wrong, but I think if you add your GPG key to GitHub, then make a signed commit with -S, even with the mirror as long as GitHub recognizes your commit is signed by your key it will show it as Verified. (github documentation on 'verified' commits)

I hope this is helpful :)

Larhzu commented 2 months ago

Co-authors

Linux uses Co-developed-by:. It sounds similar to Co-authored-by:, not sure if exactly identical though.

If I edit a commit, I often feel I should mention it because it's possible that I made an error and it would be wrong to silently put such errors to someone else's name. Using co-something-by in this case feels slightly odd if the main work was almost solely by the other person. It's just that it's good to have something in the commit message to indicate that others shouldn't so easily blame the commit author for errors in the commit.

It has also been suggested that one could commit the original thing first and then make a new commit to fix it or clean it up. Perhaps it's OK at least if the original commit isn't truly broken. In some cases this makes commit log harder to read though.

So the problem, if there is any, is how to be polite to the contributors so that they feel they were welcome.

Years ago I used to take patches, edit them, thank the author in the commit message (including if it was a patch etc.), and commit it in my name. In GitHub times it seems that to many it's very valuable to get their name as an author in the commit log as that will then show up on GitHub stats.

With tiny things it often would be faster for everyone to just create an issue (or send an email) and let maintainer(s) make the change but then the reporter wouldn't get their name in the commit log in the preferred way. Thus nowadays I try to commit things so that contributors get their name as a commit author and not merely mentioned in the commit message and in the file THANKS.

Signed commits and tags

I recently added my key to GitHub as instructed by someone on IRC. This made the tags signed by me appear as Verified. To my understanding, it would work the same with signed commits.

Mirroring the repository to multiple places is unrelated to the signatures. Storing signatures is a feature of Git.

eli-schwartz commented 2 months ago

Linux uses Co-developed-by:. It sounds similar to Co-authored-by:, not sure if exactly identical though.

I thought they used co-authored-by. The git project does: https://git-scm.com/docs/SubmittingPatches

The kernel project seems to use rather a lot of both styles. Github AFAIK only respects the git project's trailer, not the kernel project's trailer.

Larhzu commented 2 months ago

Co-developed-by is in the Linux docs. In recent years it seems to be 50 times more common than Co-authored-by. Perhaps Linux usage is old and thus a bit specific to that project.

Github AFAIK only respects the git project's trailer, not the kernel project's trailer.

That's how I understood it as well.

Larhzu commented 2 months ago

Questions

Tags signed by Jia

What to do with the v5.2.x and v5.4.x tags that are signed by Jia?

  1. Replace them with my signature on the same commits?

  2. Add a new commit (possibly even an empty one) to the v5.2 and v5.4 branches and sign that to indicate that I'm fine with the content?

  3. Do nothing?

What to do with the v5.6.x tags? Without the trigger code the backdoor files are harmless. However, I think it would be odd for me to sign those tags still. So I currently think I should leave them as is. I don't see a reason to delete the v5.6 branch.

Tarballs by Jia

Would a re-release of 5.2.12 and 5.4.6 be useful? Those likely are the last versions of the v5.2 and v5.4 branches.

New tarballs would be generated with newer Autotools and be signed by me. The filename would need to differ to avoid confusion. For example, xz-5.4.6-rerelease.tar.gz.

Should release files created by Jia be removed from official locations? Some of those I reviewed with diff -Narup against my release tarball candidates back in the day (it was part of the process to teach him to do the packages well). I currently think it's unlikely that those contain anything bad. However, if those are kept available then the public will (rightfully) think that I have approved them even if I never signed them. Thus I would need to review the old packages to some extent in the near future.

Miscellaneous

This thread has wishes and suggestions around these questions already but not much explanation why those wishes are being made. Thus, I would like to hear a sentence or two why one choice is preferred over another.

To those without a GitHub account, replies via IRC or email are fine too.

Neustradamus commented 2 months ago

Dear @Larhzu,

Several people would like that you do new releases (to remove the username of the backdoor author) and good asset files:

And maybe good to remove the 5.6.x tags too?

Note: Same in primary git source place.

Thanks in advance.

Larhzu commented 2 months ago

@Neustradamus:

There are good reasons to make new releases. Just to remove a name isn't one though.

You didn't provide even a single reason why tags should be removed. My message in this thread specifically asked for reasoning about tags and releases.

I'm more than aware that people want progress. Your messages don't help. Instead, they waste my and others' time when we are busy already. Thus, each post you make is delaying the new releases that you are requesting.

Neustradamus commented 2 months ago

@Larhzu: Thanks for your answer :)

To remove and recreate release tags (and add good files), it is to have not bad point with the backdoor author, it is better to have from you but of course it is not only a good reason.

About tags, it is to do not permit to download 5.6.0 and 5.6.1 (and 5.5.x too) for security:

thesamesam commented 2 months ago

We believe at this point that tags are fine / inert by themselves and we wouldn't want to destroy them, but we may rename/retag them with Lasse's signature, as discussed above.

Disservin commented 2 months ago

Linux uses Co-developed-by:. It sounds similar to Co-authored-by:, not sure if exactly identical though.

If I edit a commit, I often feel I should mention it because it's possible that I made an error and it would be wrong to silently put such errors to someone else's name. Using co-something-by in this case feels slightly odd if the main work was almost solely by the other person. It's just that it's good to have something in the commit message to indicate that others shouldn't so easily blame the commit author for errors in the commit.

It has also been suggested that one could commit the original thing first and then make a new commit to fix it or clean it up. Perhaps it's OK at least if the original commit isn't truly broken. In some cases this makes commit log harder to read though.

So the problem, if there is any, is how to be polite to the contributors so that they feel they were welcome.

Years ago I used to take patches, edit them, thank the author in the commit message (including if it was a patch etc.), and commit it in my name. In GitHub times it seems that to many it's very valuable to get their name as an author in the commit log as that will then show up on GitHub stats.

With tiny things it often would be faster for everyone to just create an issue (or send an email) and let maintainer(s) make the change but then the reporter wouldn't get their name in the commit log in the preferred way. Thus nowadays I try to commit things so that contributors get their name as a commit author and not merely mentioned in the commit message and in the file THANKS.

For some context we (I am one of the maintainers) Stockfish, a chess engine, have been merging patches now almost completely locally because there are additional things we need to take care of while merging, we often manually "reword" the commit description and add a "closes" section with a link to the Github PR. Since most of our contributors don't write the description in the commit message but instead in the Github PR body. Somtimes patches also have to be reformatted but these changes are made within in the same commit in our case.

We also require a linear commit history and force pushes are generally disabled, when merging a patch a locally the author name is in the git commit log visable as the author and we (the maintainers) as the committers. Contributors generally also appear in the "Contributors" on Github. Their Github profile shows

Created 1 commit in 1 repository

The only minor downside (from a contributors perspective) is that GitHub shows the PR as closed and not merged :/

so afterall our contributors are generally fine with how things are handled and if you prefer the local way to do things there is nothing wrong with that

mathstuf commented 2 months ago

The only minor downside is that GitHub shows the PR as closed and not merged :/

There is the "allow maintainers to push" option where you could push your to-be-merged edits to the source branch before merging so that the state tracking is accurate.

Larhzu commented 2 months ago

Thanks @Disservin! Since a major project is doing both edits to commit messages and reformatting the patches, I feel better doing it too, combining it with Co-authored-by: to mark that new errors might have been introduced by me.

I can try to remember to force-push to the PR branch (when it has been allowed) to get the PR marked as merged. But I won't consider that essential. Editing a commit message so that there is a link to the PR discussion seems valuable.

Disservin commented 2 months ago

@Larhzu no problem :D

I can try to remember to force-push to the PR branch (when it has been allowed) to get the PR marked as merged.

To my knowledge there is no there way to get it marked as merged when not merging over github directly. See here for example https://github.com/official-stockfish/Stockfish/pull/5217 which is displayed as closed even though we merged it. We manually (or try to) apply a "to be merged" label to all merged patches.

thesamesam commented 2 months ago

I think you can per https://github.com/tukaani-project/xz/issues/103#issuecomment-2095852398.

Larhzu commented 2 months ago

https://github.com/tukaani-project/xz/pull/108 I merged locally and it shows up as merged on GH. However, it went as fast-forward without force-pushing to submitter's repo first so it's not the typical case. And it shows GitHub as the committer which I likely wish to avoid in the future.

JohnDoe1001 commented 2 months ago

@Larhzu Sorry for the nagging. I guess we should just leave the issue to you guys. Good luck to y'all!

thesamesam commented 1 month ago

Lasse has updated the website, refreshed https://tukaani.org/xz-backdoor/, and published 1.0 of the review notes at https://tukaani.org/xz-backdoor/review.html. The article (reflections, lessons, etc) will be written and published separately and isn't ready.

We aren't done with the response, obviously, but this marks a major milestone in the project's recovery, hence I feel like this is the right time to close this particular bug.

Fixed releases are now out (5.2.13, 5.4.7, 5.6.2). You can compare their contents by making your own distribution tarball using make mydist. Future improvements in minimising the diff there are planned too. Thank you everyone.

Neustradamus commented 1 month ago

@Larhzu: Thanks to have fixed the backdoor! I have recently done announcements on my social networks and on HN, Reddit too.

A new PR in vcpkg is here:

cc: @anarazel.

cybik commented 1 month ago

The evil has been defeated

~Troll comment abusing a css exploit. Please report it to Github.~

~You can use mobile apps to get around the CSS glitch, or use ublock origin to force-block https://github.com/Roblox/t/assets/106361566/b3306f20-57e8-449d-95f7-0ec0597b4e7e in My Filters~