Closed by carols10cents 9 months ago.
There were a few good ideas in the recent post on the Rust subreddit. Maybe they could be evaluated?
It would be awesome if you could list the ideas here with pros/cons of each!
Sure, I will do my best to summarize:
foubar as too similar to foobar, but foobar_plugin would be fine.
My preference would be to start with the lightest-weight solution here, which is the first one you noted and is very similar to the description of this issue. Before changing policies or putting up barriers, I would like to be notified about what is happening, how often, and by whom, and to have time to adjust things such as the edit distance before taking more drastic measures.
Hey, I would be interested in implementing this.
I think we would need a list of popular crates first (possibly the 50 most downloaded crates).
Having such a list would make it possible to check whether there are already crates that might be typo-squats.
An actual implementation of just silently flagging the crate / sending the email upon creation shouldn't be hard to do in the end.
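To illustrate how small the "silently flag on publish" check could be, here is a minimal sketch, assuming the list of popular crate names is already available in memory. The function names `levenshtein` and `flag_against_popular` are hypothetical and are not taken from the crates.io codebase; the distance is the textbook dynamic-programming Levenshtein distance.

```rust
/// Textbook dynamic-programming Levenshtein distance over Unicode scalar values.
/// Keeps only one previous row, so memory use is O(len(b)).
pub fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the first i chars of `a` and the first j chars of `b`
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for (i, ca) in a.iter().enumerate() {
        curr[0] = i + 1;
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            curr[j + 1] = (prev[j + 1] + 1) // deletion
                .min(curr[j] + 1) // insertion
                .min(prev[j] + cost); // substitution (or match)
        }
        // After the swap, `prev` holds the row just computed.
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Hypothetical helper: returns the popular crate names that `new_name` is within
/// `max_distance` edits of. An exact match is skipped, since that is a name
/// collision rather than a typo.
pub fn flag_against_popular<'a>(
    new_name: &str,
    popular: &[&'a str],
    max_distance: usize,
) -> Vec<&'a str> {
    popular
        .iter()
        .copied()
        .filter(|p| *p != new_name && levenshtein(new_name, p) <= max_distance)
        .collect()
}
```

A non-empty result would be the trigger for flagging the crate or sending the notification email; the crate itself would still be published.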
Hi esclear,
@carols10cents is looking at getting me a snapshot of the db in order to look into this. I'd be happy to work with you on it.
Sure :+1:
Okay, I'm currently working on doing some data analysis.
The 50 most popular crates (considering all-time downloads) so far are:
I shall provide a list of other crates with more or less similar names to these tomorrow.
Okay, I ended up doing it right now. It seems that Levenshtein distances above 3 produce results that don't help at all in deciding whether crates have similar names.
It turns out that for the idna, toml, itoa, term, mime, and dtoa crates (plus some more), the Levenshtein approach could be a little problematic if we use distances >= 3 as an indication of possible typo-squatting, because there will be many false positives.
Using a Levenshtein distance lower than 3 as an indicator of possible typo-squatting would yield the following result:
Thus, I would suggest treating names as possible typo-squats if:
Because of #159, I've been hesitant to look at this via the crates.io search.
I agree that we will have to adjust distances based on word length though.
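One way to adjust the distance by word length is to derive the maximum allowed edit distance from the length of the popular name being compared against. The sketch below assumes the distance has already been computed; the cutoffs (1 edit for names of up to 4 characters, 2 edits otherwise) are hypothetical values chosen only to reflect the observation that short names such as toml or idna produce many false positives, and they are not the criteria proposed in this thread.

```rust
/// Hypothetical policy: the largest edit distance from a popular crate name that
/// still counts as a possible typo-squat. Short names get a tighter bound because
/// many unrelated short names lie within a few edits of each other.
pub fn max_flag_distance(popular_name: &str) -> usize {
    match popular_name.chars().count() {
        0..=4 => 1, // e.g. "toml", "idna": only single-edit neighbours
        _ => 2,     // never 3 or more, which the analysis above found too noisy
    }
}

/// Flags a name whose (precomputed) distance to `popular_name` is non-zero and
/// within the length-dependent bound. Distance 0 means the same name.
pub fn should_flag(distance: usize, popular_name: &str) -> bool {
    distance > 0 && distance <= max_flag_distance(popular_name)
}
```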
This wasn't done via the search. I got a list of all package names from the crates.io-index, and the 50 most popular crates, along with their download counts, from the API.
After discussing typo-squatting with some friends, I think it would be sufficient to flag any crate whose name is within a Levenshtein distance of 1 of a popular crate's name. In the end, it is unlikely that someone makes two typos and that the result matches a typo-squatting crate: the higher the distance, the less likely it is to hit such a crate, and an adversary would have to publish many more crates to increase the chances of a developer actually using one of them.
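If the rule is only "distance of at most 1", the full dynamic-programming table is not needed: a single linear pass can decide it. The following is a minimal sketch of that standard one-edit check; the function name `within_one_edit` is hypothetical.

```rust
/// Returns true if `a` and `b` differ by at most one insertion, deletion, or
/// substitution (Levenshtein distance <= 1), using a single linear pass.
pub fn within_one_edit(a: &str, b: &str) -> bool {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // Make `a` the shorter (or equal-length) string.
    let (a, b) = if a.len() <= b.len() { (a, b) } else { (b, a) };
    if b.len() - a.len() > 1 {
        return false; // lengths differ by 2 or more: at least 2 edits
    }
    // Length of the common prefix.
    let prefix = a.iter().zip(b.iter()).take_while(|(x, y)| x == y).count();
    if a.len() == b.len() {
        // Identical, or one substitution at `prefix` and the rest equal.
        prefix == a.len() || a[prefix + 1..] == b[prefix + 1..]
    } else {
        // `b` is one longer: skipping b[prefix] must make the rest equal.
        a[prefix..] == b[prefix + 1..]
    }
}
```

Note that this counts a swap of two adjacent letters (e.g. serde vs sedre) as two edits, so a distance-1 rule would not catch transpositions.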
I too have done some prior work on this and would love to be involved in moving this forward!
I was starting to research adding a typo check to cargo-edit. It would be convenient if there were an API for getting the possible typos from crates.io. It would also be nice if they appeared prominently in the search results. For a good, but non-malicious, example, I think request should suggest reqwest.
Perhaps a link on each crate's page: "Not what you are looking for? Try crates with similar names."? Then a page listing crates from newest to oldest, each with a link to similar names and a suggestion to email help@crates.io if you see something suspicious?
Resurrecting this thread in light of recent events. I have a proposed solution that is a bit of a mix of @TheDan64's points one and two.
Proposed Solution:
Whenever a new crate is published on crates.io, check whether another similarly named crate already exists, using Levenshtein distance as mentioned above. If it does, perform a basic code comparison, and if the code is substantially similar:
cargo add with a "Did you mean ___?" message.
The parameters of the Levenshtein distance could be tuned as needed to reduce the number of code comparisons performed. Also, the relative popularity of a crate may need to be taken into account, both in terms of risk and in terms of prioritization for the Rust Security Response WG.
I was originally thinking this should just be a cargo feature, but I think this would be better handled centrally on crates.io, with the cargo add behavior simply using it.
Links:
I like the proposal, except that I worry that the warning won't be seen by most people, since it depends on the use of the non-built-in cargo add. I always edit Cargo.toml directly to add dependencies, and as far as I'm aware all my colleagues do as well. With a warning only on cargo add, there would also be no warning for transitive dependencies on typosquatted crates. I filed https://github.com/EmbarkStudios/cargo-deny/issues/421, which might help users of cargo-deny.
Ideally, a warning could be printed by something within cargo itself rather than by a third-party plugin (cargo-edit / cargo-deny), but that's a bit tricky: if you don't run cargo update and just run cargo build directly after editing Cargo.toml, the malicious code could already be running by the time you see the warning.
Also note that if the manual review approach is taken, it would be necessary to review each version. Otherwise, a simple way to get around the protection is to upload the initial release of a typosquatted crate with a small bugfix (so it looks like you just needed to publish a fork with the fix) and then, once it passes security review, publish an update containing the malicious code.
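The key point is that a review approval has to be keyed by crate and version, not by crate alone. A minimal sketch of such a record, assuming an in-memory store; the `ReviewLedger` type and its methods are hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical review ledger: approvals are recorded per (crate, version),
/// so approving 0.1.0 of a flagged crate says nothing about 0.1.1.
#[derive(Default)]
pub struct ReviewLedger {
    approved: HashSet<(String, String)>,
}

impl ReviewLedger {
    /// Records that this exact version of the crate passed review.
    pub fn approve(&mut self, krate: &str, version: &str) {
        self.approved.insert((krate.to_string(), version.to_string()));
    }

    /// A version of a flagged crate is cleared only if that exact version
    /// was reviewed; earlier approvals do not carry over.
    pub fn is_cleared(&self, krate: &str, version: &str) -> bool {
        self.approved
            .contains(&(krate.to_string(), version.to_string()))
    }
}
```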
Yeah, I very much agree that it will be difficult to help cover all workflows.
Though cargo add is now a mainline feature of cargo itself: https://github.com/rust-lang/cargo/pull/10472
And that is another good point: there may need to be some form of perpetual/ongoing checks.
We should also compare notes with other communities that have tried this in the past or have it now. Row 10 of https://docs.google.com/spreadsheets/d/12QlaYEtcp2ZwZRfZPHR4D3YpY8k770hYBeFQ6-N7Mts/edit#gid=1022416269 suggests that:
That document was compiled by the OpenSSF Working Group on Securing Software Repositories, so once we have a proposal we can ask for people's input there.
We integrated https://github.com/rustfoundation/typomania last year and are expanding its integration in the near future. I guess this means the original issue is resolved :)
Edit distance: when a new crate's name is within some small edit distance of an existing crate's name, send an email to help@crates.io with a link to the new crate and a link to the crate whose name it is close to?
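The notification itself only needs the two names and the distance. As a sketch, assuming crate pages live at the usual `https://crates.io/crates/<name>` URLs, the message body could be built like this; the function name and the wording are hypothetical, and actually sending the email is out of scope here.

```rust
/// Builds the body of a hypothetical notification email to help@crates.io about
/// a newly published crate whose name is close to an existing crate's name.
pub fn notification_body(new_crate: &str, similar_to: &str, distance: usize) -> String {
    format!(
        "New crate https://crates.io/crates/{new} has a name within edit distance {d} of \
         https://crates.io/crates/{old}. Please review.",
        new = new_crate,
        d = distance,
        old = similar_to,
    )
}
```

For example, `notification_body("foubar", "foobar", 1)` links both crate pages in a single sentence.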