ossf / malicious-packages

A repository of reports of malicious packages identified in Open Source package repositories, consumable via the Open Source Vulnerability (OSV) format.
Apache License 2.0
254 stars 23 forks

Corrections Needed for Several Malware Attributions #660

Open behnazh-w opened 1 month ago

behnazh-w commented 1 month ago

As part of the Macaron package, we have identified several malicious Python packages in your records that have been incorrectly attributed to ReversingLabs as the FINDER. Two examples are the manyhttps and multiconnection packages. We are happy to provide confirmation emails from the PyPI security team for our reports. How can we share this information to update your records?

rhalar commented 1 month ago

Hi! I work for ReversingLabs and have been responsible for our OSSF integration.

So, to clarify our process: we track multiple sources for malware activity on a number of repositories, and we also do our own internal research, where we try to catch, classify, and report malicious packages as soon as possible. We own a large database of malicious packages with supporting metadata, but we currently limit our output on the OSSF to packages we think we independently found and reported, based on all the information we are aware of (alongside some other criteria, but I don't think those matter here).

It's quite possible that something we claim to have found was also found and reported earlier by you, but we have no way of knowing, since no public information is available (that we are aware of, that is; please direct us to a source if one exists!), and some repository maintainers weren't quite open about sharing security info with us, so we work with what we have. Alternatively, our tracking might be buggy, which also isn't out of the question :) But anything we report as found by us has the requirement that it was independently found by our researchers; any kind of missing attribution is completely unintentional!

Some repositories also have a limit on reports, strangely enough; I think NPM is an example. So sometimes we do find malicious packages a lot earlier but aren't able to report them until they allow us to. We try to at least check that the package was not already removed by the time we find it.

Anyhow, we'd love to correct any misattribution we might have made, but to do so automatically we have to enter a record with a reference to a reporter (along with the report time) into our database. Is there a way you could, and would be willing to, provide something of the sort?

Alternatively, you can also open a PR and fix it yourselves; the automated ingestion shouldn't override the edits, and the OSSF can do the validation in that case. However, I don't think there is a limit on credit entries with the FINDER type, so it's possible that both attributions could stay, unless you insist we remove ours. I'm not sure if the OSSF has a policy on this?
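To make the "both attributions could stay" idea concrete, here is a minimal sketch of appending a second FINDER entry to the `credits` array of an OSV-format record without touching the existing one. The record ID, the finder name "Macaron", and the helper `add_finder` are hypothetical and only for illustration; the real report file and any OSSF validation rules may differ.

```python
import copy


def add_finder(record, name, contact=None):
    """Return a copy of an OSV record with one more FINDER credit appended.

    Existing credits are left untouched, and an already credited finder
    is not added a second time.
    """
    updated = copy.deepcopy(record)
    credits = updated.setdefault("credits", [])
    already_credited = any(
        c.get("name") == name and c.get("type") == "FINDER" for c in credits
    )
    if already_credited:
        return updated
    entry = {"name": name, "type": "FINDER"}
    if contact:
        entry["contact"] = list(contact)
    credits.append(entry)
    return updated


# Made-up record for illustration; the ID is a placeholder, not a real report.
record = {
    "id": "MAL-0000-example",
    "affected": [{"package": {"ecosystem": "PyPI", "name": "manyhttps"}}],
    "credits": [{"name": "ReversingLabs", "type": "FINDER"}],
}

updated = add_finder(record, "Macaron")  # hypothetical credit name
for credit in updated["credits"]:
    print(credit["type"], credit["name"])
```

Because the helper only appends, running it again with the same name leaves the record unchanged, which matches the additive behaviour described above.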

We apologize for any apparent slights and hope we can resolve this quickly!

behnazh-w commented 4 weeks ago

Thank you, @rhalar, for explaining your process; it’s very helpful! I understand that multiple people can identify the same malware. We take credit only when we receive a confirmation reply from the PyPI security team after reporting an issue. In the past, we haven't claimed credit for packages we discovered that were removed before we could report them to PyPI. May I know if you take these confirmation emails into account?

> Alternatively, you can also open a PR and fix it yourselves; the automated ingestion shouldn't override the edits, and the OSSF can do the validation in that case. However, I don't think there is a limit on credit entries with the FINDER type, so it's possible that both attributions could stay, unless you insist we remove ours. I'm not sure if the OSSF has a policy on this?

I can try creating the PRs myself and discuss OSSF's guidelines. Would you be able to update your database based on that?

rhalar commented 3 weeks ago

> I can try creating the PRs myself and discuss OSSF's guidelines. Would you be able to update your database based on that?

OSSF entries are additive, so changes you make will not be overridden; you don't have to worry about our internal database at that point. But we will back-ingest your changes, yes. :) My proposal was aimed more at the case where you'd like to avoid manually adding your own contributions and let our automatic process do it for you, especially if you think that we ought not to share the finder label. Note that this would work only for these cases; we might clash again in the future. I would recommend the PR route if possible, though.

> In the past, we haven't claimed credit for packages we discovered that were removed before we could report them to PyPI. May I know if you take these confirmation emails into account?

Concerning removed packages, we try to do the same, yes, though in some rare cases PyPI is spotty on providing information about exactly when something was removed. We have a few strategies for this, and may withdraw ourselves as finder, or add additional finders, when we recognize such cases. This will hopefully be rare, though. We haven't taken confirmation e-mails into account. Please do correct me if you think I'm wrong, but I believe those may get sent out to all reporters before the package is removed (or even after?), so we can't be sure even if we receive one. I'll double-check this with the team.

We keep an internal record of when we found each malicious package, though we don't expose it here (@calebbrown, is that something you would be interested in storing, and where?). Based on that, we try to infer whether we are an independent (potentially first) finder, adjusting for already-removed packages or reports that existed before the fact.
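As a rough illustration of that kind of inference (a sketch under stated assumptions, not ReversingLabs' actual logic), the check could compare the internal discovery time against a known removal time and the earliest known public report. The function name, its parameters, and the timestamps below are all hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def is_independent_finder(
    found_at: datetime,
    removed_at: Optional[datetime] = None,
    earliest_public_report: Optional[datetime] = None,
) -> bool:
    """Guess whether a discovery at `found_at` counts as independent.

    Assumption: a discovery is not independent if the package had already
    been removed, or a public report already existed, at discovery time.
    Unknown timestamps (None) are treated as "no evidence against".
    """
    if removed_at is not None and removed_at <= found_at:
        return False
    if earliest_public_report is not None and earliest_public_report <= found_at:
        return False
    return True


# Made-up timestamps for illustration only.
found = datetime(2024, 5, 2, 9, 0, tzinfo=timezone.utc)
removed = datetime(2024, 5, 3, 12, 0, tzinfo=timezone.utc)

print(is_independent_finder(found, removed_at=removed))
print(is_independent_finder(removed, removed_at=found))
```

Treating a missing removal time as "no evidence against" reflects the point above that removal information is sometimes unavailable; a stricter policy could instead refuse to claim credit when the time is unknown.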

The best solution on your part, though, and I encourage Oracle to do so if at all possible, is to contribute to the OSSF malicious packages repository directly, so that your findings are publicly recognized and enumerated. I expect this depends on company policy, though.