ropensci / unconf18

http://unconf18.ropensci.org/

Security/Safety "Best Practices" for rOpenSci Package Developers/Reviewers #35

Open hrbrmstr opened 6 years ago

hrbrmstr commented 6 years ago

We've done a bit of this ad-hoc, but we could spend some dedicated cycles ensuring that rOpenSci not only has the best technical and maintenance standards (which it most certainly does) but is also the de facto standard to replicate when considering safety/security.

elinw commented 6 years ago

How are you thinking about safety/security? I think this is a great concept.

karthik commented 6 years ago

We discuss this regularly in our staff channels and would be super grateful for your advice/help on this! cc @maelle

maelle commented 6 years ago

We'd like to link to https://ropenscilabs.github.io/r-security-practices/ whenever it's ready. Just sayin' 👼

mmulvahill commented 6 years ago

@hrbrmstr I'm interested in learning more about how to think about security/safety w.r.t. R. That's all I have to add for now 😉

hrbrmstr commented 6 years ago

I somehow missed the comment 20d ago @elinw (apologies). https://github.com/hrbrmstr/rpwnd provides some context for the evil one can do with R and https://ropenscilabs.github.io/r-security-practices/ (which @stephlocke penned and @maelle noted) has a great start for that and other topics.

Packages with embedded other-lang libraries need care & feeding and some way to inform users they are in need of an update. Package authors may be putting vulnerable researchers (some of whom may not even know they fit that classification) and users in harm's way without even knowing it, depending on what type of internet calls they make or system traces they leave around.

We also started work last year on a way to help ensure package download safety (https://ropensci.org/blog/2017/07/25/notary/) but all of us who worked on it have been super busy and even if we weren't, it's somewhat moot b/c there's no backing infrastructure for it nor support in R itself for it (which is where it'd need to be).

One thing from the notary work that'd be an interesting "mandate" from rOpenSci is the requirement that all contributors use PGP and sign all commits and no GH merges or releases happen w/o that. Since R has no way for us to have "developer certs" like Apple or Android have for their apps, and since the package ecosystem is more collaborative in nature, the "everybody PGPs" approach at least provides a better guarantee that we can truly trace commits back to the person and not just the GH account.

In the context of ^^ perhaps one "fun" thing (I have weird ideas of what constitutes that) wld be to get everyone on Keybase at the unconf. I 💙💙💙 what @stephlocke is doing with that in her personal and professional R work and perhaps finishing https://github.com/hrbrmstr/keybase wld be a possible unconf project.

noamross commented 6 years ago

In general, we try to make sure that mandates for RO packages go through a process that includes internal use, recommendation, good tooling that reduces effort and good docs/tutorials (not necessarily in that order) before requiring them. So work that advances any of those would make a mandate more likely.

I'd be interested in starting with tooling that could add this check (using git2r?) to both our onboarding checks and our nightly builds, along with other security best practices (goodpractices 😉).
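As a rough illustration of what such a check could look like, here is a minimal Python sketch that flags commits without a trusted signature. It is not the git2r-based tooling mentioned above: it shells out to the `git` command line instead and relies on git's `%G?` log placeholder, which prints a one-letter signature status per commit (`G` for a good signature, `N` for none, and so on). The function names and the choice to trust only `G` are assumptions made for this example.

```python
import subprocess

# Status letters come from git's %G? placeholder. Trusting only "G"
# (good signature) is an assumption of this sketch, not a policy.
TRUSTED_STATUSES = frozenset({"G"})


def unsigned_or_untrusted(log_text, trusted=TRUSTED_STATUSES):
    """Return hashes of commits whose signature status is not in `trusted`."""
    flagged = []
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines in the log output
        commit_hash, status = line.split()
        if status not in trusted:
            flagged.append(commit_hash)
    return flagged


def audit_repository(repo_path):
    """Run `git log` in repo_path and list commits lacking a trusted signature."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%H %G?"],
        capture_output=True, text=True, check=True,
    )
    return unsigned_or_untrusted(result.stdout)
```

Calling `audit_repository(".")` inside a checked-out package would return the hashes that need a closer look. Whether an empty list should be a hard requirement is exactly the mandate question discussed above.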

elinw commented 6 years ago

I’ve been on small projects that tried to mandate PGP and it was really painful and hard to enforce.

It does worry me how much trust a lot of packages put into API calls to download data, both assuming the source must be safe and assuming nothing happens in the space between requestor and requested (e.g. a redirect to a site distributing malware). I don’t see a lot of validation or sanitizing.
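To make the kind of validation meant here concrete, the following is a minimal Python sketch rather than anything taken from an existing package. It assumes the caller already knows the final URL after any redirects, has the downloaded bytes in hand, and has a pinned SHA-256 checksum published by the data provider. All function names and the example host are hypothetical.

```python
import hashlib
import hmac
from urllib.parse import urlsplit


def host_is_allowed(final_url, allowed_hosts):
    """Return True only if the URL uses HTTPS and its host is on the allowlist."""
    parts = urlsplit(final_url)
    host = (parts.hostname or "").lower()
    return parts.scheme == "https" and host in allowed_hosts


def sha256_matches(payload, expected_hex):
    """Compare the payload's SHA-256 digest with a pinned value."""
    actual = hashlib.sha256(payload).hexdigest()
    return hmac.compare_digest(actual, expected_hex.lower())


def validate_download(final_url, payload, allowed_hosts, expected_hex):
    """Raise ValueError unless both the final host and the checksum check out."""
    if not host_is_allowed(final_url, allowed_hosts):
        raise ValueError(f"unexpected download location: {final_url}")
    if not sha256_matches(payload, expected_hex):
        raise ValueError("checksum mismatch; refusing to use the downloaded data")
    return payload
```

The host check catches a redirect to an unexpected site, and the checksum catches content that was altered along the way. Neither helps if the provider itself publishes bad data or does not publish checksums at all.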

hrbrmstr commented 6 years ago

Aye. And there's "guidance" that might be useful to note in some API packages. For instance, I wrote epidata to access the Economic Policy Institute data and use the data from it for various classes. Each call out to that API I do from home is logged (Federal requirement and also a side-$-business) by Comcast and searchable by authorities or interested third-parties. They use that data to classify me as a left-leaning activist (when, in fact, I'm really just a non-affiliated anti-authority anarchist :-). I've seen evidence of that in various mailings, adverts on sites that manage to get through my ad-blocking infrastructure, etc. And, due to a job stint at one of the world's largest network providers, I've also maybe even seen said databases. It's worse in other countries/regions and many at-risk researchers (again, who don't even realize they're 'at-risk') do not realize they shld be using, say, a VPN for some API calls or using DNS-over-HTTPS or DNSCrypt since DNS leaks where you're going.

I'm not suggesting rOpenSci can solve or provide guidance on all the issues, but we (I say "we" despite working in a rly strange proto-science vs a real one like y'all) cld definitely up the safety game for those using R.

hrbrmstr commented 6 years ago

@elinw (re: PGP) aye, it is no panacea and unless you're a die-hard infosec geek or have a die-hard infosec hobby, being religious about PGP configs and use is a pain, especially when setting up new systems. Keybase definitely helps a lot and perhaps we (like @noamross was alluding to) cld develop a "safety/security check" package/function similar to devtools::dr_devtools() or goodpractice as part of this to help both identify gaps and provide helpers or at least friendly tips on fixing things.

noamross commented 6 years ago

If you want to do live testing of a package, like seeing what system files/folders it modifies, I'm working on a Dockerized setup for our standard package tests: https://github.com/noamross/launchboat, so one could run tests in an isolated environment before installing.

boshek commented 6 years ago

Oh this is all so interesting. After reading about notary last year and some linked horror stories I try to sign all my commits now. So thanks @hrbrmstr !

It occurs to me that this is related to this possible project and in fact may be a key component. It is so easy to build packages/scripts and miss significant security considerations (at least for me) that this area likely has many spaces that could be improved upon. Providing means for reviewers to identify and even just consider that as part of a reviewer suite of tools would likely be useful.

hrbrmstr commented 6 years ago

@noamross aye. been keeping an 👀 on launchboat and am also keen to watch the network calls pkgs make.

jennybc commented 6 years ago

I'd appreciate knowing what the most realistic threat model is for the R package ecosystem and how that aligns against various measures to tighten things up.

Example: I am dimly aware of malicious packages in some other language's repository that had names very close to the "real" packages. And the Bad People exploited mis-spellings to get users to install and run them. That's a really different threat from, say, someone impersonating me and making commits to packages I maintain.

Which threats should we be most worried about and who has to do something to mitigate it?
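For the look-alike-name threat in particular, one cheap mitigation is to flag a package name that sits within a small edit distance of a well-known one. The Python sketch below uses plain Levenshtein distance. The helper names, the default threshold of 1, and the package names in the usage note are illustrative assumptions, and a real check would need an actual list of known package names.

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic dynamic-programming recurrence."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def possible_typosquats(candidate, known_names, max_distance=1):
    """Return known names that the candidate nearly matches but does not equal."""
    lowered = candidate.lower()
    return sorted(
        name for name in known_names
        if name.lower() != lowered
        and edit_distance(lowered, name.lower()) <= max_distance
    )
```

With `known_names = {"ggplot2", "dplyr"}`, the name `gplot2` is flagged as close to `ggplot2`. A transposition such as `dpylr` counts as two edits under this metric, so it is only caught with `max_distance=2`.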

batpigandme commented 6 years ago

+1 to all of this… Also, and maybe this is limited audience (or just unrelated), but basic file threat-assessment. Sometimes you've gotta deal with someone else's data, and (e.g. with readxl) they have to get it to you some way…

hrbrmstr commented 6 years ago

That's a 👍 point @batpigandme. "Thankfully?" malicious XML and JSON docs are usually targeting browsers and wld have some serious impediments trying to account for various R interpreter environs. Similarly, malicious PDFs are usually targeting Acrobat or Preview or third-party Windows PDF readers. However, the pkgs in the R ecosystem are all using the same core, [vulnerable] libraries so there is room for caution. And, we all get Word docs, Excel docs, PDFs, etc which all have threat vectors.

hrbrmstr commented 6 years ago

@jennybc that's definitely a good unconf working-group mind-meld/group convo (since I'm likely far from the typical R user and cld use some examples of daily use patterns to help with said threat modeling :-)