nhs-r-community / statements-on-tools

The NHS-R Community statements on the use of open source tools including (but not exclusively) R and R Studio.
https://tools.nhsrcommunity.com/
Creative Commons Zero v1.0 Universal

Data in packages #2

Closed: Lextuga007 closed 2 years ago

Lextuga007 commented 2 years ago

I was writing out the difference between R, R Studio and R packages:

R is a programming language.
R Studio is an integrated development environment which supports R and other languages.
[R packages](https://r-pkgs.org/intro.html) bundle together R code, data, documentation and tests.

And realised that, by the R packages definition, a package includes data, which needs to be addressed because of IG/security concerns. The only data in other people's packages will be (or should be) publicly available, so the issue of our own data should only arise when we create packages ourselves.

This probably needs to be spelled out for clarity.

bclarke-nes commented 2 years ago

That's an important catch, which we should be clear about. I'll do some clarification. How about something like...

  1. add the r-pkgs definition, admitting that packages can include data
  2. note that the included data in community packages is sample/open data - like mtcars or whatever - and that e.g. CRAN community standards stop sensitive data getting into packages
  3. so there's no IG problem regarding the use of packages in NHS applications - any entailed data they might contain is generally available and not sensitive
  4. but there might be IG problems when writing packages. We won't provide detailed guidance here on what's okay to include in your package - it's out of scope for this statement, although it might be a nice follow-up. But it would be useful to note that package authors should be very clear (perhaps inc. IG oversight) that any data they include in their package isn't sensitive.

ChrisBeeley commented 2 years ago

Closed by #11