nhs-r-community / statements-on-tools

The NHS-R Community statements on the use of open source tools including (but not exclusively) R and R Studio.
https://tools.nhsrcommunity.com/
Creative Commons Zero v1.0 Universal

Setting up git/github safely with .gitignore files #10

Open JonMinton opened 2 years ago

JonMinton commented 2 years ago

A key risk in using git/GitHub in healthcare settings, where we work with identifiable or potentially identifiable patient-level data, is not thinking carefully enough about project/repo structure and about which locations within a project/repo to .gitignore. For example, if someone filters half a dozen records from a secure database, it's important that no commit contains those records, even though they may want to commit the code that performs the filter. So we need both a clear understanding of how to .gitignore locations and a priori agreement about which folders inside a project should contain which kinds of data. It might also be worth including some discussion of data security roles within an active repo, so that there isn't any kind of 'incident' involving this sort of accidental release of data, which could quickly set back progress on collaborative coding and version control.
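As a rough illustration of "agree the folders first, then .gitignore them", here is a minimal sketch in Python. The folder names (`data/raw`, `data/processed`, `outputs/record_level`) and file extensions are hypothetical placeholders for whatever a team agrees, and `build_gitignore`/`scaffold` are illustrative helpers, not part of any existing tool. It uses the standard `folder/*` plus `!folder/.gitkeep` idiom, so the empty folder structure is committed while anything placed inside it is ignored.

```python
from pathlib import Path

# Hypothetical agreed layout: folders that may hold patient-level data.
DATA_DIRS = ["data/raw", "data/processed", "outputs/record_level"]

# Hypothetical list of extract file types to ignore anywhere in the repo.
DATA_EXTENSIONS = ["*.csv", "*.xlsx", "*.rds", "*.RData", "*.parquet"]


def build_gitignore(data_dirs, data_extensions):
    """Return .gitignore text that ignores the contents of each data folder
    but re-includes a .gitkeep file so the folder itself can be committed."""
    lines = ["# --- Patient-level data folders: contents never committed ---"]
    for d in data_dirs:
        d = d.strip("/")
        lines.append(f"{d}/*")           # ignore everything inside the folder
        lines.append(f"!{d}/.gitkeep")   # ...except the placeholder file
    lines.append("")
    lines.append("# --- Data file types, ignored wherever they appear ---")
    lines.extend(data_extensions)
    return "\n".join(lines) + "\n"


def scaffold(root, data_dirs=DATA_DIRS, data_extensions=DATA_EXTENSIONS):
    """Create the agreed folders, each with an empty .gitkeep, and write .gitignore."""
    root = Path(root)
    for d in data_dirs:
        folder = root / d.strip("/")
        folder.mkdir(parents=True, exist_ok=True)
        (folder / ".gitkeep").touch()
    (root / ".gitignore").write_text(build_gitignore(data_dirs, data_extensions))


if __name__ == "__main__":
    print(build_gitignore(DATA_DIRS, DATA_EXTENSIONS))
```

One caveat worth agreeing on as a team: .gitignore only stops *untracked* files being picked up. A file that was committed before the rule existed stays tracked, and `git add -f` overrides the ignore rules, so the folder agreement matters as much as the file itself.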

Lextuga007 commented 2 years ago

I wrote a post for my team on using .gitignore if that helps: https://cdu-data-science-team.github.io/team-blog/posts/2022-04-01-using-usethis-to-set-up-gitignore/

wbryant commented 2 years ago

Could we add GitHub Actions/pre-commit hooks to this? Someone in my team recently demoed this feature in our analytics template. It's great: it looks for anything that resembles a secret, large (presumably data) files, etc., with very little overhead once set up.
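To give a feel for what such a hook does, here is a minimal, hypothetical sketch of a git pre-commit hook written in Python (it would be saved as `.git/hooks/pre-commit` and made executable). The size limit, data extensions and secret patterns are illustrative assumptions, and this is not the template mentioned above. In practice the `pre-commit` framework already provides maintained hooks such as `check-added-large-files` and `detect-private-key` in its `pre-commit-hooks` repository, which is probably a better starting point than hand-rolling one.

```python
#!/usr/bin/env python3
"""Sketch of a git pre-commit hook: block the commit if any staged file has a
data-like extension, is large, or contains text that looks like a secret.
All thresholds and patterns below are illustrative assumptions."""
import re
import subprocess
import sys
from pathlib import Path

MAX_BYTES = 500 * 1024  # assumed size limit: 500 KB
DATA_SUFFIXES = {".csv", ".xlsx", ".xls", ".rds", ".rdata", ".parquet", ".sav"}
SECRET_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # e.g. password = "..." or api_key: '...'
    re.compile(r"(?i)(password|passwd|secret|api_key|token)\s*[:=]\s*['\"][^'\"]+['\"]"),
]


def staged_files():
    """List files that are added, copied or modified in the index."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def problems_for(path):
    """Return reasons why `path` should not be committed.
    Note: this inspects the working-tree copy, which may differ from what is staged."""
    p = Path(path)
    reasons = []
    if p.suffix.lower() in DATA_SUFFIXES:
        reasons.append("data-like file extension")
    size = p.stat().st_size
    if size > MAX_BYTES:
        reasons.append(f"large file ({size} bytes)")
    try:
        text = p.read_text(encoding="utf-8")
    except (UnicodeDecodeError, OSError):
        return reasons  # binary or unreadable: skip the content scan
    if any(pattern.search(text) for pattern in SECRET_PATTERNS):
        reasons.append("possible secret in contents")
    return reasons


def main():
    failed = False
    for path in staged_files():
        for reason in problems_for(path):
            print(f"BLOCKED {path}: {reason}", file=sys.stderr)
            failed = True
    return 1 if failed else 0  # a non-zero exit makes git abort the commit


if __name__ == "__main__":
    sys.exit(main())
```

A local hook like this only protects the machine it is installed on, and it can be bypassed with `git commit --no-verify`. Running the same checks again in a GitHub Actions workflow catches what slips through, although by then the data has already been pushed, so the local check remains the important one.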

Lextuga007 commented 2 years ago

Yes, sounds good. Would the person who demoed this be interested in contributing?