test-tc39-org / tc39-bot

ISC License

Security / Privacy Considerations #2

Open IgnoredAmbience opened 6 years ago

IgnoredAmbience commented 6 years ago

The GitHub App model allows an app either to be internal to a single user/organisation, or to be made public so that a deployed instance of the bot can be "installed" on any user's or organisation's repository. The bot is initially being written with the intent that it is not available for public installation and is installed only on the tc39/ecma262 repository. However, it should be coded so that it does not pose a security/privacy risk if it is installed on other tc39 repositories, or is (mis-)configured to be publicly installable in future.

None of the above will preclude the bot's source code from being deployed elsewhere, used with different GitHub Apps credentials, and installed on other repositories.

Configuration Options

The Probot framework permits three types of configuration to be stored:

  1. Bot default configuration, hardcoded for all deployments.
  2. Per-deployment configuration, for example the authentication key for access to the GitHub API.
  3. Per-repository configuration, stored in .github/something.yml, for example configuration of which PR titles should trigger the CLA check.

At present, the only suitable mechanism for storing secrets is the second. Probot cannot store secrets in the third form of configuration, and the first form is public by nature of being open source. The second form is usually provided through environment variables configured on the deployment platform, which makes it awkward to store complex structured configuration data.
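
One common way to ease the awkwardness of structured data in environment variables is to encode it as JSON in a single variable. The following is a minimal sketch of that idea, not something Probot provides: the helper name `readJsonEnv` and the variable name `CLA_SHEETS_JSON` are hypothetical, and the environment is passed in as a plain object (in a deployment it would be `process.env`).

```typescript
// Environment passed in explicitly so the sketch does not depend on Node typings.
type Env = Record<string, string | undefined>;

// Read one environment variable and parse it as JSON.
// Throws instead of falling back to a default, so a misconfigured
// deployment fails at startup rather than running without its secrets.
function readJsonEnv(env: Env, name: string): unknown {
  const raw = env[name];
  if (raw === undefined || raw === "") {
    throw new Error("Missing required environment variable: " + name);
  }
  try {
    return JSON.parse(raw);
  } catch {
    // The raw value is deliberately left out of the message: it may be a secret.
    throw new Error("Environment variable " + name + " is not valid JSON");
  }
}

// Hypothetical usage: CLA_SHEETS_JSON is an illustrative name only.
const exampleEnv: Env = {
  CLA_SHEETS_JSON: '{"default":{"sheetId":"example-id","range":"A2:A"}}',
};
const exampleSheets = readJsonEnv(exampleEnv, "CLA_SHEETS_JSON");
```

The returned value is typed as `unknown` on purpose, so the caller has to validate its shape before using it.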

Google Sheets Configuration & Sensitive Data

The signatory form collects personally identifiable information into a Google Sheet. This bot necessarily requires access to the GitHub usernames collected in this sheet, but no further information. The bot must not leak personal information under any circumstances.

Under the Google Sheets permission model, an account may be given read or write access to an entire spreadsheet; more granular permissions are not possible.
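
Since access cannot be narrowed on the Google side, any narrowing has to happen in the bot itself. The sketch below assumes the bot has read a single-column range of GitHub usernames as an array of rows. It keeps only normalised logins and exposes nothing but a yes/no answer. The function names are hypothetical, and the actual spreadsheet read is left out.

```typescript
// `rows` mimics a single-column range: an array of rows, each an array of cells.
// Only the first cell of each row is read; anything else is ignored.
function extractUsernames(rows: string[][]): string[] {
  const usernames: string[] = [];
  for (const row of rows) {
    const cell = row[0] ?? "";
    // GitHub logins are case-insensitive; also tolerate a leading "@".
    const login = cell.trim().replace(/^@/, "").toLowerCase();
    if (login !== "" && usernames.indexOf(login) === -1) {
      usernames.push(login);
    }
  }
  return usernames;
}

// The only question the rest of the bot needs answered.
function hasSigned(usernames: string[], login: string): boolean {
  return usernames.indexOf(login.trim().toLowerCase()) !== -1;
}
```

Keeping the comparison behind `hasSigned` means callers never handle the rows themselves, which reduces the chance of sheet contents ending up in a log line or PR comment.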

Secrets, such as the Google API credentials, can only be stored at the per-deployment level. The simplest approach is to configure one Google service account for the entire bot deployment. In that case the configuration for reading from the CLA spreadsheet is itself sensitive: the sheet ID and lookup cell ranges must not be configurable on a per-repository basis, otherwise a repository's configuration could point the bot at other cells and leak personal information.

If multiple CLA forms for different purposes are required in future, multiple sheet ID and cell range configurations could be specified in the deployment configuration, with the repository configuration selecting which one applies.
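
A minimal sketch of that selection step, assuming the deployment configuration has already been loaded into a name-to-source map. The type names, the function name and the `"default"` fallback key are all hypothetical. The repository configuration supplies only a name, never a sheet ID or range.

```typescript
interface SheetSource {
  sheetId: string;
  range: string;
}

// Defined only in the deployment configuration.
type SheetSources = { [name: string]: SheetSource };

// `requestedName` is the only value taken from per-repository configuration.
function selectSheetSource(
  deployment: SheetSources,
  requestedName: string | undefined
): SheetSource {
  const name = requestedName === undefined ? "default" : requestedName;
  // Own-property check so names like "constructor" cannot match inherited keys.
  const source = Object.prototype.hasOwnProperty.call(deployment, name)
    ? deployment[name]
    : undefined;
  if (source === undefined) {
    throw new Error("Unknown CLA sheet configuration: " + name);
  }
  return source;
}
```

An unknown name is rejected instead of falling back silently, so a typo in a repository's configuration cannot cause a check against the wrong sheet.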

Organisation Team Memberships

It is assumed that knowledge of a user's membership of the tc39/delegates team is not sensitive information, so disclosure of this is not an issue. I assume this also extends to membership of other tc39 teams, so the ability to configure team membership checks on a per-repository basis is permissible.
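
If team checks are made configurable per repository, one conservative option is to accept only teams from the expected organisation, which keeps the assumption above limited to tc39 teams. This is a sketch under that assumption. The `org/team-slug` notation follows the tc39/delegates form used above, while the function name and the restriction itself are illustrative, not existing bot behaviour.

```typescript
interface TeamRef {
  org: string;
  slug: string;
}

// Parse a per-repository team setting such as "tc39/delegates" and
// reject anything outside the organisation the deployment allows.
function parseTeamRef(value: string, allowedOrg: string): TeamRef {
  const parts = value.trim().split("/");
  const org = (parts[0] ?? "").toLowerCase();
  const slug = (parts[1] ?? "").toLowerCase();
  if (parts.length !== 2 || org === "" || slug === "") {
    throw new Error("Team must be written as org/team-slug: " + value);
  }
  if (org !== allowedOrg.toLowerCase()) {
    throw new Error("Team is outside the allowed organisation: " + value);
  }
  return { org, slug };
}
```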