Given the current requirements and features, the bot doesn't need to store the file SHA to uniquely identify a file on GitHub. It can fetch a file using owner, repoName, and filePath.
The bot won't use the branch either. For now, I plan to keep the default branch as the single source of truth for all risk scores, because different branches can hold multiple versions of the same file, which would make the training data redundant and dirty.
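
A rough sketch of what this identification scheme could look like, assuming the bot reads files through GitHub's REST "repository contents" endpoint. The `FileRef` class and its `key` and `contents_url` methods are hypothetical names used only for illustration, not part of the bot. The point is that leaving out the `ref` query parameter makes GitHub resolve the path against the repository's default branch, so neither a SHA nor a branch name has to be stored.

```python
from dataclasses import dataclass
from urllib.parse import quote

GITHUB_API = "https://api.github.com"


@dataclass(frozen=True)
class FileRef:
    """Hypothetical identifier for a file on a repository's default branch."""

    owner: str
    repo_name: str
    file_path: str

    def key(self) -> str:
        # Stable lookup key for risk scores: no SHA and no branch involved.
        return f"{self.owner}/{self.repo_name}:{self.file_path}"

    def contents_url(self) -> str:
        # No `ref` query parameter is added, so GitHub falls back to the
        # repository's default branch when serving this path.
        owner = quote(self.owner, safe="")
        repo = quote(self.repo_name, safe="")
        path = quote(self.file_path.lstrip("/"), safe="/")
        return f"{GITHUB_API}/repos/{owner}/{repo}/contents/{path}"
```

Because the dataclass is frozen, two `FileRef` values with the same owner, repoName, and filePath compare equal and can be used directly as dictionary keys, which fits the idea of one risk score per file on the default branch.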