Open ericwb opened 5 years ago
This integration could also add some differentiation as compared to similar competing Apps
Note; detect-secrets overlaps with language specific linters somewhat. It's KeywordDetector plugin will detect variables named password or secrets, same as Bandit.
And many of the other plugins it has could be integrated into language specific linters, since they would be better at detecting using an AST rather than regex.
However, non-language files (.py, .go, .js, .ts) would be best detected by detect-secrets. And there are examples of people putting secret keys in json files, yaml files, etc.
So should detect-secrets only be used on files not scanned by other linters?
I like the idea of detect-secrets
only scanning files not scanned by other linters, but I don't think it's as cut and dry as not scanning files scanned by other linters:
1) linters may not support secret detection (i.e. I don't think tslint does)
2) a linters secret detection may not be as complete as detect-secret
's , for example I don't think many language security linters do high entropy string detection?
There's a few approaches we could take here:
There are a few particularities in detect-secrets:
it doesn't provide "severity" and "confidence" level. We should discuss what approach we should use
it doesn't have good documentation for every plugin it uses. The only documentation I could find is this: https://github.com/Yelp/detect-secrets#currently-supported-plugins
it doesn't provide detail information for every issue like the other linters. That means that because the "message" attribute is mandatory in the annotation object https://developer.github.com/v3/checks/runs/#annotations-object we should figure out what detail message we would provide for the different issues it finds.
We should discuss what to do with those.
I think that we can try and see how an option 2
have some logic to prevent reporting the same/similar issues for the same line of code.
implemented with the logic from subpoint c
c. filtering out duplicate reports from lower priority scanners based on the functionality map and priorities
Here is the list of the pull requests/features we should add to fully integrate detect-secrets into Precaution:
Currently, detect-secrets
doesn't give you the option to use it with multiple parameters like that:
detect-secrets scan test.js test.yaml
https://github.com/Yelp/detect-secrets/issues/180
That's why to achieve scanning multiple parameters we have to call it like that:
detect-secrets scan .
The problem is that currently, we are using two folders for our temporary files which will be processed
cache/go/src/<repo_id>/<pr_id>/
for all Go files and cache/<repo_id>/<pr_id>/
for all other files.
If you want to read why we use two folders read the discussions here:
What happens because of that is that when we start detect-secrets
from the cache/go/src/<repo_id>/<pr_id>/
directory it will scan all go files with ease but it will miss all other files in the cache/<repo_id>/<pr_id>/
folder.
If we start detect-secrets
from cache/<repo_id>/<pr_id>/
folder we would scan all Go files below and if there are not deleted files from an old scan we would report for their errors too.
Because of all those problems, I believe that first, we should create a Docker image which will allow us to use Go modules in Gosec (as described here: https://github.com/vmware/precaution/issues/227) and then we would continue with the detect-secret
spawn functionality.
I made a research for other tools besides detect-secrets
and here are my findings:
Tools using the Shannon entropy for minimizing the false-positives:
There are a lot of tools focused on finding secrets which are working on Git repositories.
Those tools won't work for us because we are scanning files under the cache/
folder.
But cache/
is in our gitignore
file so those files will be missed.
How about scanning one at a time?
How about scanning one at a time?
detect-secrets
doesn't support multiple parameters right now: https://github.com/Yelp/detect-secrets/issues/180 which means that we should give one parameter - a folder to detect-secrets
which contains all files to be scanned
As I mentioned in this comment https://github.com/vmware/precaution/issues/209#issuecomment-494351377 we have a problem with that which will be resolved if we have Dockerfile.
We reevaluated the benefits of using detect-secrets
HighEntropyString plugin.
We found out that this plugin won't work for us because we used the default entropy configuration and then detect-secrets
didn't catch an obvious hardcoded credentials.
@joshuagl decided to configure it a little bit and he got a lot of false positives.
It's meaningless to use detect-secrets
without the HighEntropyString and that's why we decided that we would write our own implementation using regular expressions to scan for specific tokens.
Because that would be a nontrivial feature to implement we decided to move this feature to the Near future
milestone.
It's important when implementing the regex scan how we would store the files for scan in the memory because Heroku provides us with limited RAM https://devcenter.heroku.com/articles/limits#dyno-memory. One of the ways to process the files will be by fixing the maximum amount of memory we want our files to use and then before opening a file:
currentUsedMemory
+ the size of the file > maximumRAMusage
then we have to use 4096 bytes (or another size) of blocks to process that filemaximumRAMusage
memory then load the whole file in the memoryPS: Closed the issue by mistake
Is your feature request related to a problem? Please describe. See https://github.com/Yelp/detect-secrets for reference, but it would be useful for Precaution to also warn on secrets, Confidential code headers, etc. So level of detection is done through GitHub token scanning, but not all.
Note that also detect-secrets operates on all files (json, txt, etc), and not just languages files. There are definitely cases where API keys are stuffed into shell scripts, json files, yaml, etc.
Describe the solution you'd like A new, slightly different type of scanner. Not really a linter. And it would scan all types of file extensions, not just language code, so the download of files would need to include everything.
Describe alternatives you've considered Token scanning provided by GitHub does some things by not all. I don't think it would find a Confidential header whereas detect-secrets could be configured to do so. But it would be good to research and do a comparison analysis.
There are also probably many other CLI tools to do similar scans. This was one recommended from a blog article.
Additional context