sourcegraph / srclib

srclib is a polyglot code analysis library, built for hackability. It consists of language analysis toolchains (currently for Go and Java, with Python, JavaScript, and Ruby in beta) with a common output format, and a CLI tool for running the analysis.
https://srclib.org

Pass last-cached build states to toolchain scanners #101

Open xizhao opened 9 years ago

xizhao commented 9 years ago

So that scanners can benefit from incremental build features.

If toolchains had access to just the last-cached build, each one could diff its own files during the next scan step and either skip unchanged files or pull their data from the old build data.

A more sophisticated and involved approach is to pass actual diff information, so that scanners can just iterate through a "modified" (changed + new files) array. This could be backed either by VCS diff commands or by manual hash comparisons in srclib.
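To make the hash-based variant concrete, here is a minimal sketch of how a "modified" array could be derived by comparing content hashes between two scans. It is not srclib code; `hash_file`, `snapshot`, and `modified_files` are hypothetical names, and it assumes the previous scan's hashes were saved somewhere alongside the cached build.

```python
import hashlib
import os


def hash_file(path):
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root):
    """Map each file path (relative to root) to its content hash."""
    hashes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            hashes[os.path.relpath(full, root)] = hash_file(full)
    return hashes


def modified_files(previous, current):
    """Return sorted paths that are new or whose hash differs from `previous`.

    `previous` is the (hypothetical) snapshot stored with the last cached build.
    """
    return sorted(p for p, h in current.items() if previous.get(p) != h)
```

A scanner would then only need to visit `modified_files(previous, snapshot(root))`. Deleted files are not reported here; they would be the keys in `previous` that are missing from `current`.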

samertm commented 9 years ago

It isn't a big deal to pass in the changed files as an argument to the scanner, and I don't think it would be a big deal to ask all scanners to parse "--diff-file ", even if they don't use that information. I think this would be a good change for the scanners that can leverage this information.

Can you explain what you mean by "actual diff information"?
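As a rough illustration of what accepting such a flag might look like on the scanner side, the sketch below parses `--diff-file` and ignores everything else. The thread does not say what the flag's argument is; this example assumes it is a path to a text file listing one changed path per line, and both function names are hypothetical.

```python
import argparse


def parse_scanner_args(argv):
    """Extract --diff-file from argv and pass all other arguments through."""
    parser = argparse.ArgumentParser(prog="scanner", add_help=False)
    # Assumption: the flag takes a path to a file listing changed files.
    parser.add_argument("--diff-file", default=None)
    known, rest = parser.parse_known_args(argv)
    return known.diff_file, rest


def read_changed_paths(diff_file):
    """Return the changed paths, or None when no diff file was supplied.

    None signals "no diff information", so the scanner does a full scan.
    """
    if diff_file is None:
        return None
    with open(diff_file) as f:
        return [line.strip() for line in f if line.strip()]
```

A scanner that cannot use the information would simply discard the first return value and carry on with its normal scan.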

xizhao commented 9 years ago

Bump. To my understanding, incremental builds already copy over files from the previous .build-data directory, which saves some work during the graph step.

For now, should toolchains be expected to find the most recent commit and diff themselves? Should they be stashing their own info in the srclib-cache folder?
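If toolchains did end up diffing themselves, a VCS-backed version could look roughly like the sketch below, which shells out to `git diff --name-status` and splits the result into modified and deleted paths. It assumes a Git repository and that the toolchain has recorded the commit of its last scan somewhere; how and where that commit is stored is left open, and the function names are hypothetical.

```python
import subprocess


def parse_name_status(output):
    """Split `git diff --name-status` output into (modified, deleted) paths."""
    modified, deleted = [], []
    for line in output.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        status = fields[0]
        if status.startswith("D"):
            deleted.append(fields[1])
        elif status.startswith("R"):
            # Renames are reported as: R<score> <old path> <new path>
            deleted.append(fields[1])
            modified.append(fields[2])
        else:
            # Added, modified, copied, etc.: the last field is the current path.
            modified.append(fields[-1])
    return modified, deleted


def changed_since(commit, repo_dir="."):
    """Return (modified, deleted) paths between `commit` and HEAD."""
    result = subprocess.run(
        ["git", "diff", "--name-status", commit, "HEAD"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return parse_name_status(result.stdout)
```

The modified list corresponds to the "modified" (changed + new files) array described earlier in the thread; the deleted list is extra information a scanner could use to drop stale cached data.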