samduy / provenance-analysis

Program Provenance Analysis

[Algorithm-D]: improve quality of program detection #36

Closed samduy closed 7 years ago

samduy commented 7 years ago

Impacted version: 0.5. Phenomenon: the whole process takes too long to finish (a few hours). One reason is that many programs have to be checked with an Internet search (GitHub).

samduy commented 7 years ago

One way to improve this is to filter the list of detected directories so that only the most promising candidates are kept and the less meaningful ones are removed.

Solution: instead of checking for the existence of ANY of the files README, HISTORY, CHANGELOG, VERSION, LICENSE, we now check for the existence of README combined with at least one of the other files. (In other words, a directory qualifies as an open source directory only if it contains at least 2 of these files and one of them is the README.)

(NOTE: README, README.md, and any other README* file are all treated as the same file.)
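For illustration, the stricter rule could look roughly like the sketch below. This is not the project's actual code: the function names (`marker_kinds`, `qualifies`, `is_open_source_dir`) are hypothetical, and matching names case-insensitively and by prefix for the non-README files (so that `LICENSE.txt` counts as LICENSE) is an assumption that goes beyond what is stated above.

```python
from pathlib import Path

# Marker names are taken from the issue text. Matching them by prefix and
# case-insensitively is an assumption of this sketch.
SECONDARY_MARKERS = ("HISTORY", "CHANGELOG", "VERSION", "LICENSE")


def marker_kinds(filenames):
    """Return the set of marker kinds found among the given file names."""
    kinds = set()
    for name in filenames:
        upper = name.upper()
        # README, README.md and any other README* file count as one kind.
        if upper.startswith("README"):
            kinds.add("README")
            continue
        for marker in SECONDARY_MARKERS:
            if upper.startswith(marker):
                kinds.add(marker)
                break
    return kinds


def qualifies(filenames):
    """New rule: a README plus at least one of the other marker files."""
    kinds = marker_kinds(filenames)
    return "README" in kinds and len(kinds) >= 2


def is_open_source_dir(path):
    """Apply the rule to the regular files directly inside a directory."""
    names = [p.name for p in Path(path).iterdir() if p.is_file()]
    return qualifies(names)
```

With this rule, a directory holding only `README.md`, or only `LICENSE` and `VERSION`, is skipped, while one holding `README.md` and `LICENSE` is kept. Two README variants in the same directory still count as a single file, so they do not qualify on their own.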