samduy / provenance-analysis

Program Provenance Analysis

[Algorithm-D]: Improve accuracy of package detection #25

Closed samduy closed 7 years ago

samduy commented 7 years ago

The current algorithm for package directory detection is not accurate enough. It misses packages that are installed on the system, because it relies on the modification date only.

e.g.

/path/to/directory-A/package-B
/path/to/directory-A/package-C

If both package-B and package-C were installed on the same day, the algorithm misidentifies directory-A as a package (which it is not) instead of B or C.
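
To make the failure mode concrete, here is a minimal sketch of a date-only heuristic. It is not the project's actual code: the function name `guess_package_roots`, the rule "a directory whose immediate children all share one modification day is a package root", and the dates are all assumptions made for illustration.

```python
from datetime import date
from pathlib import PurePosixPath


def guess_package_roots(mtimes):
    """Hypothetical date-only heuristic (an assumption, not the real algorithm).

    `mtimes` maps a path to the day it was last modified. A directory is
    reported as a package root when all of its immediate children share
    the same modification day.
    """
    days_by_parent = {}
    for path, day in mtimes.items():
        parent = str(PurePosixPath(path).parent)
        days_by_parent.setdefault(parent, set()).add(day)
    # Exactly one distinct day among the children: looks like "one install".
    return sorted(p for p, days in days_by_parent.items() if len(days) == 1)


# Illustrative dates only: both packages installed on the same day.
same_day = {
    "/path/to/directory-A/package-B": date(2018, 1, 5),
    "/path/to/directory-A/package-C": date(2018, 1, 5),
}
print(guess_package_roots(same_day))  # ['/path/to/directory-A']
```

With only dates to go on, nothing distinguishes "two packages installed together" from "one package with two subdirectories", so directory-A is reported.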

samduy commented 7 years ago

This problem is solved by using another method to identify package directories, based on the default files found in GitHub open-source projects:

Of course, this method has a limitation: not every open-source project is hosted on GitHub, and not all of them include the default files above. But most of them do, so the current result is acceptable. (See the latest result)
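
A rough sketch of the marker-file idea, under stated assumptions: the names in `MARKER_FILES` are hypothetical examples of files commonly found at the root of a repository, not the list actually used here, and `find_package_dirs` and `min_hits` are illustrative names.

```python
import os

# Hypothetical marker list (an assumption): files often present at the
# root of an open-source project. Not the list used by the project.
MARKER_FILES = frozenset({"README.md", "LICENSE", ".gitignore"})


def find_package_dirs(root, markers=MARKER_FILES, min_hits=1):
    """Return directories under `root` holding at least `min_hits` marker files."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        if len(markers.intersection(filenames)) >= min_hits:
            found.append(dirpath)
            # Do not descend further: subdirectories belong to this package.
            dirnames[:] = []
    return sorted(found)
```

Because the check looks at file names rather than dates, package-B and package-C are each reported on their own even when they were installed on the same day, and directory-A is skipped as long as it holds no marker file itself. Raising `min_hits` trades missed packages for fewer false positives.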