Open KOLANICH opened 3 years ago
Hm, git could be useful here. You could compare the extracted library with the original at the chosen point in time (probably the version release tag). Since git does not care what language you are comparing, you could see the exact differences. Anyone know of any pitfalls that would prevent an implementation like this?
git is useful here only for checking out versions. The problem is detecting modifications to a known open-source lib without having its source and without any clue about the exact version that was used as a base. I.e. someone took a FOSS lib and embedded it into their own software, but made some modifications, so the software fails when used with the upstream version. Then they ran an optimizer, so the decompiled source doesn't strictly match the original. Most of the original symbol names are lost. The CFG is a bit distorted too, because the optimizer decided the code would be a bit faster that way. Some functions are inlined.
Project description
There are a lot of libraries and a lot of software using them. Sometimes known open-source libs used in software are slightly customized. We want to know which libraries were used, which versions of them, and which pieces of the code were changed.
The usual approach for this is extracting some features from code (control flow graph, signatures) and matching them against the database.
There are some free open-source solutions for that. Unfortunately, almost all of them target Java on Android, since that platform is heavily affected by library bundling and uses bytecode, which eases feature extraction. We need to abstract the existing solutions enough that they can be easily adapted to any programming language for which we can extract an AST (e.g. Python, JavaScript, C++ (using RetDec as a decompiler), C# (using any .NET decompiler)).
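To make the idea concrete, here is a minimal sketch of the "extract features, match against a database" approach for Python sources, using only the standard `ast` and `hashlib` modules. Everything in it is an assumption for illustration: the names `function_features` and `match_libraries`, the database layout, and the choice of feature (a hash of the sequence of AST node types, which survives renamed symbols but not inlining or a reshuffled control flow). It is not how any of the existing Android tools work, and a real implementation would need fuzzier features.

```python
import ast
import hashlib


def function_features(source):
    """Map each function name to a hash of its name-independent AST shape.

    Only node *types* are hashed, so renamed functions, arguments and
    variables still produce the same fingerprint.
    """
    features = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            shape = " ".join(type(n).__name__ for n in ast.walk(node))
            features[node.name] = hashlib.sha256(shape.encode()).hexdigest()
    return features


def match_libraries(target_source, database):
    """Rank known libraries by how many of their fingerprints occur in the target.

    `database` is a hypothetical mapping {(library, version): set_of_hashes}.
    The score is the fraction of a library's functions found in the target,
    so unrelated application code in the target does not lower it.
    """
    target = set(function_features(target_source).values())
    scores = {}
    for key, known in database.items():
        if known:  # skip empty entries to avoid division by zero
            scores[key] = len(target & known) / len(known)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A database entry would be built as `set(function_features(lib_source).values())` for each release. A version scoring below 1.0 then hints at which functions were changed, since their fingerprints are the ones missing from the target.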
Relevant Technology
Complexity and required time
Complexity
Required time (ETA)
Categories