Define spec for embedding metadata strings into object files

dwvisser commented 9 years ago

There are ways that compiled languages such as C and C++ can be coerced to embed metadata strings into object files that don't get optimized away by the compiler. I would like to explore defining a standard way of embedding strings for, e.g.,

CPE or other unique name
License Info
Version

Ideally, we could convince at least one widely-depended-upon project with known CVEs to adopt this, and write an analyzer that successfully uses this data to characterize object files. Then, embedding this metadata might be presented to the software engineering community as a valuable practice to adopt.

jeremylong commented 9 years ago

This is an interesting idea; I've had similar thoughts about getting more standardized labeling into the manifests of JAR files.

People can already register their own CPE names (see this page for info); however, this is still uncommon. Most CPE entries exist because a vulnerability was disclosed and a CVE entry was created.

However, some interesting work in this area is being performed by TagVault, specifically with regard to Software ID (SWID) Tags. I think the effort around SWID is more promising.

dwvisser commented 9 years ago

I was actually unaware of TagVault and SWID tags before now. I agree that, when present, DependencyCheck should analyze them. As you can see, I've created an issue tracker to that end, and hope to devote effort towards it in the near future.

I do think, however, that a standard for embedding metadata into object files, if executed correctly, could give the open source world a greater ability to perform useful tagging reliably. Remember, it is quite often not the official project that builds the binaries and packages on the end user's system.

E.g., say I decide to customize the (fictional) GPL-licensed AcmeWidgetLibrary 1.3.7, because it mostly does what I want, but writing a façade/wrapper is impractical for my purpose. DependencyCheck could still see that I essentially have a dependency on AWL 1.3.7, and give me a useful warning if a CVE shows up for it. The same argument applies, in principle, to the packaged forms of open source libraries on, e.g., Debian and Fedora Linux distributions.

jeremylong commented 9 years ago

I'm still leery of a standard that would require developers to add more data to their project outputs that has no effect on the project's functionality; it won't get used.

One of the design goals of dependency-check was to make it flexible enough to identify libraries even if they were not the official build. While developers do download, modify, recompile, etc., it is rare in my experience that all metadata would get stripped. The exception to that would be taking a bit of functionality and refactoring it into the app (see dependency-check-utils org.owasp.dependencycheck.org.apache.tools.ant as an example). In these cases other tools are better at identifying these "slices" of functionality because they use a more holistic matching framework built on hashes of class files, the contents of the source files, etc. Some of the commercial tools, specifically the ones that started on the legal/compliance side of FOSS, are very good at identifying little bits of code from other projects embedded in an application.

The problem with building a database of facts (hashes, source file info, etc.) is the maintenance of the data. Dependency-check was started without the intention of maintaining a database of information because, being one unpaid engineer, there was no way I was going to try to maintain a database.

That being said, as the community around dependency-check grows it might be possible to start building a more robust database of "facts" to do deeper identification of dependent code.

jmanico commented 9 years ago

Since Dependency Check depends on CVE so much, is it possible to build a relationship with MITRE and do something like this?

CVE -> Feeds -> Dependency Check
Dependency Check errors -> Feeds back into -> CVE

Jim

hansjoachim commented 9 years ago

DependencyCheck could still see that I essentially have a dependency on AWL 1.3.7, and give me a useful warning if a CVE shows up for it. The same argument applies, in principle, to the packaged forms of open source libraries on, e.g., Debian and Fedora Linux distributions.

Most (all?) distributions have existing systems which give an overview of open security issues for their packages; see for instance Debian's [1] and Ubuntu's [2]. When drilling down to a particular issue, they often contain cross-references to bug trackers for other distributions/upstream/concerned parties. I don't know if this data originally comes from the CVE entry or not. It might be worth looking at some of these existing systems if trying to look at packages.

For distribution packages and (probably more so) embedded versions of libraries, checking the version number might not be sufficient. Most distributions wish to limit the number of changes to the archives for their supported releases, so while new releases are added to the development release, existing ones get backports for serious issues. For instance, around the time of Heartbleed, Ubuntu got a couple of bug reports saying "you still have vulnerable version X, you should upgrade to Y". However, they had already responded to it, so the version number wasn't X; it was actually Xpatched1 (paraphrasing), which already included a backported fix for the issue. So going by version number alone might yield false positives due to how security fixes are applied.

[1] https://security-tracker.debian.org/tracker/
[2] http://people.canonical.com/~ubuntu-security/cve/

dwvisser commented 9 years ago

@jmanico What kinds of Dependency Check errors were you thinking of feeding back to the CVEs?

dwvisser commented 9 years ago

@jeremylong said:

I'm still leery of a standard that would require developers to add more data to their project outputs that has no effect on the project's functionality; it won't get used…

Here's an example of what such an embedding might look like at the source code level:

const char my_id[] = "MY_ID_MAGIC_HEADER;program=demo;version=1.0.0;MY_ID_MAGIC_TRAILER";

As you can see, it's not too onerous (though it might also require a compiler flag to avoid being optimized away). If even one prominent security-related library, e.g., OpenSSL, were to adopt such an embedding that Dependency Check could then detect, it would be a win.
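As a minimal sketch of keeping such a string in the object file without a special flag, the declaration can be marked with the GCC/Clang `used` attribute. This assumes a GCC-compatible compiler; the `KEEP_IN_OBJECT` macro name is made up for illustration, and linker-level section garbage collection may still need separate handling.

```c
/* Assumption: GCC or Clang. The "used" attribute asks the compiler to emit
 * the object even when nothing in the translation unit references it.
 * KEEP_IN_OBJECT is a hypothetical macro name, not part of any standard. */
#if defined(__GNUC__) || defined(__clang__)
#define KEEP_IN_OBJECT __attribute__((used))
#else
#define KEEP_IN_OBJECT /* other compilers: no-op in this sketch */
#endif

/* The metadata string from the example above, unchanged. */
KEEP_IN_OBJECT
const char my_id[] =
    "MY_ID_MAGIC_HEADER;program=demo;version=1.0.0;MY_ID_MAGIC_TRAILER";
```

In C, a non-static file-scope array like this already has external linkage, so the attribute matters mostly if the declaration is made `static` or the file is compiled as C++, where `const` objects default to internal linkage.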

While developers do download, modify, recompile, etc. it is rare in my experience that all meta data would get stripped.

C/C++ libraries, particularly in Linux-land, are often delivered compiled and stripped, and are frequently statically linked into other binaries, leaving no separate metadata files to examine. Hence the idea here: to come up with some agreed-upon, simple way to embed a short bit of useful metadata in the object files.
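On the analyzer side, detecting such a string could be as simple as a byte search over the binary. The following is only a sketch under the assumption that the marker format matches the `MY_ID_MAGIC_HEADER` / `MY_ID_MAGIC_TRAILER` example above; the function names are hypothetical and this is not how Dependency Check itself is implemented.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Marker strings taken from the example in this thread. */
#define ID_HEADER  "MY_ID_MAGIC_HEADER;"
#define ID_TRAILER ";MY_ID_MAGIC_TRAILER"

/* Find the first occurrence of needle in a byte buffer that may contain NULs. */
static const unsigned char *find_bytes(const unsigned char *hay, size_t hay_len,
                                       const char *needle)
{
    size_t n = strlen(needle);
    if (n == 0 || hay_len < n) return NULL;
    for (size_t i = 0; i + n <= hay_len; i++) {
        if (memcmp(hay + i, needle, n) == 0) return hay + i;
    }
    return NULL;
}

/* Copy the text between header and trailer into out (NUL-terminated).
 * Returns 0 on success, -1 if no marker pair is found or out is too small. */
int extract_metadata(const unsigned char *buf, size_t len,
                     char *out, size_t out_size)
{
    const unsigned char *start = find_bytes(buf, len, ID_HEADER);
    if (!start) return -1;
    start += strlen(ID_HEADER);

    size_t remaining = len - (size_t)(start - buf);
    const unsigned char *end = find_bytes(start, remaining, ID_TRAILER);
    if (!end) return -1;

    size_t n = (size_t)(end - start);
    if (n + 1 > out_size) return -1;
    memcpy(out, start, n);
    out[n] = '\0';
    return 0;
}

/* Read a whole file into memory and run extract_metadata on it. */
int scan_file(const char *path, char *out, size_t out_size)
{
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return -1; }
    long size = ftell(f);
    if (size <= 0) { fclose(f); return -1; }
    rewind(f);

    unsigned char *buf = malloc((size_t)size);
    if (!buf) { fclose(f); return -1; }
    size_t got = fread(buf, 1, (size_t)size, f);
    fclose(f);

    int rc = (got == (size_t)size)
                 ? extract_metadata(buf, got, out, out_size)
                 : -1;
    free(buf);
    return rc;
}
```

Because the search works on raw bytes rather than symbol tables, it would still find the string in a stripped or statically linked binary, provided the string survived compilation and linking.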

hoggmania commented 9 years ago

I think it is a fine idea for a producer to add the meta-tag, especially for native code.

However, a lot of people rely on native 3rd party libraries and have no way to identify them to the Dependency Checking tool (unless someone can point me in the right direction!).

If we could add a scanner to pick up (or inspect) a '.cpe' file or '.swid' file that contains the information needed to construct the CPE for the native library, then this would help greatly in allowing legacy checking.
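A sidecar lookup of this kind might be sketched as follows. The naming convention (the library's file name with `.cpe` appended) and the file format (a CPE name on the first line) are assumptions made purely for illustration; nothing in this thread defines them, and the function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Assumption: the sidecar file is named "<library file name>.cpe".
 * Returns 0 on success, -1 if the result does not fit in out. */
int sidecar_path(const char *lib_path, char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "%s.cpe", lib_path);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}

/* Assumption: the first line of the sidecar file holds the CPE name.
 * Reads that line into cpe with the trailing newline removed.
 * Returns 0 on success, -1 if the file is missing, unreadable, or empty. */
int read_sidecar_cpe(const char *lib_path, char *cpe, size_t cpe_size)
{
    char path[4096];
    if (cpe_size == 0) return -1;
    if (sidecar_path(lib_path, path, sizeof path) != 0) return -1;

    FILE *f = fopen(path, "r");
    if (!f) return -1;
    char *line = fgets(cpe, (int)cpe_size, f);
    fclose(f);
    if (!line) return -1;

    cpe[strcspn(cpe, "\r\n")] = '\0';  /* strip line ending */
    return cpe[0] != '\0' ? 0 : -1;
}
```

A scanner could call `read_sidecar_cpe` for each native library it encounters and fall back to its other evidence when the call returns -1.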

dwvisser commented 9 years ago

@hoggmania There is an existing issue opened with respect to analyzing SWID tags: #214. Feel free to comment there, or add suggestions. I'm not aware of any standard way of embedding a CPE into a software project. I have developed a prototype specification for doing this along the lines described above. (It is essentially similar to the example shown, but allows for a CPE variant and attempts to take care of corner cases in the syntax.)

jeremylong commented 8 years ago

See the comments on PR #298.

lock[bot] commented 6 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

jeremylong / DependencyCheck

Define spec for embedding metadata strings into object files #212