Allow unknown versions

rviscomi commented 7 years ago

In https://github.com/HTTPArchive/httparchive/issues/77 we're exploring the possibility of detecting third party JS libraries during the HTTP Archive crawl. This project seems like a great fit.

One additional feature we're looking to have is the ability to know when a library is detected but its version is unknown. Is this feasible to add or is there a practical reason not to do it? I can imagine that it may lead to more false positives.

wumpus commented 7 years ago

Is there a significant web community that deliberately removes version numbers from libraries? This is a thing for lots of web framework communities. For example, if I type [hide drupal] into Google search, the top autocomplete is [hide drupal version] ...

rviscomi commented 7 years ago

@wumpus my guess is that it's a combination of some developers doing it manually for security/obscurity purposes and some libraries, or older versions thereof, just not having an API to get that info out. Detecting it from HTTP Archive would give us a sense of the magnitude.

johnmichel commented 7 years ago

I think that this is a great feature to support. Do you have any examples of this in the wild so that I might be able to formulate an approach for how this can be done as efficiently as possible?

I understand the rationale for hiding versions for security reasons, but on the web (client-side, at least), everything is pretty much in the open if you're willing to dig deep enough (determine version through available API methods or something similar).

rviscomi commented 7 years ago

Taking jQuery as an example, you could modify the return condition to allow falsy versions:

var jq = win.jQuery || win.$ || win.$jq || win.$j;
if(jq && jq.fn) { // removed check for jq.fn.jquery here
    // the jq.fn object definitely exists, but jq.fn.jquery may be undefined
    return { version: jq.fn.jquery };
}
return false;

FWIW I don't have any real world examples of sites that exhibit this behavior.

igrigorik commented 7 years ago

Can we just make a sample one that hides the version? :)

tlauinger commented 7 years ago

About versions (not) exported by libraries: The essential points have already been mentioned above, but I can add a few details from our research project. We downloaded all available library versions from a number of JS CDNs to check if the detection worked as expected. There were many libraries that didn't initially export the version, but added support later. There were cases where support was (accidentally?) dropped for a release or two and reintroduced again later. I believe we even saw a case where the library developers forgot to increase the version number in their code, resulting in a mismatch between the version returned by the code and what the CDN thinks it is. Overall, lack of API support is probably the most common reason for missing version data. Considering only detections with a valid version number greatly reduces false positive detections, but it often excludes older library versions that didn't have the API support.

What website developers do with the library code is another thing. If they minify code with aggressive (non-default) settings, dead code removal could result in the version attribute disappearing. Or they could manually hide/obfuscate the version. I think we saw a couple of version strings that included the name of the website.

tlauinger commented 7 years ago

BTW, detection of "unknown" versions by Library Detector isn't handled consistently at the moment. Some tests return 'unknown' (e.g., Ink), some return 'N/A' (e.g., FastClick), others return '' (e.g., Closure), and the newer tests (including the ones I added) return null.

Furthermore, as @rviscomi mentioned, some tests (such as jQuery) have the version attribute in the if clause so they don't detect anything if the attribute is missing, whereas the newer tests do the more general return library.version || null thing.
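
To illustrate the inconsistency described above, here is a minimal sketch of how the different "no version" values could be collapsed into a single one. The `normalizeVersion` helper and the `UNKNOWN_SENTINELS` list are hypothetical names used only for illustration; they are not part of Library Detector, and the list covers only the values mentioned in this thread.

```javascript
// Hypothetical helper, not part of Library Detector.
// Sentinel strings that some tests reportedly return when no version is found.
const UNKNOWN_SENTINELS = ['unknown', 'N/A', ''];

function normalizeVersion(version) {
    // undefined and null both mean "no version exposed"
    if (version === undefined || version === null) {
        return null;
    }
    // Coerce to a trimmed string so numeric or padded values compare cleanly
    const text = String(version).trim();
    return UNKNOWN_SENTINELS.includes(text) ? null : text;
}
```

With this sketch, `normalizeVersion('N/A')`, `normalizeVersion('')` and `normalizeVersion(undefined)` all yield `null`, while a real version string such as `'1.12.4'` passes through unchanged.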

rviscomi commented 7 years ago

> Can we just make a sample one that hides the version? :)

Used a codepen sample in https://github.com/HTTPArchive/httparchive/pull/80

> detection of "unknown" versions by Library Detector isn't handled consistently at the moment

Perhaps an UNKNOWN_VERSION = null const should be standardized.

johnmichel commented 7 years ago

> BTW, detection of "unknown" versions by Library Detector isn't handled consistently at the moment. Some tests return 'unknown' (e.g., Ink), some return 'N/A' (e.g., FastClick), others return '' (e.g., Closure), and the newer tests (including the ones I added) return null.

I think this is an area that could certainly be made more consistent.

> Perhaps an UNKNOWN_VERSION = null const should be standardized.

That sounds like a good standard to stick with.
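
As a rough sketch of what that standard might look like in a test, assuming a shared `UNKNOWN_VERSION` constant and a lenient variant of the jQuery check from earlier in the thread. The `testJQuery` function and the stand-in window objects are hypothetical and only for illustration; the actual test layout in the extension may differ.

```javascript
// Assumption: one shared constant meaning "library detected, version not exposed".
const UNKNOWN_VERSION = null;

// Hypothetical lenient jQuery test, modeled on the snippet earlier in the thread.
function testJQuery(win) {
    const jq = win.jQuery || win.$ || win.$jq || win.$j;
    if (jq && jq.fn) {
        // jq.fn exists, so the library is present even if jq.fn.jquery is missing
        return { version: jq.fn.jquery || UNKNOWN_VERSION };
    }
    return false;
}

// Stand-in window objects so the sketch runs without a browser.
const withVersion = { jQuery: { fn: { jquery: '1.12.4' } } };
const withoutVersion = { jQuery: { fn: {} } };
const noLibrary = {};
```

Here `testJQuery(withoutVersion)` still reports a detection, with `version` set to `UNKNOWN_VERSION`, while `testJQuery(noLibrary)` returns `false`. Consumers can then tell "detected, version unknown" apart from "not detected" without checking for several different sentinel strings.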

johnmichel / Library-Detector-for-Chrome

Allow unknown versions #92