siom79 / japicmp

Comparison of two versions of a jar archive
https://siom79.github.io/japicmp
Apache License 2.0

Japicmp compatibility index for Maven Central #82

jvanzyl opened this issue 9 years ago

jvanzyl commented 9 years ago

Are you interested in working with me to try to produce a compatibility index for Maven Central? There is now an infrastructure for running these experiments:

http://takari.io/2015/10/28/google-maven-central.html

siom79 commented 9 years ago

This is a great idea and will be helpful for every developer upgrading from an old version to a new one. I would be happy to participate in this kind of experiment. :smile:

metlos commented 9 years ago

Not trying to parasitize or anything, but you might also look at https://github.com/revapi/revapi. Admittedly, it is much slower than japicmp, but as far as I can tell it does a more thorough analysis (matching of method overloads, things like "exception class is now a checked exception", computation of the API "envelope" by recursively checking for usages of types from dependencies, distinguishing between "method removed" and "overridden method removed", etc.).

siom79 commented 9 years ago

@metlos: japicmp compares all points listed in the "Binary Compatibility" chapter of the JLS. As no overload resolution is done at runtime in Java, adding new overloaded methods does not break binary compatibility (JLS 13.4.23). And removing an overridden method is also not a binary incompatible change, as you write yourself in your documentation and as japicmp will also report (the absence of an exclamation mark indicates that the change is not binary incompatible):

***  MODIFIED CLASS: PUBLIC japicmp.test.SubClass  (not serializable)
    ===  UNCHANGED CONSTRUCTOR: PUBLIC SubClass()
    ---  REMOVED METHOD: PUBLIC(-) void overriddenFromBaseClass()

Also worth mentioning is that japicmp is battle-proven and is downloaded more than 500 times each month.
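
To illustrate the remark about compile-time overload resolution, here is a small self-contained sketch (the class and method names are invented for this example): which overload is invoked is fixed when the caller is compiled, so adding a new overload later cannot break existing binaries.

```java
// Minimal, hypothetical illustration of compile-time overload resolution.
// The calls in main() are each bound to exactly one overload when this file
// is compiled; adding a further overload to Printer later would not change
// what already-compiled callers invoke until they are recompiled.
public class OverloadDemo {

    static class Printer {
        void print(Object o) { System.out.println("print(Object): " + o); }
        void print(String s) { System.out.println("print(String): " + s); }
    }

    public static void main(String[] args) {
        Printer printer = new Printer();
        Object value = "hello";
        printer.print(value);   // bound at compile time to print(Object)
        printer.print("hello"); // bound at compile time to print(String)
    }
}
```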

metlos commented 9 years ago

Mea culpa about the overridden methods - I am not well acquainted with japicmp's reports.

I didn't express myself particularly well regarding overloaded method matching. What I meant is that where japicmp outputs "method a(long) removed" plus "method a(int) added", revapi outputs "parameter type changed from long to int". Reporting on parameter type or count changes becomes tricky once overloaded methods are taken into account (it becomes a matter of heuristic matching with possibly multiple solutions).

Revapi tries to go beyond binary compatibility - there are changes that are source compatible but binary incompatible, and vice versa. In addition, there are certain changes that are fine in terms of source and binary compatibility but are "semantically" suspicious (for example, a changed value of a static final field with a primitive type).
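
To make that last example concrete, a hypothetical sketch (names invented): the value of a static final primitive field is a compile-time constant that gets inlined into clients, so changing it breaks neither source nor binary compatibility but silently leaves old clients with the old value until they are recompiled.

```java
// Library, version 1 (hypothetical names):
public final class Limits {
    public static final int MAX_CONNECTIONS = 10; // compile-time constant
}

// Client code compiled against version 1:
//   int n = Limits.MAX_CONNECTIONS;
// The literal 10 is copied into the client's class file. If version 2 changes
// the constant to 20, the old client binary still links against version 2
// without errors, but keeps using 10 until it is recompiled - which is why
// such a change is "semantically" suspicious even though it is compatible.
```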

Btw, do you know about the Java library evolution puzzlers (https://bitbucket.org/jensdietrich/java-library-evolution-puzzlers/ and https://www.youtube.com/watch?v=qsgoxmeuB5U)? They were a great resource for me when working on revapi.

siom79 commented 9 years ago

Reporting that a parameter type has changed may be a nice feature, but if we take a look at the JLS, it states in section 13.4.14:

"Changing the name of a method, or the type of a formal parameter to a method or constructor, or adding a parameter to or deleting a parameter from a method or constructor declaration creates a method or constructor with a new signature, and has the combined effect of deleting the method or constructor with the old signature and adding a method or constructor with the new signature (§13.4.12)."

Hence, in general, the approach of reporting a method as deleted and added conforms to the JLS and is aligned with many other diff tools out there. Especially when it comes to binary compatibility, the effect that makes this change binary incompatible is the deletion of the old method signature, as this is the signature that clients of the old version are linked against. And as you already mentioned, it is not solvable without heuristics, as there is no general solution. What will happen if you have these overloaded methods:

void m(Integer)
void m(Long)
void m(String)

and these change now to

void m(Object)

Which one has changed to the new one?

Source compatibility is a strong statement, but I guess you know this blog post from the official Oracle blog. It states:

"Full source compatibility with any existing program is usually not achievable because of * imports."

Hence each compatibility check can only cover a subset of the changes that may break source compatibility.

jvanzyl commented 9 years ago

@siom79 Great, I will set up some infrastructure, take a look at your code, and then maybe we can chat and figure out the best way to get this experiment up and running.

@metlos Happy to try other tools as well!

metlos commented 9 years ago

> Source compatibility is a strong statement, but I guess you know this blog post from the official Oracle blog. It states:
>
> "Full source compatibility with any existing program is usually not achievable because of * imports."
>
> Hence each compatibility check can only cover a subset of the changes that may break source compatibility.

* imports influence the compilability of user code, while API checkers like yours or mine check the compiled code of the libraries that the user code uses. That is, I can tell the user that library L2 contains no changes since L1 that would be source incompatible, but as you say, that doesn't mean the user will be able to compile their code with L2 - but that's not a concern of L2, for sure. If the user sorts out their code to correctly use the classes from L2 they intend, THEN my guarantees apply again.

Anyway, I didn't come here to compete or compare who's better - I simply wanted to offer choice and maybe food for thought.

I think japicmp and revapi have things to learn from each other, and each will find its users. You said you have 500 downloads per month; I have roughly half of that. Frankly, I would be very happy if either of us had a number several orders of magnitude larger - that would mean the Java world had started moving towards a better place, I believe.

siom79 commented 9 years ago

But to my understanding of that article, source compatibility is, in contrast to binary compatibility, concerned with whether your code still compiles against the new version:

"The most rudimentary kind of positive source compatibility is whether code that compiles against L1 will continue to compile against L2."

A simple example is the addition of a method to an interface. According to the JLS, section 13.5 (Evolution of Interfaces), this is a binary compatible change. But if you have implemented this interface from L1 in your code, it will no longer compile against L2.
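
A hypothetical sketch of exactly that situation (interface and class names invented): the old binary still links against L2, but the source no longer compiles against it.

```java
// Interface as shipped in L1 (hypothetical):
public interface Codec {
    byte[] encode(String input);
}

// In L2 the interface gains a method:
//   public interface Codec {
//       byte[] encode(String input);
//       String decode(byte[] input); // newly added
//   }

// A user's implementation written and compiled against L1:
class MyCodec implements Codec {
    public byte[] encode(String input) {
        return input.getBytes(java.nio.charset.StandardCharsets.UTF_8);
    }
}

// MyCodec.class still loads and links against L2 (binary compatible; only a
// call to decode() on it would fail at runtime with an AbstractMethodError),
// but recompiling MyCodec against L2 fails because decode(byte[]) is missing.
```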

Yes, it would be great if compatibility concerns received more attention, as it would ease every developer's life.

jvanzyl commented 9 years ago

Maybe next week we can jump on a Google Hangout, as I'd like to talk about the options for scanning Maven Central. Would it be possible, for example, to add a feature where the signature fingerprints for a particular version can be stored with that version? Concretely, say I'm looking at Guava 18.0 and nothing else, but I want to capture its fingerprint information so that I can later use it to compare against Guava 19.0.

Right now there is no information stored in Maven Central related to binary compatibility and my first thought is that if we can generate the fingerprint information for every version of an artifact in Maven Central we can use that to compare any groupId:artifactId:version pair.
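
Purely as a sketch of what such a fingerprint could be (this is not japicmp's API; the class name and approach are made up): reduce each class to a sorted listing of its externally visible method signatures, store that listing alongside the artifact, and diff the listings of two versions later.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical sketch: a crude "API fingerprint" for one class, built from its
// public/protected method signatures. A real index would read the class files
// in the jar directly (and would also cover fields, supertypes and implemented
// interfaces) instead of loading classes via reflection.
public final class ApiFingerprint {

    public static String of(Class<?> type) {
        return Arrays.stream(type.getDeclaredMethods())
                .filter(m -> Modifier.isPublic(m.getModifiers())
                        || Modifier.isProtected(m.getModifiers()))
                .map(Method::toGenericString)
                .sorted()
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        // e.g. the listing for a class in Guava 18.0 could be stored with the
        // artifact and later diffed against the listing taken from 19.0
        System.out.println(of(java.util.ArrayList.class));
    }
}
```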

siom79 commented 9 years ago

Sure, we can have a call to discuss details.

The thing about a fingerprint is that the class files already contain all the information we need in order to compare two versions. A fingerprint would only get rid of the actual implementation.

Beyond the mere class files, we also need the implemented interfaces and extended classes. This is an open point, as we would have to construct the class path for each jar file from the corresponding pom file. We could precompute that. But as popular interfaces and superclasses are used in many libraries, we would duplicate that information within the fingerprint of each library that uses them.

On the other hand, comparing versions 17.0 and 18.0 of the guava library takes 921ms on my machine (for 1690 classes, i.e. about 0.5ms per pair of classes). This is within a time frame where it would be possible to compute it on the fly. The result of the comparison can be stored as an XML file; for the guava library this has a size of 17.6KB (gzipped). Hence, for popular libraries the result could be cached, and only combinations that are requested less frequently would be computed on the fly.

jvanzyl commented 9 years ago

The concern on a machine that hosts all of Maven Central is the CPU and I/O that might be required during spikes of requests. Then again, comparing fingerprints might require about the same, so maybe it's a wash and you're right that computing on the fly with memoization of the results is fine.

siom79 commented 9 years ago

Would it be possible to implement a cache for frequent requests? Let's say Google releases version 19.0 of guava. Most requests in the period that follows will concern upgrades from 17.x or 18.x, I guess. Hence it could make sense to cache these results (maybe in a more storage-efficient format than XML) and serve these requests from the cache. And if someone upgrades from a pretty old version to 19.0, then the request gets computed on the fly.
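
A minimal sketch of that caching idea, with invented names (not an existing service or japicmp API): reports for frequently requested version pairs stay in the cache after the first request, and anything else is computed on demand.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

// Hypothetical sketch of the proposed cache: comparison reports keyed by
// "groupId:artifactId:oldVersion->newVersion". Popular upgrade paths (e.g.
// guava 18.0 -> 19.0) are cached after the first request; rare combinations
// are computed on the fly each time. Eviction and size limits are omitted.
public final class ComparisonCache {

    private final Map<String, String> reports = new ConcurrentHashMap<>();
    private final BiFunction<String, String, String> compareOnTheFly;

    public ComparisonCache(BiFunction<String, String, String> compareOnTheFly) {
        this.compareOnTheFly = compareOnTheFly;
    }

    public String report(String ga, String oldVersion, String newVersion) {
        String key = ga + ":" + oldVersion + "->" + newVersion;
        // computeIfAbsent runs the (potentially ~1 second) comparison only once per pair
        return reports.computeIfAbsent(key,
                k -> compareOnTheFly.apply(ga + ":" + oldVersion, ga + ":" + newVersion));
    }
}
```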

jvanzyl commented 9 years ago

Absolutely, we can implement anything we choose. We have all the machinery we need as it's being provided by Google as part of the Maven Central hosting.