Open jensdietrich opened 5 months ago
@jensdietrich Out of the 9,715,669 explicit dependencies, there are 949,995 cases where the target release date is later than the source release date. I think we noticed this quite early on in the project, when we saw that some of the dependencies don't match what is in the repo. I emailed the authors at the time and this was their response (attached).
What I am noticing now is that the release dates on the Maven repository differ from what is in the dataset for those cases, and these are mostly cases where the release dates are after the dataset was collected (September 6, 2018). Of the 949,995 cases (where the target release date is later than the source release date), 944,440 have source release dates earlier than September 6, 2018. What this means is that for these 944,440 artifacts, the release dates recorded in the Maven repo may differ from what is in the dataset and may be after the dataset was collected.
For the cases in your log, for instance:
warning: src timestamp after target timestamp: org.apache.oodt:cas-product:1.2.5 -> org.apache.oodt:cas-filemgr:1.2.5 [Compile] src timestamp: 2016-08-10T15:08:35Z[GMT] target timestamp: 2018-09-06T20:02:50Z[GMT]
The release date of org.apache.oodt:cas-product:1.2.5 on the maven repo is Sep 13, 2018 while the release date of the target artifact org.apache.oodt:cas-filemgr:1.2.5 on the maven repo is Sep 13, 2018 (this is almost correct/close)
warning: src timestamp after target timestamp: org.apache.maven.scm:maven-scm-providers:1.11.1 -> org.codehaus.plexus:plexus-utils:3.1.0 [Compile] src timestamp: 2016-08-10T15:08:35Z[GMT] target timestamp: 2017-08-03T18:00:32Z[GMT]
The release date of the source - org.apache.maven.scm:maven-scm-providers:1.11.1 is Sep 15, 2018, while the release date of the target - org.codehaus.plexus:plexus-utils:3.1.0 is Aug 03, 2017 (this is correct)
warning: src timestamp after target timestamp: org.hibernate:hibernate-search-jbossmodules-elasticsearch-aws:5.10.4.Final -> org.hibernate:hibernate-search-jbossmodules-elasticsearch:5.10.4.Final [Compile] src timestamp: 2016-08-10T15:08:35Z[GMT] target timestamp: 2018-09-10T15:08:15Z[GMT]
The release date of the source - org.hibernate:hibernate-search-jbossmodules-elasticsearch-aws:5.10.4.Final on the maven repo is Sep 13, 2018, while the release date of the target - org.hibernate:hibernate-search-jbossmodules-elasticsearch:5.10.4.Final is Sep 13, 2018 (this is almost correct/close)
warning: src timestamp after target timestamp: org.uberfire:showcase-distribution-wars:2.8.0.Final -> com.sun.xml.bind:jaxb-impl:2.3.0 [Compile] src timestamp: 2016-08-10T15:08:35Z[GMT] target timestamp: 2017-08-02T15:22:46Z[GMT]
The release date of the source - org.uberfire:showcase-distribution-wars:2.8.0.Final on the maven repo is Sep 12, 2018, while the release date of the target - com.sun.xml.bind:jaxb-impl:2.3.0 is Aug 02, 2017 (this is correct)
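The warnings quoted above can be checked mechanically. The following is a minimal sketch, not the tool that produced the log: it assumes each warning sits on one line in exactly the format quoted, and the names `parse_warning`, `parse_ts`, `WARNING` and `COLLECTION_DATE` are illustrative. It extracts the two coordinates and timestamps and reports whether the source timestamp precedes the target timestamp and the dataset collection date (September 6, 2018).

```python
import re
from datetime import datetime, timezone

# Assumed cut-off: the dataset collection date mentioned in this thread.
COLLECTION_DATE = datetime(2018, 9, 6, tzinfo=timezone.utc)

# Pattern for one warning line in the format quoted above (an assumption).
WARNING = re.compile(
    r"warning: src timestamp after target timestamp: "
    r"(?P<src>\S+) -> (?P<tgt>\S+) \[(?P<scope>\w+)\] "
    r"src timestamp: (?P<src_ts>\S+) target timestamp: (?P<tgt_ts>\S+)"
)


def parse_ts(text):
    """Parse a timestamp such as 2016-08-10T15:08:35Z[GMT] as UTC."""
    cleaned = text.replace("[GMT]", "")
    return datetime.strptime(cleaned, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def parse_warning(line):
    """Return a summary dict for one warning line, or None if it does not match."""
    match = WARNING.search(line)
    if match is None:
        return None
    src_ts = parse_ts(match["src_ts"])
    tgt_ts = parse_ts(match["tgt_ts"])
    return {
        "source": match["src"],
        "target": match["tgt"],
        "scope": match["scope"],
        "source_before_target": src_ts < tgt_ts,
        "source_before_collection": src_ts < COLLECTION_DATE,
        "gap_days": (tgt_ts - src_ts).days,
    }


line = (
    "warning: src timestamp after target timestamp: "
    "org.apache.oodt:cas-product:1.2.5 -> org.apache.oodt:cas-filemgr:1.2.5 [Compile] "
    "src timestamp: 2016-08-10T15:08:35Z[GMT] target timestamp: 2018-09-06T20:02:50Z[GMT]"
)
record = parse_warning(line)
print(record)
```

Running `parse_warning` over every line of the log would give a table that can be compared against the dates shown on the Maven repo.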
These date issues are not mentioned in the Benelallam paper. I am checking other papers that used the dataset.
@nkiru-ede We still need to come up with an explanation why this is the case. Possible starting points:
This requires some detective work; perhaps even try asking a question on Stack Overflow.
Discussed timestamps in meeting -- look at https://mvnrepository.com/artifact/org.apache.oodt/cas-product/1.2.5 and https://mvnrepository.com/artifact/org.apache.oodt/cas-filemgr/1.2.5 -- sometimes the timestamps there differ from the dataset.
TODO: sample some data that shows mismatches, as table with the following columns:
@jensdietrich I really can't find a pattern. But I have often seen that the release dates in the 2018 dataset match the ones on the libraries.io website more closely than the ones on Maven Central.
I have attached a link to an Excel sheet with some samples (still updating this sheet).
@jensdietrich There are 458,038 cases where the source release date is before the target. In 223,585 of these cases, the earliest release of the artifact is before the dependency, whereas in 234,453 of the cases, the earliest release is after the dependency.
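One way to make this split reproducible is sketched below. It rests on an assumed reading of the comment above: for each dependency whose source was released before its target, check whether any version of the target artifact (same groupId:artifactId) already existed when the source was released. The record layout, the function names and the sample dates are hypothetical and are not taken from the dataset.

```python
from datetime import date


def artifact_key(coordinate):
    """Drop the version from a groupId:artifactId:version coordinate."""
    return coordinate.rsplit(":", 1)[0]


def earliest_releases(release_dates):
    """Map each groupId:artifactId to the earliest release date of any of its versions."""
    earliest = {}
    for coordinate, released in release_dates.items():
        key = artifact_key(coordinate)
        if key not in earliest or released < earliest[key]:
            earliest[key] = released
    return earliest


def split_cases(dependencies, release_dates):
    """Split source-before-target dependencies by whether some target version predates the source."""
    earliest = earliest_releases(release_dates)
    some_version_existed, no_version_existed = [], []
    for source, target in dependencies:
        if release_dates[source] >= release_dates[target]:
            continue  # source is not older than target: not one of the cases discussed
        if earliest[artifact_key(target)] <= release_dates[source]:
            some_version_existed.append((source, target))
        else:
            no_version_existed.append((source, target))
    return some_version_existed, no_version_existed


# Hypothetical sample data, for illustration only.
release_dates = {
    "g:app:1.0": date(2017, 1, 1),
    "g:lib:1.0": date(2016, 6, 1),
    "g:lib:2.0": date(2017, 6, 1),
    "g:new:1.0": date(2018, 1, 1),
}
dependencies = [
    ("g:app:1.0", "g:lib:2.0"),
    ("g:app:1.0", "g:new:1.0"),
    ("g:app:1.0", "g:lib:1.0"),
]
existed, missing = split_cases(dependencies, release_dates)
print(len(existed), len(missing))
```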
@jensdietrich Answering the question "How often do artifacts depend on different versions of the same component?": the image below shows the frequency distribution (for direct dependencies; I will do the same for transitive dependencies).
@jensdietrich
I made a version comparison, specifically for the cases where source release dates are earlier than the oldest version of the targets. Attached is the zipped file false_cases_version_comparison.zip.
In 52% of the cases, the source version is equal to the target version. The remaining cases:
Dependency < Artifact: 17%
Dependency > Artifact: 15.8%
Invalid Version: 14.7% (these are mostly cases where one or both of the target and source versions are non-numerical)
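For reference, a comparison that yields these four categories could look like the sketch below. It is an illustration under assumptions, not the script behind the attached file: "artifact" is taken to mean the source and "dependency" the target, versions are compared as dot-separated integers padded with zeros, and anything with a non-numeric part (such as `5.10.4.Final`) is counted as invalid. Maven's own version ordering handles qualifiers differently and is not reproduced here.

```python
import re
from collections import Counter

# Purely numeric, dot-separated versions such as 1.2.5 (an assumed definition of "valid").
NUMERIC = re.compile(r"[0-9]+(?:\.[0-9]+)*")


def parse_version(version):
    """Return the version as a tuple of ints, or None if any part is non-numeric."""
    if NUMERIC.fullmatch(version) is None:
        return None
    return tuple(int(part) for part in version.split("."))


def pad(parts, length):
    """Right-pad a version tuple with zeros so that 1.2 compares equal to 1.2.0."""
    return parts + (0,) * (length - len(parts))


def compare_versions(artifact_version, dependency_version):
    """Classify a (source, target) version pair into one of four categories."""
    artifact = parse_version(artifact_version)
    dependency = parse_version(dependency_version)
    if artifact is None or dependency is None:
        return "Invalid Version"
    width = max(len(artifact), len(dependency))
    artifact, dependency = pad(artifact, width), pad(dependency, width)
    if dependency == artifact:
        return "Equal"
    if dependency < artifact:
        return "Dependency < Artifact"
    return "Dependency > Artifact"


# Version pairs taken from the warnings quoted earlier in this thread.
pairs = [
    ("1.2.5", "1.2.5"),
    ("1.11.1", "3.1.0"),
    ("5.10.4.Final", "5.10.4.Final"),
    ("2.8.0.Final", "2.3.0"),
]
tally = Counter(compare_versions(a, d) for a, d in pairs)
print(tally)
```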
A further version change/composition analysis revealed this:
Version Changes Analysis:
Major changes: 85067 (36.28%)
Minor changes: 94084 (40.13%)
Patch changes: 92973 (39.66%)
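The three percentages add up to more than 100%, which suggests the categories are not mutually exclusive. The sketch below shows one possible reading, not the analysis that was actually run: each of the major, minor and patch components is checked independently, so a single pair can count in several categories. The function name `changed_components` is hypothetical. The denominator of 234,453 is also an assumption, taken from the count of cases reported earlier in this thread; with it, the three reported percentages are reproduced to two decimals.

```python
def changed_components(source_version, target_version):
    """Return which of major/minor/patch differ between two dot-separated versions."""
    names = ("major", "minor", "patch")
    # Pad short versions with "0" and ignore anything after the third component.
    src = (source_version.split(".") + ["0", "0", "0"])[:3]
    tgt = (target_version.split(".") + ["0", "0", "0"])[:3]
    return {name for name, s, t in zip(names, src, tgt) if s != t}


# A pair from the warnings quoted earlier: all three components differ.
print(changed_components("1.11.1", "3.1.0"))

# Assumed denominator: the cases where the earliest release is after the dependency.
TOTAL = 234_453
counts = {"major": 85_067, "minor": 94_084, "patch": 92_973}
shares = {name: round(100 * n / TOTAL, 2) for name, n in counts.items()}
print(shares)

# The counts sum to more than the assumed total, consistent with overlapping categories.
print(sum(counts.values()) > TOTAL)
```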
There seem to be multiple rows where targets have release timestamps after sources. We need an explanation for this -- @nkiru-ede please check discussions in literature / grey literature / primary data. You can also email the authors of the dataset paper as a last resort if we don't find anything else.
There is a pattern I can see: sometimes there are synchronised releases (usually within the same group) where source and target are released at the same time, but the timestamp of the target ends up being after that of the source. I have already filtered those out by setting a 1h threshold.
The problem is rather common: I have counted 484,750 dependencies where this is the case! So there seems to be a valid use case.
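The 1h threshold mentioned above can be expressed as a simple predicate. This is a minimal sketch under assumptions: the name `is_suspicious` is hypothetical, timestamps are taken to be timezone-aware UTC values, and the synchronised-release sample is made up for illustration. The second sample reuses the timestamps from the cas-product warning quoted in this thread.

```python
from datetime import datetime, timedelta, timezone

# Releases less than this far apart are treated as synchronised and ignored.
THRESHOLD = timedelta(hours=1)


def is_suspicious(source_released, target_released, threshold=THRESHOLD):
    """True if the target was released more than `threshold` after the source."""
    return target_released - source_released > threshold


def utc(*args):
    """Build a timezone-aware UTC datetime."""
    return datetime(*args, tzinfo=timezone.utc)


# Hypothetical synchronised release: target stamped 15 seconds after the source.
synchronised = is_suspicious(utc(2018, 9, 10, 15, 8, 0), utc(2018, 9, 10, 15, 8, 15))

# Timestamps from the cas-product -> cas-filemgr warning quoted in this thread.
far_apart = is_suspicious(utc(2016, 8, 10, 15, 8, 35), utc(2018, 9, 6, 20, 2, 50))

print(synchronised, far_apart)
```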
Here are a few records: