Open jensdietrich opened 3 months ago
@nkiru-ede @ulizue here is my analysis (note that my graphs are a bit different as I exclude self-edges). Intervals are open on the left and closed to the right, i.e. 5..10 means >5 and <=10. This looks much more like what I had expected.
@nkiru-ede please sample your data - i.e. take some random GAVs and then manually compute the indegrees (number of dependents).
vertex count: 1,855,689
edge count: 8,751,900

in-degree analysis
0: 981506
0..2: 572268
2..3: 78467
3..4: 49605
4..5: 30742
5..10: 66630
10..50: 59718
50..100: 8051
100..500: 6800
500..1000: 997
1000..5000: 771
5000..10000: 87
10000..50000: 47
>50000: 0
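For anyone wanting to cross-check the in-degree analysis above, here is a minimal sketch of how such a histogram could be computed. It is not the script used for the numbers in this thread; the function name `indegree_histogram`, the `(source, target)` tuple layout and the assumption that edges are already de-duplicated are all illustrative.

```python
from bisect import bisect_left
from collections import Counter

# Upper bounds of the bins: "0", then left-open / right-closed intervals
# (0..2 means >0 and <=2), matching the convention described above.
BOUNDS = [0, 2, 3, 4, 5, 10, 50, 100, 500, 1000, 5000, 10000, 50000]


def indegree_histogram(vertices, edges, bounds=BOUNDS):
    """Count vertices per in-degree bin; the last slot holds values > bounds[-1].

    `vertices` is the full list of GAVs (assumed to come from the release
    list), `edges` is an iterable of (source, target) pairs (assumed unique).
    """
    indegree = Counter()
    for source, target in edges:
        if source != target:  # self-edges are excluded
            indegree[target] += 1

    hist = [0] * (len(bounds) + 1)
    for v in vertices:
        # Iterating over all vertices keeps GAVs with no incoming edge,
        # which land in bin 0. bisect_left gives right-closed bins.
        hist[bisect_left(bounds, indegree[v])] += 1
    return hist
```

A quick sanity check for any result is that the bin counts add up to the vertex count, as they do for the numbers above.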
@jensdietrich I have reviewed this. However, I am getting different results from what you have. Your table showed that there are 981,506 GAVs with 0 dependencies.
Below is what I have:
Dependency Distribution of GAV: count
[0, 1) 0
[1, 3) 694362
[3, 6) 203927
[6, 11) 85103
[11, 51) 70102
[51, 101) 8893
[101, 1000) 8587
@nkiru-ede so acc. to your numbers there are no artifacts without dependents. How about org.apache.directory.shared:shared-ldap-client-all:1.0.0-M8 (see release_all.tsv and links_all.tsv)?
links_all.tsv - line 283: "org.apache.directory.shared:shared-ldap-client-all:1.0.0-M8","org.slf4j:slf4j-api:1.6.1","Compile"
Here the target is org.slf4j:slf4j-api:1.6.1, i.e. org.apache.directory.shared:shared-ldap-client-all:1.0.0-M8 has no dependents and therefore the value for [0, 1) cannot be 0! Those are very easy to find, you just need to sample the data.
cc @ulizue
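The sampling check asked for in this thread could look roughly like the sketch below. The helper name `sample_dependents` and the in-memory `(source, target)` edge list are assumptions for illustration; for the full dataset you would stream links_all.tsv instead of holding everything in memory.

```python
import random


def sample_dependents(vertices, edges, k=5, seed=0):
    """Pick k random GAVs and list their dependents (sources pointing at them).

    `vertices` is the full set of GAVs, `edges` an iterable of
    (source, target) pairs. Self-edges are ignored.
    """
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    sample = rng.sample(sorted(vertices), min(k, len(vertices)))
    edges = list(edges)
    return {
        gav: sorted({s for s, t in edges if t == gav and s != gav})
        for gav in sample
    }
```

The length of each returned list is the in-degree of that GAV and can be compared by hand with the bin it was counted in.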
@jensdietrich
Dependency Distribution of GAV:
[0, 1) 893457
[1, 3) 695087
[3, 6) 203617
[6, 11) 84859
[11, 51) 69969
[51, 101) 8878
[101, 1000) 8564
[1000, 5000) 806
Initially, I was checking the edges with the links_all dataset, which is the dataset that maps the relationships between source (dependent) and target (main). If you are checking the dependencies of the source, is this not a case of the data not being available in the dataset, or are you saying that all artifacts in the dataset are supposed to have dependents?
Sorry @nkiru-ede, I am not following - can you please rephrase? What do you mean by main here? The latest numbers are reasonable, but I would still like to understand what you have changed / did before, to ensure that they are not "accidentally about right". cc @ulizue
@jensdietrich I mean that initially I was just checking for dependents of the target artifacts, and this results in 0 artifacts with 0 dependencies according to the edge/dependencies dataset (links_all).
The new chart is from when I take into account every artifact in the dataset (both source and target).
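One possible reading of the difference between the two charts, sketched under assumptions (the thread does not confirm this is what happened): if in-degrees are computed only for GAVs that appear in the target column, every counted GAV has at least one dependent by construction, so the zero bin is always empty. The function name `zero_dependent_counts` and the optional `vertices` argument are hypothetical.

```python
from collections import Counter


def zero_dependent_counts(edges, vertices=None):
    """Return (zero bin over targets only, zero bin over all vertices).

    `edges` is an iterable of (source, target) pairs. If `vertices` is not
    given, the vertex set is taken to be every GAV seen in either column.
    """
    edges = list(edges)
    indegree = Counter(t for s, t in edges if s != t)

    # Only GAVs that occur as a target: each has in-degree >= 1.
    targets_only = sum(1 for t in indegree if indegree[t] == 0)

    if vertices is None:
        vertices = {v for edge in edges for v in edge}
    # All GAVs, including those that only ever occur as a source.
    over_all = sum(1 for v in vertices if indegree[v] == 0)
    return targets_only, over_all
```

Passing the complete release list as `vertices` also counts GAVs that have no edges at all, which taking only the endpoints of links_all would miss.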
@nkiru-ede ok, let's discuss Friday when we meet, I still don't get it. Is it that you first ignored vertices without edges?
The following is not intuitive -- we expected most GAVs to have a very low number of dependents (see also GINI courses) - needs additional QA!
@jensdietrich to reproduce data