schochastics / CRAN_collaboration

Analysing the collaboration graph of R package developers on CRAN

CRAN collaboration graph

The CRAN collaboration graph consists of R package developers who are connected if they appear together as authors of an R package in the DESCRIPTION file.
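As a rough illustration of this definition, the sketch below builds a small coauthor graph from a made-up package/author table. The data frame `pkg_authors`, its column names, and the developer names are hypothetical; the repository's actual pipeline (including the author cleaning) is not reproduced here.

```r
library(igraph)

# Hypothetical toy input: one row per (package, author) pair.
pkg_authors <- data.frame(
  package = c("pkgA", "pkgA", "pkgA", "pkgB", "pkgB", "pkgC"),
  author  = c("Ann", "Bob", "Cy", "Bob", "Dee", "Eve"),
  stringsAsFactors = FALSE
)

# For each package, list every unordered pair of its authors.
authors_by_pkg <- split(pkg_authors$author, pkg_authors$package)
pairs_per_pkg <- lapply(unname(authors_by_pkg), function(a) {
  a <- sort(unique(a))
  if (length(a) < 2) return(NULL)  # single-authored packages yield no ties
  t(combn(a, 2))
})

# Stack the pairs and drop ties that occur in more than one package.
edges <- unique(do.call(rbind, pairs_per_pkg))

# Undirected graph: one vertex per developer, one edge per coauthor pair.
g_toy <- graph_from_edgelist(edges, directed = FALSE)
```

In this toy example, "Eve" only has a single-authored package and therefore never enters the graph.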

The graph consists of 15,419 R developers and 126,988 collaborative ties.

Six Degrees of Hadley Wickham

If you are familiar with the Erdős number and/or the Bacon number, then you know where this is going. The “Hadley number” is defined as the distance of an R developer to Hadley Wickham in the collaboration graph. Someone (“A”) who contributed to a package that Hadley has contributed to has a Hadley number of 1. Someone who contributed to a package that A has contributed to, but to none that Hadley has contributed to, has a Hadley number of 2, and so on. Hadley himself is the only person with a Hadley number of 0.

The distribution of Hadley numbers is shown below.

The average Hadley number is 2.982. There is no easy way of checking your own Hadley number yet. For now, you can download processed_data/coauthor-biggest_comp.RDS and run:

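library(igraph)
# Load the largest connected component of the coauthor graph
g <- readRDS("coauthor-biggest_comp.RDS")
# Replace with your own name as it appears in the graph
me <- "David Schoch"
idx <- which(V(g)$name == me)
# Look up the stored distance to Hadley Wickham
V(g)$dist2HW[idx]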
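The snippet above reads a stored vertex attribute. As a minimal sketch of how such distances could be computed from scratch, the example below uses igraph's `distances()` on a toy graph. The vertex names are hypothetical, with "Hub" standing in for the reference developer, and this is not necessarily how `dist2HW` was produced.

```r
library(igraph)

# Toy graph with hypothetical developers; "Hub" plays the reference role.
g_toy <- graph_from_literal(Hub - A, Hub - B, A - C, C - D)

# Shortest-path distance from the reference developer to every vertex.
d <- distances(g_toy, v = "Hub")[1, ]

d["D"]    # distance of a single developer to the reference developer
table(d)  # distribution of distances
mean(d)   # average distance (here including the reference's own 0)
```

Swapping "Hub" for any other vertex name gives the analogous number for that developer.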

The center of the collaboration network

The center of the collaboration network is defined as the developer whose average distance to all other developers is the lowest. The top ten developers in that regard are shown below.

name central
Hadley Wickham 2.98178
Ben Bolker 3.10481
Dirk Eddelbuettel 3.13269
Martin Maechler 3.17355
Romain Francois 3.17375
Michael Friendly 3.18030
R Core Team 3.19534
Jim Hester 3.20585
Posit Software 3.21376
Kevin Ushey 3.23419
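A ranking like the one above could be sketched as follows on a toy graph. The vertex names are hypothetical, and dividing by the number of other vertices (n - 1) is an assumption; the exact normalisation behind the `central` column is not stated here.

```r
library(igraph)

# Toy graph with hypothetical developers.
g_toy <- graph_from_literal(Hub - A, Hub - B, A - C, C - D)

# All-pairs shortest-path distances. This is fine for a toy graph, but the
# full matrix becomes large for a graph with thousands of vertices.
dmat <- distances(g_toy)

# Average distance from each developer to all other developers.
avg_dist <- rowSums(dmat) / (vcount(g_toy) - 1)

# Smallest average distance first.
head(sort(avg_dist), 10)
```

This measure is closely related to closeness centrality, which igraph's `closeness()` reports as the reciprocal of the summed distances.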

TidyTuesday 38/2023

This repository was featured on TidyTuesday. My contribution can be found in the typst-poster folder. I created a Typst poster using the pre-release version of Quarto, which supports Typst. I was inspired by a toot by Carlos Scheidegger.

Disclaimer

The repository only includes the “largest connected component” of the collaboration graph. Developers who have only single-authored packages, and therefore have no coauthors, do not appear in the graph.
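For readers unfamiliar with the term, here is a minimal sketch of extracting the largest connected component with igraph. The toy graph and its vertex names are hypothetical, and this is not necessarily the code used to produce the published file.

```r
library(igraph)

# Toy graph: a component of three developers, one of two, and an isolate.
g_full <- graph_from_literal(A - B, B - C, D - E, F)

# Label every vertex with the id of its connected component.
comp <- components(g_full)

# Keep only the vertices that belong to the largest component.
biggest <- which.max(comp$csize)
g_big <- induced_subgraph(g_full, which(comp$membership == biggest))

V(g_big)$name
```

Developers outside the largest component (here D, E and F) have no path to anyone inside it, so distances to them are not defined.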

The author field in the DESCRIPTION file can be very messy. I have a very lengthy cleaning script (see Rscripts/helpers.R and data/delete_authors.txt), but the final data is certainly not yet free of errors.