ropensci-review-tools / pkgstats

Historical statistics of every R package ever
https://docs.ropensci.org/pkgstats/

New metrics #32

Closed: mpadge closed this issue 2 years ago

mpadge commented 3 years ago

The summary of external calls contains all the data needed to calculate the afferent and efferent couplings, and thus the instability. This summary holds comma-delimited sequences of <pkg>:<n_total>:<n_unique> entries, one for each external package called. These are easily converted to a single data.frame for each package, and thus to a single object for all packages. Afferent coupling can then be calculated as the total number of calls from all other packages to a given package (excluding calls from that package itself), and efferent coupling as the sum of all calls from the nominated package to any other package.
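A minimal sketch of that conversion, written in Python only for illustration (pkgstats itself is an R package). It assumes each package's compact summary is available as a string keyed by package name; the names `Call`, `parse_summary` and `couplings` are hypothetical and not part of pkgstats.

```python
from typing import NamedTuple


class Call(NamedTuple):
    source: str    # package making the calls
    target: str    # package being called
    n_total: int   # total number of calls
    n_unique: int  # number of distinct functions called


def parse_summary(source: str, summary: str) -> list[Call]:
    """Split one '<pkg>:<n_total>:<n_unique>,...' string into Call records."""
    calls = []
    for entry in summary.split(","):
        entry = entry.strip()
        if not entry:
            continue
        target, n_total, n_unique = entry.rsplit(":", 2)
        calls.append(Call(source, target, int(n_total), int(n_unique)))
    return calls


def _empty() -> dict[str, int]:
    return {"afferent_total": 0, "afferent_unique": 0,
            "efferent_total": 0, "efferent_unique": 0}


def couplings(summaries: dict[str, str]) -> dict[str, dict[str, int]]:
    """Afferent and efferent coupling (total and unique) for every package."""
    # Every summarised package appears, even if it has no external links.
    result = {pkg: _empty() for pkg in summaries}
    for pkg, summary in summaries.items():
        for c in parse_summary(pkg, summary):
            if c.source == c.target:
                continue  # calls into a package's own namespace are not couplings
            tgt = result.setdefault(c.target, _empty())
            src = result[c.source]
            tgt["afferent_total"] += c.n_total
            tgt["afferent_unique"] += c.n_unique
            src["efferent_total"] += c.n_total
            src["efferent_unique"] += c.n_unique
    return result
```

Packages that are called but have no summary of their own (base packages, for example) still accumulate afferent counts here; they can be filtered out at the instability step.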

Both of these metrics can be calculated from overall numbers of calls and from numbers of unique calls only, giving afferent_total, afferent_unique, efferent_total, and efferent_unique. These can then be converted into the two corresponding measures of coupling instability (one from total calls, one from unique calls), which should be calculated excluding all base and recommended packages.
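The issue does not spell out the instability formula, so the sketch below assumes the standard definition I = Ce / (Ce + Ca), where Ce is efferent and Ca afferent coupling (0 means maximally stable, 1 maximally unstable). The function name and the tuple input format are hypothetical, and the hard-coded base/recommended list is illustrative only and should be checked against the R version the data were collected with.

```python
from collections import defaultdict

# Base and recommended R packages, hard-coded for illustration only.
BASE_AND_RECOMMENDED = frozenset({
    "base", "compiler", "datasets", "graphics", "grDevices", "grid",
    "methods", "parallel", "splines", "stats", "stats4", "tcltk",
    "tools", "utils",
    "boot", "class", "cluster", "codetools", "foreign", "KernSmooth",
    "lattice", "MASS", "Matrix", "mgcv", "nlme", "nnet", "rpart",
    "spatial", "survival",
})


def instability(calls, exclude=BASE_AND_RECOMMENDED):
    """Return {pkg: (I_total, I_unique)} with I = Ce / (Ce + Ca).

    `calls` is an iterable of (source, target, n_total, n_unique) tuples.
    Self-calls and calls to or from excluded packages are ignored.
    I is None for a package with no remaining couplings.
    """
    ca = defaultdict(lambda: [0, 0])  # afferent: [total, unique]
    ce = defaultdict(lambda: [0, 0])  # efferent: [total, unique]
    for source, target, n_total, n_unique in calls:
        if source == target or source in exclude or target in exclude:
            continue
        ca[target][0] += n_total
        ca[target][1] += n_unique
        ce[source][0] += n_total
        ce[source][1] += n_unique

    def ratio(efferent, afferent):
        denom = efferent + afferent
        return efferent / denom if denom else None

    pkgs = set(ca) | set(ce)
    return {p: (ratio(ce[p][0], ca[p][0]), ratio(ce[p][1], ca[p][1]))
            for p in sorted(pkgs)}
```

Because exclusion happens before any sums are formed, base and recommended packages neither contribute couplings nor receive instability values of their own.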

Perhaps the most difficult aspect will be that each measure for a given package version must include only those versions of all other packages which were on CRAN at the time that package version was released. This complex filtering operation will likely be the most time-consuming part of these calculations. The whole thing will have to be done as a post-processing routine on the raw data, appending 6 extra columns: the four coupling counts above, plus the two instability measures.
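One way to do that filtering is to keep, for each package, its sorted release history and binary-search it for the latest release on or before the focal version's date. This is a sketch under a simplifying assumption: a package is treated as remaining on CRAN once released, so archived or removed packages are not handled. The function name and the shape of `history` are hypothetical.

```python
import bisect
from datetime import date


def versions_on_cran(history: dict[str, list[tuple[date, str]]],
                     when: date, exclude_pkg=None) -> dict[str, str]:
    """For each package, pick the latest version released on or before `when`.

    `history` maps package name -> list of (release_date, version) tuples.
    Assumes packages are never removed from CRAN (archivals are ignored).
    `exclude_pkg` drops the focal package itself from the snapshot.
    """
    snapshot = {}
    for pkg, releases in history.items():
        if pkg == exclude_pkg:
            continue
        releases = sorted(releases)
        dates = [d for d, _ in releases]
        i = bisect.bisect_right(dates, when)  # releases[:i] are on or before `when`
        if i > 0:
            snapshot[pkg] = releases[i - 1][1]
    return snapshot
```

The resulting snapshot says which version's call summary to use for each other package. Pre-sorting every history once, rather than on each call, would cut the cost when this is repeated for every version of every package.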

Those measures should capture the bulk of the useful information contained in these external call summaries.

mpadge commented 2 years ago

It would also be extremely useful to aggregate a parallel table of the actual function calls made by each package. The easiest way would be to dump every single external_calls table, which would also fulfil #15. Tables for each package could be dumped individually; the references between packages could then be extracted from the current compacted external_calls data, and the packages cross-referenced by any particular package identified and loaded.
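A rough sketch of that dump-then-load workflow, assuming a hypothetical layout of one JSON file per package named `<pkg>.json`, and hypothetical row fields; pkgstats may store these tables quite differently. The compact summaries are used only to decide which full tables to load, in both directions (packages the focal package calls, and packages that call it).

```python
import json
from pathlib import Path


def dump_tables(tables, out_dir):
    """Write each package's full external-calls table to <out_dir>/<pkg>.json."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for pkg, rows in tables.items():
        (out / f"{pkg}.json").write_text(json.dumps(rows))


def _targets(summary):
    """Package names from a compact '<pkg>:<n_total>:<n_unique>,...' string."""
    return {e.strip().split(":")[0] for e in summary.split(",") if e.strip()}


def referenced_packages(pkg, summaries):
    """Packages linked to `pkg` in either direction, from compact summaries."""
    outgoing = _targets(summaries.get(pkg, ""))
    incoming = {p for p, s in summaries.items() if pkg in _targets(s)}
    return (outgoing | incoming) - {pkg}


def load_linked_tables(pkg, summaries, out_dir):
    """Load the dumped tables for `pkg` and every package linked to it."""
    out = Path(out_dir)
    wanted = {pkg} | referenced_packages(pkg, summaries)
    return {
        p: json.loads((out / f"{p}.json").read_text())
        for p in sorted(wanted)
        if (out / f"{p}.json").exists()  # e.g. base packages have no dump
    }
```

Loading only the linked tables keeps memory bounded by a package's neighbourhood rather than by the whole archive.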

mpadge commented 2 years ago

These are all now possible: the external_calls data via an additional save_ex_calls parameter to pkgstats_from_archive, and the instability measures via post-processing code implemented elsewhere for the analyses of these data.

mpadge commented 2 years ago

Re-opening because the external calls data are only extracted from files within ./R, and so ignore any couplings arising from code in /src and /inst.