Closed mpadge closed 2 years ago
It would also be extremely useful to aggregate a parallel table of actual function calls from each package. The easiest way would be to dump every single `external_calls` table, and so fulfil #15. Tables for each package could be individually dumped, the references between packages extracted from the current compacted `external_calls` data, then any cross-referenced packages for any particular package identified and loaded.
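A minimal sketch of the "identify cross-referenced packages" step, written in Python purely for illustration and not part of pkgstats. It assumes the references between packages have already been extracted into a mapping from each package to the set of packages it calls; the function name `cross_referenced` and the example package names are hypothetical.

```python
from collections import defaultdict


def cross_referenced(calls, pkg):
    """Return packages that `pkg` calls, plus packages that call `pkg`."""
    # Invert the mapping so that callers can be looked up by callee.
    called_by = defaultdict(set)
    for caller, callees in calls.items():
        for callee in callees:
            called_by[callee].add(caller)
    related = set(calls.get(pkg, ())) | called_by[pkg]
    related.discard(pkg)  # a package is not cross-referenced with itself
    return sorted(related)


# Hypothetical references extracted from the per-package tables.
calls = {
    "pkgA": {"pkgB", "stats"},
    "pkgB": {"pkgC"},
    "pkgC": {"pkgA", "pkgC"},
}

print(cross_referenced(calls, "pkgA"))
```

Only the tables for the returned packages would then need to be loaded alongside the table for the package of interest.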
These are all now possible: the `external_calls` data via an additional `save_ex_calls` param to `pkgstats_from_archive`, and the instability measure via post-processing code implemented elsewhere for the analyses of these data.
Re-opening because the external calls data are only from within `./R` files, and so ignore couplings between `/src` and `/inst` files.
The summary of external calls contains all data necessary to calculate the couplings, and thus the instability. This external call summary contains comma-delimited sequences of `<pkg>:<n_total>:<n_unique>` calls to all external packages. These are easily converted to a single `data.frame` for each package, and thus to a single object for all packages. Values can then be calculated for afferent coupling as the total number of calls from all other packages to a given package (excluding that package itself), and for efferent coupling as the sum of all calls from the nominated package to other packages.
Both of these metrics can be calculated from overall numbers of calls and from numbers of unique calls only, giving `afferent_total`, `afferent_unique`, `efferent_total`, and `efferent_unique`. Those can then be converted into the two corresponding measures of coupling instability, which should be calculated excluding all base and recommended packages.
Perhaps the most difficult aspect will be that each measure for a given package will need to include those versions of all other packages which were on CRAN at the specified time for that package version. This complex filtering operation will likely be the most time-consuming bit of these calculations. The whole thing will have to be done as a post-processing routine on the raw data, appending the 6 extra columns:
coupling_afferent_total
coupling_afferent_unique
coupling_efferent_total
coupling_efferent_unique
coupling_instability_total
coupling_instability_unique
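The six columns above could be computed along the following lines. This is a hedged sketch in Python rather than R, used only to illustrate the arithmetic. It assumes the summaries are available as one `<pkg>:<n_total>:<n_unique>` string per package, and that instability follows the standard definition of efferent / (afferent + efferent), which the thread does not spell out. The names `parse_summary` and `coupling` are hypothetical, and the exclusion list should be checked against the R version in use.

```python
from collections import defaultdict

# Assumed exclusion list: R's base and recommended packages.
BASE_RECOMMENDED = {
    "base", "compiler", "datasets", "graphics", "grDevices", "grid",
    "methods", "parallel", "splines", "stats", "stats4", "tcltk", "tools",
    "utils", "boot", "class", "cluster", "codetools", "foreign",
    "KernSmooth", "lattice", "MASS", "Matrix", "mgcv", "nlme", "nnet",
    "rpart", "spatial", "survival",
}


def parse_summary(summary):
    """Parse 'pkg:n_total:n_unique,...' into {pkg: (n_total, n_unique)}."""
    out = {}
    for item in summary.split(","):
        item = item.strip()
        if not item:
            continue
        pkg, n_total, n_unique = item.split(":")
        out[pkg] = (int(n_total), int(n_unique))
    return out


def coupling(summaries, exclude=BASE_RECOMMENDED):
    """Return {pkg: dict holding the six coupling columns}."""
    eff = defaultdict(lambda: [0, 0])  # calls out of each package
    aff = defaultdict(lambda: [0, 0])  # calls into each package
    for caller, summary in summaries.items():
        for callee, (n_total, n_unique) in parse_summary(summary).items():
            if callee == caller or callee in exclude:
                continue
            eff[caller][0] += n_total
            eff[caller][1] += n_unique
            aff[callee][0] += n_total
            aff[callee][1] += n_unique
    result = {}
    for pkg in summaries:
        row = {}
        for i, kind in enumerate(("total", "unique")):
            ca, ce = aff[pkg][i], eff[pkg][i]
            row[f"coupling_afferent_{kind}"] = ca
            row[f"coupling_efferent_{kind}"] = ce
            # Instability is undefined for packages with no couplings.
            row[f"coupling_instability_{kind}"] = ce / (ca + ce) if ca + ce else None
        result[pkg] = row
    return result


# Hypothetical summaries for three packages.
summaries = {
    "pkgA": "base:10:4,pkgB:6:2,pkgC:3:1",
    "pkgB": "pkgC:5:5,stats:2:1",
    "pkgC": "pkgA:4:2",
}
res = coupling(summaries)
```

Afferent values here only count calls from packages present in `summaries`, so the result depends on which package versions are included in the input.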
Those measures should capture the bulk of the useful information contained in these external call summaries.
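The version filtering described above could look roughly like the sketch below, again in Python for illustration only. It assumes a flat list of `(package, version, release_date)` records and takes "on CRAN at the specified time" to mean the latest version of each package released on or before that date. It does not handle packages that had been archived by then. The function name `versions_at` and the example records are hypothetical.

```python
from datetime import date


def versions_at(releases, when):
    """For each package, pick the latest version released on or before `when`."""
    current = {}
    for pkg, version, released in releases:
        if released > when:
            continue  # not yet on CRAN at the specified time
        if pkg not in current or released > current[pkg][1]:
            current[pkg] = (version, released)
    return {pkg: version for pkg, (version, _) in current.items()}


# Hypothetical release history.
releases = [
    ("pkgA", "0.1.0", date(2019, 3, 1)),
    ("pkgA", "0.2.0", date(2020, 6, 15)),
    ("pkgB", "1.0.0", date(2020, 1, 10)),
    ("pkgB", "1.1.0", date(2021, 2, 5)),
    ("pkgC", "0.0.1", date(2021, 8, 20)),
]
snapshot = versions_at(releases, date(2020, 12, 31))
print(snapshot)
```

Running this once per package version scans the full release list each time, which is likely where the time goes. Sorting the records by date once and sweeping through them would probably be faster for the full archive.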