pharmaR / riskmetric

Metrics to evaluate the risk of R packages
https://pharmar.github.io/riskmetric/

Proposed metric: `download_trend_2yr` #307

Open AARON-CLARK opened 11 months ago

AARON-CLARK commented 11 months ago

This proposed metric actually originated from the {riskassessment} app's repo: https://github.com/pharmaR/riskassessment/issues/438

I want to propose a new card that fits a linear model to the monthly data produced by `generate_comm_data()`, with the goal of reporting the slope coefficient. For example, below you can see that the stringr package has been increasing in downloads over the last two years. Let's say we fit a linear model to this data and the slope is 27k. In this context, we could highlight two things:

First, the trend is positive/upwards! Second, pkg downloads are increasing by 27k per month on average. I think those are really handy statistics to know! Maybe we add a button to overlay the linear model on the plot? We probably want to limit the model fit to the last two years to keep the stats recent... but I guess we could always empower the user to choose the timeframe. On the lower bound, we shouldn't fit a model unless at least 12 months of data exist.
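To make the idea concrete, here is a minimal sketch of the slope calculation in base R. It is only an illustration: `download_trend_slope()` and its arguments are hypothetical names, and the input is assumed to be a data frame with a `month` (Date) column and a `downloads` column, which may not match what `generate_comm_data()` actually returns.

```r
# Hypothetical helper: not part of {riskmetric} or {riskassessment}.
# `monthly` is assumed to have one row per month with columns
# `month` (Date) and `downloads` (numeric).
download_trend_slope <- function(monthly, max_months = 24, min_months = 12) {
  monthly <- monthly[order(monthly$month), ]
  # Keep only the most recent `max_months` rows so the fit stays "recent"
  monthly <- utils::tail(monthly, max_months)
  # Too little history: do not fit a model at all
  if (nrow(monthly) < min_months) return(NA_real_)
  # Regress downloads on a month index, so the slope reads as
  # "change in downloads per month"
  monthly$idx <- seq_len(nrow(monthly))
  fit <- stats::lm(downloads ~ idx, data = monthly)
  stats::coef(fit)[["idx"]]
}

# Synthetic data (made up for illustration): 30 months of history
# growing by exactly 27,000 downloads per month
toy <- data.frame(
  month = seq(as.Date("2021-01-01"), by = "month", length.out = 30),
  downloads = 1e6 + 27000 * (0:29)
)
slope <- download_trend_slope(toy)
```

Using a simple month index as the predictor (instead of the raw date) keeps the coefficient in "downloads per month" units, which is what the card would report. Exposing `max_months` as an argument is one way to let the user choose the timeframe.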

This idea is currently being tested in a dev version of the {riskassessment} app, deployed on shinyapps.io: https://rinpharma.shinyapps.io/riskassessment_v100_dev/

Note, the app's database is pre-loaded with about 300 tidyverse, pharmaverse, and other top-downloaded pkgs from CRAN. Please feel free to open the app and log in using the example credentials provided on the home screen. The visual can be found at Risk Assessment tab > Package Metrics tab > Community Usage page:

image

Right now, the trend line only shows up when there are >= 12 months of history on CRAN, and it won't use more than 2 yrs' worth of data, because anything longer doesn't really reflect "recent" usage/popularity. To score, I think you'd have to evaluate the slope coefficient as a percent of the total downloads, since a slope of 5k downloads per month may be high for a pharmaverse pkg but really low for a tidyverse pkg. Here's an example of a package with a downward trend:

image
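As a rough sketch of the scoring idea (slope as a percent of total downloads), something like the following could work. The function name `relative_download_trend()` is hypothetical, the input is assumed to be a numeric vector of monthly counts (oldest first, already trimmed to the window of interest), and dividing by the window total is just one possible normalization.

```r
# Hypothetical helper, not an existing {riskmetric} function.
# `downloads`: numeric vector of monthly download counts, oldest first.
relative_download_trend <- function(downloads) {
  total <- sum(downloads)
  # No downloads in the window: a relative trend is undefined
  if (total == 0) return(NA_real_)
  idx <- seq_along(downloads)
  slope <- stats::coef(stats::lm(downloads ~ idx))[["idx"]]
  # Monthly change expressed as a percent of all downloads in the window
  100 * slope / total
}

# Made-up numbers: the same absolute slope (5,000 per month)
# on a small and a large download baseline
small_pkg <- 10000 + 5000 * (0:23)
large_pkg <- 2e6 + 5000 * (0:23)
rel_small <- relative_download_trend(small_pkg)
rel_large <- relative_download_trend(large_pkg)
```

With this normalization the same 5k slope comes out much larger in relative terms for the small package than for the large one, and a downward trend simply yields a negative value.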

Open to hearing any ideas you all have too, like whether a weekly trend over 1 year makes more sense, since it gives us more data points and is more recent. Thanks!