jensdietrich opened this issue 3 months ago
@ulizue fyi
@jensdietrich done. The charts are included in the figures folder: GA_by_year_stats
@nkiru-ede thanks. Did you write test cases? If not, please do, and add the link to the tests here as well.
@jensdietrich test/TestGA2Compute.py
@nkiru-ede the structure of the tests still has issues: you basically embed a copy of the computation (compute_counts(df)) in the test script, but this should live in the main script that computes the stats, and the tests should just reference it instead of cloning it. Cloning causes problems: when you change the scripts but forget to update the copies, your tests give you wrong results.
You could describe your code as WET as opposed to DRY (see https://en.wikipedia.org/wiki/Don%27t_repeat_yourself for what this means).
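To illustrate the DRY structure suggested above, here is a minimal sketch. It assumes the computation lives in a main module (hypothetically `ga_stats.py`) and, to stay self-contained, takes a list of `(group_id, artifact_id, version, year)` tuples instead of the DataFrame the real scripts use. This `compute_counts` only counts GAs by the year of their first observed release, which is an assumption for illustration. The test imports the function rather than copying it, and compares it against expected values worked out by hand for a tiny fixture.

```python
import unittest
from collections import Counter


# --- ga_stats.py (hypothetical main module) ---------------------------------
def compute_counts(records):
    """Count GAs by the year of their first observed release.

    records: iterable of (group_id, artifact_id, version, year) tuples
    (an assumed input format). Returns a dict {year: number_of_GAs}.
    """
    first_year = {}
    for group_id, artifact_id, _version, year in records:
        ga = (group_id, artifact_id)
        # Keep the earliest year seen for each GA.
        if ga not in first_year or year < first_year[ga]:
            first_year[ga] = year
    return dict(Counter(first_year.values()))


# --- test/TestGA2Compute.py ---------------------------------------------------
# In the real layout this file would do `from ga_stats import compute_counts`
# instead of redefining the computation.
class TestComputeCounts(unittest.TestCase):
    def test_first_release_counts(self):
        records = [
            ("org.a", "x", "1.0", 2010),
            ("org.a", "x", "1.1", 2011),  # same GA, later release: not counted again
            ("org.b", "y", "0.1", 2011),
            ("org.c", "z", "2.0", 2011),
        ]
        # Expected values are worked out by hand, not by a second implementation.
        self.assertEqual(compute_counts(records), {2010: 1, 2011: 2})
```

Any standard runner (unittest or pytest) picks up the test class. The point is that the test module holds only fixtures and expected values, so changing the main script automatically changes what is tested.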
@jensdietrich done
thanks @nkiru-ede!
@nkiru-ede as discussed today, please also add the GAs whose last release we observed in the respective year. We don't need to display data for the last two years, as it might be heavily biased, as we discussed today. Re-opening this issue might be easier than opening a new one.
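A minimal sketch of the additional series, using the same assumed `(group_id, artifact_id, version, year)` input. It counts each GA in the year of its last observed release and drops the most recent years. Two assumptions here: the "last two years" are taken relative to the newest release in the snapshot (not the current calendar year), and the reason for the bias is presumed to be that a recently released GA may simply not have had its next release yet. The function name is hypothetical.

```python
from collections import Counter


def last_release_counts(records, exclude_recent_years=2):
    """Count GAs by the year of their last observed release.

    records: iterable of (group_id, artifact_id, version, year) tuples (assumed format).
    The `exclude_recent_years` most recent years in the data are dropped.
    Assumed rationale: a GA whose latest release is recent may still be active.
    """
    last_year = {}
    for group_id, artifact_id, _version, year in records:
        ga = (group_id, artifact_id)
        # Keep the latest year seen for each GA.
        last_year[ga] = max(year, last_year.get(ga, year))
    if not last_year:
        return {}
    newest = max(last_year.values())
    cutoff = newest - exclude_recent_years  # keep only years <= cutoff
    counts = Counter(y for y in last_year.values() if y <= cutoff)
    return dict(sorted(counts.items()))
```

Passing `exclude_recent_years=0` keeps every year, which is handy for the comparison charts that include the last two years.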
@jensdietrich this is done. I have updated the images in the Overleaf document. I also included charts that include the last two years, for discussion during the next meeting.
thanks @nkiru-ede (also @ulizue, please have a look at Overleaf). @nkiru-ede, did you test this? The numbers look very high: this would mean that new GAs are introduced at almost the same rate as they are abandoned. Or does the log scale misrepresent this a bit? Could you perhaps email us versions with a linear scale, just to see the difference? But if we confirm this, it is a really interesting insight IMO.
I suggest we also study the number of GAVs released per GA per year and present this in boxcharts. We can do this for all GAs, and for the top GAs (say the top-100, as discussed). This would give us interesting data on the correlation between maintenance and popularity.
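One way to prepare the boxchart data is sketched below, again assuming `(group_id, artifact_id, version, year)` tuples. For each year it collects one value per GA: the number of distinct GAVs that GA released in that year. The optional `only_gas` set stands in for a top-100 list; how that list is ranked is not specified here, and the function name is hypothetical. GAs without a release in a given year contribute nothing (rather than a 0) to that year, which is also an assumption.

```python
from collections import defaultdict


def gavs_per_ga_per_year(records, only_gas=None):
    """Map each year to a sorted list of per-GA GAV counts for that year.

    records: iterable of (group_id, artifact_id, version, year) tuples (assumed format).
    only_gas: optional set of (group_id, artifact_id) pairs, e.g. a hypothetical
    top-100 list. GAs with no release in a year are not counted for that year.
    """
    versions_by_ga_year = defaultdict(set)
    for group_id, artifact_id, version, year in records:
        ga = (group_id, artifact_id)
        if only_gas is not None and ga not in only_gas:
            continue
        # A set, so duplicate GAV rows are counted once.
        versions_by_ga_year[(ga, year)].add(version)

    per_year = defaultdict(list)
    for (_ga, year), versions in versions_by_ga_year.items():
        per_year[year].append(len(versions))
    return {year: sorted(counts) for year, counts in sorted(per_year.items())}
```

Each per-year list is one box; a plotting routine such as matplotlib's `Axes.boxplot` accepts a sequence of such lists and computes the quartiles itself. Calling the function once without and once with `only_gas` gives the all-GAs and top-GAs variants.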
For the boxcharts, we now have a separate issue: #17.
The other data in this chart: [chart image]
@nkiru-ede could you please take a snapshot for one year (say 2010) with the three lists of GAs that produce the three datapoints for that year? A plain file list with one GA (not GAV) per line will do. This is the final QA step here.
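A small sketch for producing such a snapshot: it writes one plain-text file per series for a single year, one GA per line as `groupId:artifactId`. The three series used here (first release in the year, last release in the year, any release in the year) are an assumption about what the three datapoints are and should be replaced with the definitions actually used in the main script. The function name, file names and input format are hypothetical.

```python
from pathlib import Path


def write_year_snapshot(records, year, out_dir):
    """Write the GA lists behind one year's datapoints, one GA per line.

    records: iterable of (group_id, artifact_id, version, year) tuples (assumed format).
    Returns a dict {series_name: sorted list of "groupId:artifactId" strings}.
    """
    years_by_ga = {}
    for group_id, artifact_id, _version, y in records:
        years_by_ga.setdefault(f"{group_id}:{artifact_id}", []).append(y)

    # Assumed definitions of the three series; replace with the real ones.
    series = {
        "first_release": sorted(ga for ga, ys in years_by_ga.items() if min(ys) == year),
        "last_release": sorted(ga for ga, ys in years_by_ga.items() if max(ys) == year),
        "any_release": sorted(ga for ga, ys in years_by_ga.items() if year in ys),
    }
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, gas in series.items():
        # One file per series, e.g. 2010-first_release.txt
        (out / f"{year}-{name}.txt").write_text("".join(ga + "\n" for ga in gas))
    return series
```

Sorting the GAs makes the files easy to diff against lists produced by hand or by a second query, which is what the QA step needs.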
@jensdietrich please review the attached.
For GAs, create two timelines (GA counts by year), based on two different ways of calculating the timestamp for a GA.
This gives us interesting data on maintenance activity.
Include unit tests for the calculation, and create the respective charts (two curves in one chart).
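Drawing two curves in one chart requires both timelines on a shared x-axis, and the two series need not cover the same years. A minimal sketch, assuming each timeline is already a `{year: count}` dict and that the chart is drawn from a CSV file; the function names and column names are hypothetical.

```python
import csv
import io


def align_timelines(a, b, fill=0):
    """Merge two {year: count} dicts into rows (year, count_a, count_b).

    Years missing from one series get `fill`, so both curves share one x-axis.
    """
    years = sorted(set(a) | set(b))
    return [(y, a.get(y, fill), b.get(y, fill)) for y in years]


def to_csv(rows, header=("year", "series_a", "series_b")):
    """Render the aligned rows as CSV text, ready for a charting tool."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()
```

Filling missing years with 0 is right when a series genuinely has no GAs in that year; if a missing year means "no data", pass `fill=None` so the chart shows a gap instead.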