track code license over time

ltalirz commented 2 years ago

For an accurate historical analysis of license models, one should record the license of the individual codes as a function of time, rather than just stating the license that a code has today.

ltalirz commented 2 years ago

In principle, all metadata can evolve over time. We have already encountered this with time-dependent query strings: https://github.com/ltalirz/atomistic-software/issues/114#issuecomment-1097488391

At the same time, recording time-dependent metadata will remain an exception and be introduced only where necessary (most metadata won't change over time).

This could be modeled by a schema like:

```json
{
  "code-name": {
    "query_string": "val",
    "license": "val",
    "updates": {
      "2020": {
        "license": "val-new",
        "query_string": "val-new"
      }
    }
  }
}
```

where the metadata for 2021 would be obtained by recursively updating the code's top-level dictionary with the changes from all relevant years under the "updates" key, applied in chronological order (i.e. every year up to and including 2021).
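A minimal Python sketch of this rendering step, assuming the schema above has been loaded with `json.load` (so the year keys are strings) and that each year's entries are deep-merged into the base metadata, oldest first. The helper names `deep_merge` and `render_code` are hypothetical and not part of the repository.

```python
import copy


def deep_merge(base: dict, changes: dict) -> dict:
    """Return a copy of `base` with `changes` applied recursively."""
    merged = copy.deepcopy(base)
    for key, value in changes.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested dictionaries are merged key by key instead of replaced.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def render_code(entry: dict, year: int) -> dict:
    """Hypothetical helper: metadata of one code as valid in `year`.

    Applies every entry under "updates" whose year is <= `year`, oldest
    first, so that later changes override earlier ones.
    """
    rendered = {k: v for k, v in entry.items() if k != "updates"}
    updates = entry.get("updates", {})
    for update_year in sorted(updates, key=int):
        if int(update_year) <= year:
            rendered = deep_merge(rendered, updates[update_year])
    return rendered


code = {
    "query_string": "val",
    "license": "val",
    "updates": {"2020": {"license": "val-new", "query_string": "val-new"}},
}
render_code(code, 2019)  # {'query_string': 'val', 'license': 'val'}
render_code(code, 2021)  # {'query_string': 'val-new', 'license': 'val-new'}
```

Sorting the year keys numerically (`key=int`) rather than as strings keeps the override order correct even if the keys are not stored in order.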

This probably means we should pre-build the "rendered" version of the code metadata for each year. That is not a big deal, though: the current file has 40 KB of text, so we would be adding less than 12 × 40 KB = 480 KB of memory.
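A sketch of the pre-building step, assuming the factor of 12 corresponds to the years 2010 to 2021 and that `codes` is a dict keyed by code name following the schema above. Instead of re-applying all updates for every year, it carries each code's state forward one year at a time. The names `apply_changes` and `prebuild` are hypothetical.

```python
import copy
import json


def apply_changes(target: dict, changes: dict) -> None:
    """Recursively apply `changes` to `target`, modifying it in place."""
    for key, value in changes.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            apply_changes(target[key], value)
        else:
            target[key] = copy.deepcopy(value)


def prebuild(codes: dict, years: range) -> dict:
    """Return {year: {code name: metadata valid in that year}}."""
    by_year = {year: {} for year in years}
    for name, entry in codes.items():
        # Work on a copy so the source metadata is never modified.
        current = copy.deepcopy({k: v for k, v in entry.items() if k != "updates"})
        pending = sorted(entry.get("updates", {}).items(), key=lambda item: int(item[0]))
        i = 0
        for year in years:
            # Apply every update up to and including this year, oldest first;
            # updates older than the first year are folded in immediately.
            while i < len(pending) and int(pending[i][0]) <= year:
                apply_changes(current, pending[i][1])
                i += 1
            by_year[year][name] = copy.deepcopy(current)
    return by_year


codes = {
    "code-name": {
        "query_string": "val",
        "license": "val",
        "updates": {"2020": {"license": "val-new", "query_string": "val-new"}},
    }
}
by_year = prebuild(codes, range(2010, 2022))
by_year[2019]["code-name"]["license"]  # 'val'
by_year[2021]["code-name"]["license"]  # 'val-new'
# The serialized size grows roughly linearly with the number of years.
print(len(json.dumps(by_year)))
```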

Since the citation data for year X is retrospective, the changes listed under year X should reflect the metadata as of the beginning of year X.

ltalirz commented 1 year ago

There have been a number of license changes since the list started in 2021, at least for molcas, castep, amber, gromos and cpmd. I suspect a few more codes changed their license between 2010 and 2021.

Edit: DIRAC switched to LGPL

Since we only record the latest license, we currently misrepresent the license trends at the ecosystem level (in particular, free/open licenses are actually growing more strongly than the current graphs would show).
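To illustrate the point, the ecosystem-level trend can be computed by counting licenses separately for each year instead of projecting today's license backwards. A minimal sketch, assuming a pre-rendered mapping `{year: {code name: metadata}}`; the function name `license_counts`, the code names and the license values below are hypothetical placeholders, not real data.

```python
from collections import Counter


def license_counts(rendered_by_year: dict) -> dict:
    """Count how many codes use each license, separately for every year."""
    return {
        year: Counter(meta["license"] for meta in metadata_by_code.values())
        for year, metadata_by_code in sorted(rendered_by_year.items())
    }


# Hypothetical example: "code-a" switches from a proprietary license in 2021.
rendered_by_year = {
    2020: {"code-a": {"license": "proprietary"}, "code-b": {"license": "GPL"}},
    2021: {"code-a": {"license": "LGPL"}, "code-b": {"license": "GPL"}},
}
counts = license_counts(rendered_by_year)
counts[2020]["proprietary"]  # 1
counts[2021]["proprietary"]  # 0 (a Counter returns 0 for missing keys)
```

Using only the latest license for every year would report zero proprietary codes in 2020 as well, hiding the switch.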

It would be very nice to fix this.

ltalirz / atomistic-software

track code license over time #110