mechmotum / ideas

Research ideas for the TU Delft Bicycle Laboratorium

How fast will my open source code break? #1

Open moorepants opened 5 years ago

moorepants commented 5 years ago

One of my biggest complaints about open source software is that APIs do not remain stable. If I write a research paper using a software stack, publish it, don't maintain the code, and then come back ~1 year later, it seems to take a day or more to update the software so that it works with the updated dependencies. One year isn't a long time in the research world. This isn't good for reproducibility, and I don't think we should have to ship a VM with a paper that freezes the entire stack. I've also noticed that my Matlab code that is 10+ years old tends to run just fine on new versions, leading me to believe that MathWorks takes this much more seriously.

I'm interested in characterizing:

Hypothesis: On average, a given script or software package that relies on a high-level scientific computing software stack will break within a year due to unstable dependency APIs.

Prior art

Haven't found anything much yet.

Methods

Here is an idea for a method to do this:

  1. Download a package or script at the top of (or near top of) the stack and log its release date
  2. Install the dependencies specified at the time of release and ensure the software runs
  3. Increment the dependency versions in chronological order and test whether the script/package still runs at every increment. You can detect whether it runs and also whether deprecation warnings are emitted. If a single dependency fails, you can pin it at the last working version and continue to increment the others until you get to the script's release date or all dependencies fail.
  4. Record the dates that your software gets deprecation warnings and fails.
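
The loop in steps 3 and 4 could be sketched roughly as below. This is only an illustration under assumptions: `step_dependencies`, `Release`, and the `check` callable are hypothetical names, and `check` stands in for the expensive part (installing the given pins into a clean environment and running the script), which is not shown. It is assumed to return `"ok"`, `"warn"` (ran, but emitted deprecation warnings), or `"fail"`.

```python
from datetime import date
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical record types: (release date, package name, version)
Release = Tuple[date, str, str]
# (release date, package name, version, "warn" or "fail")
Event = Tuple[date, str, str, str]


def step_dependencies(
    initial_pins: Dict[str, str],
    releases: Iterable[Release],
    check: Callable[[Dict[str, str]], str],
) -> Tuple[Dict[str, str], List[Event]]:
    """Walk dependency releases in chronological order and log breakage.

    `check(pins)` is an assumed callback that installs `pins`, runs the
    script, and returns "ok", "warn", or "fail".
    """
    pins = dict(initial_pins)
    frozen = set()  # dependencies pinned at their last working version
    events: List[Event] = []

    for released, name, version in sorted(releases):
        if name in frozen:
            continue  # this dependency already broke the script
        trial = {**pins, name: version}
        status = check(trial)
        if status == "fail":
            # Keep the last working version and stop incrementing this one.
            frozen.add(name)
            events.append((released, name, version, "fail"))
        else:
            pins = trial
            if status == "warn":
                events.append((released, name, version, "warn"))

    return pins, events
```

The returned `events` list holds the dates needed for step 4, and `pins` holds the newest combination that still ran.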

Another method:

Track a code base through its git commits and somehow measure the frequency and timing of deprecations and removals.
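
One crude way to start on the git-based idea is to scan the patch history for added lines that mention a deprecation. The sketch below assumes the log was produced with `git log -p --format="COMMIT %H %cI"`, where the `COMMIT` prefix is an arbitrary marker chosen here; `deprecation_commits` and the regular expression are hypothetical. It only covers deprecations being introduced, not removals, and it will miss an added line whose own content starts with `++`.

```python
import re
from typing import List, Tuple

# Assumed markers of a deprecation; a real study would need a better list.
DEPRECATION = re.compile(r"DeprecationWarning|@deprecated", re.IGNORECASE)


def deprecation_commits(log_text: str) -> List[Tuple[str, str, int]]:
    """Return (commit hash, ISO date, added deprecation lines) per commit.

    Expects the text output of: git log -p --format="COMMIT %H %cI"
    Commits that add no matching lines are omitted.
    """
    results: List[Tuple[str, str, int]] = []
    current = None  # (hash, date) of the commit being scanned
    count = 0

    for line in log_text.splitlines():
        if line.startswith("COMMIT "):
            if current and count:
                results.append((*current, count))
            _, sha, when = line.split(" ", 2)
            current, count = (sha, when), 0
        elif line.startswith("+") and not line.startswith("+++"):
            # An added line in the diff (the "+++" file header is skipped).
            if DEPRECATION.search(line):
                count += 1

    if current and count:
        results.append((*current, count))
    return results
```

The commit dates in the result give a rough timeline of when deprecations entered the code base.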

We will have to find a reliable way to get old dependencies installed. Simply getting things installed as they were at some point in the past is often quite painful.
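
For Python packages, one possible starting point is choosing, for each dependency, the newest version that had been uploaded by a given date. The sketch below assumes release metadata shaped like the `releases` field of the PyPI JSON API (`https://pypi.org/pypi/<name>/json`), where each version maps to a list of files carrying an `upload_time_iso_8601` timestamp. `latest_version_before` is a hypothetical helper, fetching the metadata is not shown, and picking by upload date can select a late backport release instead of the highest version number.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional


def latest_version_before(
    releases: Dict[str, List[dict]], cutoff: datetime
) -> Optional[str]:
    """Return the most recently uploaded version available at `cutoff`.

    `releases` maps version strings to lists of file records, each with an
    "upload_time_iso_8601" key. `cutoff` must be timezone-aware.
    """
    best = None  # (first upload time, version)
    for version, files in releases.items():
        times = [
            # Replace "Z" so older Pythons can parse the timestamp.
            datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
            for f in files
        ]
        if not times:
            continue  # a version with no uploaded files cannot be installed
        first = min(times)
        if first <= cutoff and (best is None or first > best[0]):
            best = (first, version)
    return best[1] if best else None
```

The result could then be turned into a pin such as `numpy==<version>` for `pip install`, though old versions may still fail to build against a modern Python or compiler.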

Another thought:

We could check how many tests of a prior version raise errors or deprecation warnings.
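
As a minimal sketch of that check, Python's standard `warnings` module can record what a test emits while it runs. `classify_run` is a hypothetical helper, and which warning categories count as "deprecation" is an assumption made here.

```python
import warnings
from typing import Callable

# Assumed set of categories that signal an upcoming API change.
DEPRECATION_CATEGORIES = (
    DeprecationWarning,
    PendingDeprecationWarning,
    FutureWarning,
)


def classify_run(func: Callable[[], object]) -> str:
    """Run `func` and return "fail", "warn", or "ok"."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including ones normally shown only once.
        warnings.simplefilter("always")
        try:
            func()
        except Exception:
            return "fail"
    deprecations = [
        w for w in caught if issubclass(w.category, DEPRECATION_CATEGORIES)
    ]
    return "warn" if deprecations else "ok"
```

Applying this to each test function of a prior release, with newer dependencies installed, would give counts of passing, warning, and failing tests.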

moorepants commented 3 years ago

I added this project idea here: https://mechmotum.github.io/jobs/msc/how-fast-will-open-source-break.html.

moorepants commented 3 years ago

A static analysis tool to identify deprecated Python code: https://github.com/QuantStack/memestra. Could be useful.