Tilburg Science Hub is an open-source resource repository that helps students and researchers in the social sciences efficiently manage data- and computation-intensive projects.
We think this is a valid point and worth sharing on Tilburg Science Hub. Please let us know if you're still available to write an article about this.
Hi @hannesdatta,
When using the same code across different machines or on a cluster, I sometimes run into issues that stem from having different package versions installed (e.g. in R or Julia). When a package changes its functionality, the update can break existing code or produce numerical differences because the underlying algorithms changed.
One way to avoid these issues is to specify a software environment for each project that can simply be loaded and that installs everything required at the right versions. The code then still runs after packages are updated, because the environment is always set up with exactly the same packages. It also makes the work replicable for other authors in the future.
Since Anaconda is already used on the page and provides functions to save and restore such environments for both R and Python, it might be worth adding some documentation on how to do this in Anaconda and why it's important to keep track of the packages used (https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html). As a side note, in case you're not already aware of this, these environments can also be useful for teaching, as they ensure that students have exactly the same setup and are not missing packages.
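As a rough illustration of what "keeping track of the packages used" means in practice, here is a minimal Python sketch that compares the installed versions of a few packages against pinned ones. It is not a conda feature: the `check_pins` helper and the pinned versions are hypothetical and only for illustration, and a saved conda environment would cover this (and the actual installation) for a real project.

```python
from importlib import metadata


def check_pins(pins):
    """Return {package: (expected, found)} for every pin that is not satisfied.

    `found` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    # Hypothetical pins; replace them with the versions the project was developed with.
    pins = {"numpy": "1.26.4", "pandas": "2.2.2"}
    for name, (expected, found) in check_pins(pins).items():
        print(f"{name}: expected {expected}, found {found}")
```

Running this on two machines makes it obvious where the installed versions differ, which is exactly the situation a shared environment specification is meant to prevent.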
What do you think? I could write something up in the coming weeks, but I wanted to check whether you think it's worth it before investing the time.
Best, Rafael