greenelab / library-access

Collecting data on whether library access to scholarly literature

Specifying the computational environment #6

Closed dhimmel closed 6 years ago

dhimmel commented 6 years ago

@publicus you may want to create a small PR to specify the computational environment. You can also specify it with your first code-containing PR. The idea is to make sure the code runs both today and a year from now, and pinning the version of all your software is important to that end.

I'd recommend conda, like we're using for greenelab/scihub. Conda environments support both Python and R. They can be a pain in the ass and sometimes break, so don't be shy about asking for help.

jglev commented 6 years ago

Thanks, @dhimmel! Conda took a bit of getting used to, but I think I've got the hang of it now. It seems similar overall to virtualenv, which I'm used to using. I've used conda env export > environment.yml following the Conda documentation to create a YAML list of packages, and will include it in my next in-progress PR.

I've been using another project with a lot of conceptual overlap with this project to get going again with Python, and am diving fully into this now, having re-familiarized myself with the language and workflow.

dhimmel commented 6 years ago

> I've used conda env export > environment.yml following the Conda documentation to create a YAML list of packages, and will include it in my next in-progress PR.

Cool. Note that environment specs created using conda env export will generally only work on the OS on which they were created. One main issue is that they include build numbers. In addition, they list every dependency that was pulled in, not just the packages you installed directly. One thing I do to improve OS interoperability is to remove build numbers and only specify packages that I explicitly import or use.

jglev commented 6 years ago

That's good to know, thanks! Is there a way to do that from within Conda, or do you write something novel to strip build numbers (and the rest)?

On that note, taking, for example, sqlalchemy=1.1.9=py36_0, is the build number you're referring to py36_0?

dhimmel commented 6 years ago

The format is channel::package=version=build. I'd omit =build but specify channel, which will usually be anaconda or conda-forge. By default, the packages will all be from the anaconda channel.
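As a rough illustration of that format, here is a minimal Python sketch that drops the trailing =build field from a spec string while keeping any channel prefix. The helper name strip_build is hypothetical and not part of conda, and the conda-forge prefix in the second call is only an example.

```python
def strip_build(spec: str) -> str:
    """Drop the trailing =build field from a conda spec string.

    Hypothetical helper for illustration; conda itself offers no such function.
    Accepts package, package=version, or package=version=build,
    each optionally prefixed with channel::.
    """
    # Split off an optional "channel::" prefix; all three parts are empty strings if absent.
    channel, sep, rest = spec.rpartition("::")
    # Keep at most the package name and version, discarding the build string.
    pinned = "=".join(rest.split("=")[:2])
    return f"{channel}{sep}{pinned}"


print(strip_build("sqlalchemy=1.1.9=py36_0"))
print(strip_build("conda-forge::sqlalchemy=1.1.9=py36_0"))
```

This only handles the build string. Trimming the list down to packages you explicitly import or use still has to be done by hand.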

> is the build number you're referring to py36_0

yep

> Is there a way to do that from within Conda

No, it's an area where conda doesn't document the issue well and has poor solutions. Usually, I just build my environment.yml file manually, one package at a time, rather than working backwards from an export.
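To make the manual approach concrete, here is a small Python sketch that renders a minimal environment.yml from a hand-picked list of packages, using the channel::package=version form without build numbers. The function name render_environment, the environment name, and the listed packages and versions are all assumptions for illustration, not the contents of this repository's actual file.

```python
def render_environment(name, packages):
    """Render a minimal environment.yml as text.

    Hypothetical helper: `packages` is a list of (channel, package, version)
    tuples chosen by hand, one per package that is explicitly imported or used.
    """
    lines = [f"name: {name}", "dependencies:"]
    for channel, package, version in packages:
        # channel::package=version, deliberately omitting the build string
        lines.append(f"  - {channel}::{package}={version}")
    return "\n".join(lines) + "\n"


# Illustrative values only; pick the channels and versions you actually use.
text = render_environment(
    name="library-access",
    packages=[
        ("anaconda", "python", "3.6"),
        ("conda-forge", "sqlalchemy", "1.1.9"),
    ],
)
print(text)
```

Writing the few lines of YAML directly in an editor works just as well. The point is that the file lists only direct dependencies, each pinned to a version and a channel.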

jglev commented 6 years ago

Cheers! I appreciate your advice.

dhimmel commented 6 years ago

environment.yml added in https://github.com/greenelab/library-access/pull/7