larsbuntemeyer / notebooks

MIT License

CORDEX package seems to be missing from environment.yml #2

Closed jwohland closed 1 year ago

jwohland commented 2 years ago

Hi Lars,

I am trying to run the climdex-euro-cordex.ipynb notebook. I used conda to create an environment based on the YAML file that you provide. It works like a charm until the Data preparation section, where

from cordex import preprocessing as preproc

raises a

ModuleNotFoundError: No module named 'cordex'

What exactly is the module cordex? This: https://github.com/euro-cordex/py-cordex ?

Best, Jan

larsbuntemeyer commented 2 years ago

Yes, that was probably not in the environment file. You can simply install py-cordex using conda:

conda install -c conda-forge py-cordex

It's more or less under development to tackle those issues in the notebook. I released a new version yesterday that should be more or less stable.

larsbuntemeyer commented 2 years ago

I have now updated the environment file. At the time I created it, py-cordex wasn't available on conda. If something else is missing, let me know!

jwohland commented 2 years ago

Perfect, thanks. Will do!

jwohland commented 2 years ago

Looking at esgf.ipynb now, we are also missing pyesgf. I tried to add it manually using conda install -c conda-forge esgf-pyclient, but this doesn't seem to properly include all dependencies. The imports

import pyesgf  
from pyesgf.logon import LogonManager  
from pyesgf.search import SearchConnection

work now but instantiating the LogonManager

lm = LogonManager()

doesn't:

ImportError: pyesgf.logon requires MyProxyClient

larsbuntemeyer commented 2 years ago

Yes, you would have to install those dependencies yourself if you want to use pyesgf. MyProxyClient simply doesn't seem to be a required dependency for them (only an optional one). The environment file in this repo should more or less install the basics. I guess we could have several environment files in this repo, depending on certain topics.
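Because the dependency is optional, it can help to check for it before calling LogonManager(). The following is only a minimal sketch: it assumes that the logon support needs the top-level modules `myproxy` (from the MyProxyClient package) and `OpenSSL` (from pyOpenSSL), which should be verified against the esgf-pyclient source. The helper name `logon_available` is hypothetical.

```python
from importlib.util import find_spec

# Assumption: these top-level modules back pyesgf's optional logon support.
OPTIONAL_LOGON_MODULES = ("myproxy", "OpenSSL")


def logon_available(modules=OPTIONAL_LOGON_MODULES):
    """Return (ok, missing); ok is True when every module can be found."""
    missing = [name for name in modules if find_spec(name) is None]
    return (not missing, missing)


if __name__ == "__main__":
    ok, missing = logon_available()
    if ok:
        print("Optional logon dependencies found.")
    else:
        print("Missing optional modules:", ", ".join(missing))
```

If modules are reported as missing, installing the corresponding packages into the environment should be enough for LogonManager() to be instantiated, under the assumption stated above.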

larsbuntemeyer commented 2 years ago

You should also take care not to install too many dependencies in a single environment, which can make it slow to solve.

jwohland commented 2 years ago

Ok, but it would be neat if the provided environment is sufficient for the provided notebooks. Agree that multiple environments could be a good way of solving this.

In my opinion, the problem is that some of the chosen names are non-intuitive, making it difficult to infer which package to install if one tries to do it manually. Example: the eurocordex notebook says import intake. One could assume that conda install intake would add the required dependency. However, it needs conda install intake-esm (which is then used through the same import statement). These kinds of things are difficult to infer from the notebooks themselves, making it more difficult to reuse them. (This example is just illustrative because intake-esm is specified in environment.yml, but other similar cases may well exist.)
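The mismatch between import names and package names can be made explicit with a small lookup table. This is a minimal sketch, not part of the notebooks: the mapping `IMPORT_TO_CONDA` and the helper `missing_packages` are hypothetical, and the table only covers packages mentioned in this thread. It checks for the module `intake_esm`, since the notebooks reach intake-esm only indirectly through import intake.

```python
from importlib.util import find_spec

# Hypothetical mapping: top-level module name -> conda-forge package name.
# Only packages mentioned in this thread are listed.
IMPORT_TO_CONDA = {
    "cordex": "py-cordex",
    "intake_esm": "intake-esm",  # used indirectly via `import intake`
    "pyesgf": "esgf-pyclient",
}


def missing_packages(mapping=IMPORT_TO_CONDA):
    """Return the conda package names whose module cannot be found."""
    return sorted(pkg for mod, pkg in mapping.items() if find_spec(mod) is None)


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        # Print a ready-to-paste install command for the missing packages.
        print("conda install -c conda-forge " + " ".join(missing))
    else:
        print("All listed packages are importable.")
```

Running this inside the activated environment prints an install command for whatever is missing, so the table would need to grow as more notebooks are checked.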

Don't get me wrong: these notebooks are a great resource! I'm just saying that they could be even more fantastic with complete environments.

larsbuntemeyer commented 2 years ago

Yes, you are right, i might have to clean up! Maybe, we could

larsbuntemeyer commented 2 years ago

If you have already installed everything in one environment that you are happy with, I would be happy for you to contribute it! I would recommend exporting the environment using --from-history (that would be option iii).
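For reference, the export can be scripted. This is a minimal sketch that only builds and prints the command: the helper `export_command` and the environment name `notebooks` are placeholders, not names used in this repo. It relies on the `--name`, `--from-history` and `--file` options of conda env export.

```python
import shlex


def export_command(env_name, outfile="environment.yml"):
    """Build a conda command that exports only explicitly requested packages."""
    return [
        "conda", "env", "export",
        "--name", env_name,
        "--from-history",  # skip dependencies pulled in by the solver
        "--file", outfile,
    ]


if __name__ == "__main__":
    # "notebooks" is a placeholder environment name.
    print(shlex.join(export_command("notebooks")))
```

The printed command can be pasted into a shell, or the list can be passed to subprocess.run on a machine where conda is available.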

jwohland commented 2 years ago

iii. would involve the least effort for users and make it really easy to play around with all of the notebooks. Unsure if environment would become too big. It took around ~1.5h to create the current one on a login node (maybe I should have done that on prepost).

ii. seems nice as well

Unsure what the advantage of i over ii would be.

jwohland commented 2 years ago

I don't have a iii-type environment ready. I'm currently only looking at this on the side and have only explored 2 notebooks. I could continue adding to my current env and report back once I've managed to go through most of them? It will take a while, but this doesn't seem urgent either.

larsbuntemeyer commented 2 years ago

Yes, in my experience, conda is especially slow on Mistral (probably due to the filesystem). mamba sometimes threw errors due to the outdated C libraries of Mistral's Red Hat OS. So I guess we might also postpone this until Levante is ready to use.