hafen / rminiconda

Install and maintain isolated miniconda Python installations from R
Apache License 2.0
65 stars 3 forks source link

install_miniconda masked by reticulate #10

Open xhdong-umd opened 4 years ago

xhdong-umd commented 4 years ago

I'm very excited to try rminiconda after I read the news from reticulate, hoping to be able to make python config as transparent as possible to my R package users.

One thing I noticed is that loading reticulate will mask rminiconda::install_miniconda. I think this may not be optimal because reticulate will be almost always be used in R/python env, having two important functions at same name but different implementation might cause problem. It's better to change it before rminiconda spread out usage and have more code depend on it (and I believe more and more packages will use this).

Alternatively, the ideal solution will be consolidate conda related functions in reticulate/rminiconda into one place(either in reticulate or rminiconda), because

@hafen , @kevinushey do you think it will be better to either abstract it a little bit (with more control options like name, package list), or better yet, consolidate them into one set of functions for reticulate/rminiconda for miniconda part?

kevinushey commented 4 years ago

One thing I noticed is that loading reticulate will mask rminiconda::install_miniconda. I think this may not be optimal because reticulate will be almost always be used in R/python env, having two important functions at same name but different implementation might cause problem.

IMHO this is better solved by just explicitly using e.g. rminiconda::install_miniconda() if so required.

  • reticulate will add RETICULATE_MINICONDA_PATH, rminiconda will add R_MINICONDA_PATH. It's obviously better if we only maintain one of them.

This sounds worth doing to me.

  • reticulate::install_miniconda will install conda with a default name r-reticulate. Compare to rminiconda, I think this is one step backward.

Can you explain why this is a step backward? Note that the goal here is to ensure that users of reticulate who don't explicitly request a specific Python version get a 'standardized' version, and so we've decided to go with a Miniconda environment containing numpy at a minimum.

xhdong-umd commented 4 years ago

@kevinushey, Thanks for the quick reply, I was a little bit too excited and posted in a hurry, later I realized there might be some confusion in my side.

My goals are very much same with what you mentioned in recent reticulate News: python package management can be a pain, so( in order of importance)

  1. R package user can use python library with minimal effort, don't need to go outside of R to touch windows registry, system settings, system environment... Previously this is hard because multiple platforms usually always need some settings. rminiconda solved this by using good default settings which should work by default, just put conda installation in fixed location or register R system environment.
    • There are still a little bit maintenance tasks left, like upgrade conda and uninstall completely (which can give user confidence in trying out, also useful when solving conflicts)
  2. still allow advanced users have more controls
    • customize install location
    • allow using existing conda or anaconda. This means to register existing conda installation to rminiconda/reticulate. I used "register" because user should provide the path(advanced user can do this) instead of let package to detect which is hard, and once registered it should work globally (reticulate use_conda have more flexibility in specific overrides)
  3. With 1,2 satisfied, try to improve efficiency. If I am understanding correctly, reticulate install one conda installation and manage conda environments (I avoided virtual env cases, since I want to focus on conda first as it's preferred way), while rminiconda basically install new conda installation for every new name, and there is no mention or direct support on conda environment. This has the benefit of total isolation, kind of docker images, but I think there's some spaces for improvements:
    • Separate conda installation is at least needed for different python/conda versions. However with same version of python and conda, there is no reason to use complete new conda installation in place of new conda environment. Different R packages need to coordinate on this any way if they want to share/reuse python env created by other or dependency packages. Here I think rminiconda should recommend to use conda and provide support for related tasks(some of them already exist in reticulate, just may need refactoring), and I think most users would expect to use conda to manage packages given the rminiconda name...
xhdong-umd commented 4 years ago

As for RETICULATE_MINICONDA_PATH vs R_MINICONDA_PATH, the latter is actually just the root of various conda installations, so they are serving different purposes and cannot be combined (maybe R_MINICONDA_PATH can use a less confusing name like R_MINICONDA_LIB.

Right now reticulate::miniconda_path only check RETICULATE_MINICONDA_PATH, so it should further check R_MINICONDA_PATH (then there could be multiple conda installations, this lead to next question). And miniconda_path_default is checking r-miniconda folder while the current version of rminiconda is rminiconda?

reticulate have some conda commands functions, I'm not sure how this play with the multiple rminiconda installations. It looks to me reticulate either auto detect and use the first one, or wait user to specify the location. This might need some helper functions to make specify rminiconda installations easier.

I'll be happy to create a PR for this in reticulate if needed.

xhdong-umd commented 4 years ago

I'm trying to use rminiconda to install miniconda, then use reticulate to create conda env, install packages with conda. This is not running smooth:

# I set the sys env so this point to the miniconda installation. gee is the name used in rminiconda
Sys.setenv(RETICULATE_MINICONDA_PATH = file.path(rminiconda::get_miniconda_path(), "gee"))
> reticulate:::miniconda_exists()
[1] TRUE
# update also works
reticulate::miniconda_update()
> reticulate::miniconda_path()
[1] "d:/Lib/rminiconda/gee"
# this can find the miniconda installation
> conda_list()
            name                                               python
1      Anaconda3                       D:\\Lib\\Anaconda3\\python.exe
2           py37           D:\\Lib\\Anaconda3\\envs\\py37\\python.exe
3   r-reticulate   D:\\Lib\\Anaconda3\\envs\\r-reticulate\\python.exe
4   r-tensorflow   D:\\Lib\\Anaconda3\\envs\\r-tensorflow\\python.exe
5 tensorflow_env D:\\Lib\\Anaconda3\\envs\\tensorflow_env\\python.exe
6   r-reticulate   d:\\Lib\\Anaconda3\\envs\\r-reticulate\\python.exe
7   r-tensorflow   d:\\Lib\\Anaconda3\\envs\\r-tensorflow\\python.exe
8            gee                 d:\\Lib\\rminiconda\\gee\\python.exe
# however I cannot pass the conda location to conda_list explicitly without problem
# it seemed that `conda` parameter in reticulate usage means conda exe location
> conda_list("d:/Lib/rminiconda/gee")
Error in system2(conda, args = c("info", "--json"), stdout = TRUE, stderr = FALSE) : 
  '"d:/Lib/rminiconda/gee"' not found
# but explicitly using conda exe doesn't work too
conda_bin_path <- file.path(rminiconda::get_miniconda_path(), "gee", "condabin")
> conda_list(conda_bin_path)
Error in system2(conda, args = c("info", "--json"), stdout = TRUE, stderr = FALSE) : 
  '"d:/Lib/rminiconda/gee/condabin"' not found

The last error seemed to be that reticulate is looking for bin folder adjacent to condabin, but there is no bin folder in my miniconda installation through rminiconda.

Actually reticulate:::miniconda_conda know that in windows it just look for condabin, but conda_* functions cannot recognize this... Maybe reticulate can centralize all the guessing and detecting code in one place, and make other functions just rely on detected or specified location. Right now it seemed various functions have their own detection in different cases, and miniconda is a little bit different from anaconda which might be the main use case of tensorflow.