sararselitsky / FastPG

Fast phenograph, CyTOF

Improve documentation for python dependencies #6

Closed jefferys closed 4 years ago

jefferys commented 4 years ago

The documentation on how FastPG depends on the nmslib library, and on Python to access it, could be improved. Installing nmslib via the Python package manager can be problematic depending on a user's Python environment, and R users are not necessarily Python users. Perhaps integrating the install information from nmslib, nmslibR, and reticulate in a more comprehensive way would make troubleshooting easier for users when things get complicated.

jefferys commented 4 years ago

FastPG uses the nmslibR package to provide the HNSW step. nmslibR in turn relies on an external library, nmslib, and accesses it through its Python bindings using the reticulate package. The nmslib library and its Python bindings must be installed from outside R, e.g. by using the Python package manager.
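As a rough illustration of what "installed from outside R" means in practice, the sketch below checks whether the Python interpreter running it can locate the nmslib package. The helper name `module_available` is made up for this example; it only uses the standard library and does not import nmslib itself.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if this interpreter's import system can locate `name`.

    Hypothetical helper for illustration; it locates the module without
    importing it, so it says nothing about whether the install is healthy.
    """
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # The interpreter that ran this script, i.e. the one being checked.
    print("interpreter:", sys.executable)
    print("nmslib importable:", module_available("nmslib"))
```

Running this with each Python on the machine shows which one, if any, has nmslib installed. The check is only meaningful for the interpreter that actually executes the script.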

Setting up nmslib and connecting to it through Python via reticulate can be complicated because there are many ways Python can be set up. Installations vary by OS, by Python version (we really should stop talking about Python 2), and by environment.
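To make those three axes concrete, here is a minimal sketch that reports the OS, the Python version, and a guess at the kind of environment for whichever interpreter runs it. The function name `describe_python` and the environment labels are assumptions for this example, and the detection is heuristic: a `conda-meta` directory under `sys.prefix` is taken as a sign of a conda environment, and a prefix that differs from the base prefix as a sign of a virtual environment.

```python
import os
import platform
import sys


def describe_python() -> dict:
    """Summarise OS, Python version, and a heuristic guess at the environment kind."""
    # venv and recent virtualenv change sys.prefix; older virtualenv set sys.real_prefix.
    in_venv = (
        getattr(sys, "real_prefix", None) is not None
        or sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    )
    # Conda environments keep their package metadata in <prefix>/conda-meta.
    in_conda = os.path.isdir(os.path.join(sys.prefix, "conda-meta"))

    if in_venv:
        kind = "virtual environment"
    elif in_conda:
        kind = "conda environment"
    else:
        kind = "system or user install"

    return {
        "os": platform.system(),
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "executable": sys.executable,
        "prefix": sys.prefix,
        "kind": kind,
    }


if __name__ == "__main__":
    for key, value in describe_python().items():
        print(f"{key}: {value}")
```

The heuristic cannot tell a system Python from a user-space one, and it will not recognise every virtual environment implementation, so the output is a starting point for troubleshooting rather than a definitive answer.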

jefferys commented 4 years ago

One detail is that the reticulate package won't necessarily pick the correct Python by default. A user might have multiple Pythons installed, and the one that is run must be able to find the nmslib Python package. To tell reticulate which Python to use, we recommend setting the RETICULATE_PYTHON environment variable. However, some guidance on how to work out what to set this variable to might be useful, since users might not know how their system is set up. Major use cases might include: a system Python, a user-space Python, a virtual-env Python, and a conda/anaconda Python. Unfortunately, there is also more than one virtual environment implementation.
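One possible way to find a value for RETICULATE_PYTHON is to ask each candidate interpreter whether it can import nmslib. The sketch below does this for the interpreter running the script plus whatever `python3` and `python` resolve to on PATH. The helper names `can_import` and `candidate_pythons` are made up for this example, and the candidate list is an assumption: Pythons that are not on PATH (for example, an inactive conda or virtual environment) will not be found.

```python
import shutil
import subprocess
import sys


def can_import(python_path: str, module: str = "nmslib") -> bool:
    """Return True if `python_path -c "import <module>"` exits successfully."""
    try:
        result = subprocess.run(
            [python_path, "-c", f"import {module}"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=60,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The path does not exist, is not executable, or the import hung.
        return False
    return result.returncode == 0


def candidate_pythons() -> list:
    """List the current interpreter plus `python3`/`python` as found on PATH."""
    found = [sys.executable]
    for name in ("python3", "python"):
        path = shutil.which(name)
        if path and path not in found:
            found.append(path)
    return found


if __name__ == "__main__":
    usable = [p for p in candidate_pythons() if can_import(p)]
    if usable:
        # Any of these paths is a plausible value for the variable.
        for path in usable:
            print(f"RETICULATE_PYTHON={path}")
    else:
        print("No candidate Python could import nmslib.")
```

A path printed by this script could then be set as RETICULATE_PYTHON before R starts, for example in the shell or in an `.Renviron` file. If nothing is printed, either nmslib is not installed or it is installed for a Python this sketch did not look at.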