onedata / fs-onedatafs

OnedataFS is a PyFilesystem interface to Onedata virtual file system
MIT License

Preferred way to install onedatafs #5

Open bgruening opened 3 years ago

bgruening commented 3 years ago

Hi,

we would like to use and push the usage of onedata in our community. For that, we wanted to integrate fs.onedatafs into our system. However, it is not clear what the preferred way is to install this library.

Installing it via PyPI leads to the following error on python3.8 but also on python3.7.

Python 3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 16:07:37) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import fs.onedatafs
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bag/miniconda3/envs/python37/lib/python3.7/site-packages/fs/onedatafs/__init__.py", line 12, in <module>
    from ._onedatafs import OnedataFS, OnedataSubFS # noqa
  File "/home/bag/miniconda3/envs/python37/lib/python3.7/site-packages/fs/onedatafs/_onedatafs.py", line 36, in <module>
    import onedatafs # noqa
ModuleNotFoundError: No module named 'onedatafs'

Is python>3.7 supported? The PyPI release is really outdated and the GitHub release is much newer. Is there a reason? We tried the latest github release but we got the same error.

Next, we tried the Anaconda release, but this does not work because you don't pin protobuf or libtbb, so this release is also not usable out of the box. I'm part of the conda-forge and bioconda communities, so we could help with the correct setup of conda builds if you like.

My general concern is that this project does not seem to be very active :( no issues, no PRs, the master branch is not updated and no CI infrastructure that runs tests. Is there any other preferred way to access OneData via python that I missed?

Thanks for OneData! We like it a lot and would like to integrate with it more tightly, but we would need the Python library for that. Bjoern

bkryza commented 3 years ago

@bgruening Thanks for your interest!

First of all, the activity on our GitHub repositories is low due to the fact that the main development in Onedata happens through our self-hosted Jira+Bamboo suite, where we can run more comprehensive CI tests than on public CI platforms, and we simply push all release and develop branches daily to GitHub...

With respect to fs-onedatafs package specifically, this is just a Python wrapper over our C++ client, so this particular repository doesn't change too often, and the PyPI package was pushed just once for the sake of publishing docs, as the fs-onedatafs package on its own is not usable, you need to have preinstalled the C++ client and libraries.
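The point above explains the traceback in the original report: the PyPI package only provides the `fs.onedatafs` wrapper, while the compiled `onedatafs` module comes from the C++ client packages. A minimal sketch for checking this up front, using only the standard library, might look as follows. The helper name `native_backend_available` is hypothetical and not part of fs-onedatafs.

```python
import importlib.util


def native_backend_available(module_name: str = "onedatafs") -> bool:
    """Return True if a top-level module with this name can be located.

    The default name is the module that fs/onedatafs/_onedatafs.py tries
    to import in the traceback quoted in this thread.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if native_backend_available():
        print("native onedatafs module found")
    else:
        print("native onedatafs module missing: install the Oneclient packages first")
```

Running such a check before `import fs.onedatafs` gives a clearer message than the bare `ModuleNotFoundError`.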

Currently, the preferred way to install is through Conda. However, I'm struggling to make it work on Python 3.8+, as the conda dependency resolver fails on the dependencies before even starting a build. As for protobuf and libtbb, they are pinned in the oneclient repository: https://github.com/onedata/oneclient/blob/release/20.02.7/conda/onedatafs/meta.yaml#L57-L64, at least since version 20.02.6. However, if you have any suggestions on how to improve our Conda packages, they would be most welcome.

Another way is installation directly from distro packages, but this is only supported at the moment for Ubuntu Bionic, Ubuntu Xenial and CentOS 7 (using Software Collections environment), however please bear in mind that this installs quite a few dependencies, as we support by default several storage systems for which we need the client libraries (e.g. Ceph, S3, XRootd, etc...):

pip3 install fs
wget http://packages.onedata.org/oneclient-2002.sh
./oneclient-2002.sh python3-onedatafs

Also, if you would like to just simply test the fs-onedatafs you can start our Oneclient Docker image, where it is preinstalled:

❯ docker run --entrypoint /bin/bash -it onedata/oneclient:20.02.7
root@961e06e826b2:/tmp# python3
Python 3.6.9 (default, Jan 26 2021, 15:33:00) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from fs.onedatafs import OnedataFS
>>> 

Finally, we are currently in the process of rewriting the public documentation at onedata.org, so hopefully in a few weeks the docs will be more user-friendly and up to date...

Please let us know if you have any more questions or comments.

bgruening commented 3 years ago

> @bgruening Thanks for your interest!
>
> First of all, the activity on our GitHub repositories is low due to the fact that the main development in Onedata happens through our self-hosted Jira+Bamboo suite, where we can run more comprehensive CI tests than on public CI platforms, and we simply push all release and develop branches daily to GitHub...

Ah I see. Maybe that can be added to the readme?

> With respect to fs-onedatafs package specifically, this is just a Python wrapper over our C++ client, so this particular repository doesn't change too often, and the PyPI package was pushed just once for the sake of publishing docs, as the fs-onedatafs package on its own is not usable, you need to have preinstalled the C++ client and libraries.

You could create a python wheel that contains your C++ client. This way other projects could depend on your Python bindings. Those wheels can be built on public CI and pushed to PyPI with every GitHub release.

I think it is confusing for users who see your package on PyPI when it is old and does not work out of the box. Maybe it would be better not to offer a package then? However, I will try to convince you that a nice PyPI package is useful :)

> Currently, the preferred way to install is through Conda, however I'm currently struggling in making it work on Python 3.8+ as the conda dependency resolver fails on the dependencies before even starting a build. As to the protobuf and libtbb they are pinned in the oneclient repository: https://github.com/onedata/oneclient/blob/release/20.02.7/conda/onedatafs/meta.yaml#L57-L64, at least since version 20.02.6. However, if you have any suggestions on how to improve our Conda packages that would be most welcome.

Do you think we can migrate those packages (including the client) to conda-forge? This way we make sure that everything is consistent in the Python/conda ecosystem. For example, conda-forge pins the entire stack against a particular version of protobuf.

> Another way is installation directly from distro packages, but this is only supported at the moment for Ubuntu Bionic, Ubuntu Xenial and CentOS 7

That is not really useful for us at the moment I think. For our project (galaxyproject.org) we would need PyPI packages. We could build them on our own (maybe), but we of course prefer upstream packages.

> (using Software Collections environment), however, please bear in mind that this installs quite a few dependencies, as we support by default several storage systems for which we need the client libraries (e.g. Ceph, S3, XRootd, etc...):

Yeah, that's why we would like to use it and get EGI more tightly integrated into Galaxy.

> pip3 install fs
> wget http://packages.onedata.org/oneclient-2002.sh
> ./oneclient-2002.sh python3-onedatafs

Oh, that seems nice, we will try that. Any chance you could use it to create Python wheels?

> Also, if you would like to just simply test the fs-onedatafs you can start our Oneclient Docker image, where it is preinstalled:
>
> ❯ docker run --entrypoint /bin/bash -it onedata/oneclient:20.02.7
> root@961e06e826b2:/tmp# python3
> Python 3.6.9 (default, Jan 26 2021, 15:33:00) 
> [GCC 8.4.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from fs.onedatafs import OnedataFS
> >>> 
>
> Finally, we are currently in the process of rewriting the public documentation at onedata.org so hopefully in few weeks the docs will be more user friendly and up to date...

Top! Looking forward to it.

> Please let us know if you have any more questions or comments.

My only questions are whether you are OK with us creating conda-forge packages for it, and whether the project is willing to support Python manylinux wheels on PyPI :)

Thanks a lot for your answer! Bjoern

bkryza commented 3 years ago

@bgruening Thanks for the tips - in the coming weeks we will check whether it's possible to create a wheel package for OnedataFS and also try to port the conda recipes to conda-forge. Of course, any support here would be welcome...

bgruening commented 3 years ago

Awesome! Please ping me if you need any help!

bgruening commented 3 years ago

@bkryza is there anything we can help with here? It would be nice to give Galaxy users read-only access to EGI OneData :)

bkryza commented 3 years ago

@bgruening The main problem in our case is the substantial amount of dependencies not available in conda or conda-forge: https://github.com/onedata/oneclient/blob/develop/conda/onedatafs/meta.yaml#L19-L65

I've actually made a few attempts to fix linking with Py 3.8 on Anaconda, and also to add these dependencies as Git submodules to the project and build them that way, but so far it hasn't worked...

One more thing I will try this week is to use the CMake FetchContent mechanism to enable an alternative compilation of Oneclient and OnedataFS, which will download and build these dependencies (mainly Facebook and AWS libraries) as static libraries. If it succeeds, I will try to submit a conda-forge PR...

bgruening commented 3 years ago

@bkryza do you have a list of the packages that are missing from conda-forge? I can try to get them in.

bkryza commented 3 years ago

@bgruening The most critical are:

Optionally, we would need storage driver libraries which allow Oneclient/OnedataFS direct access to storage when possible:

So the bottom line is - if we had the FB libraries available, or were able to build them in place while building our packages - we could provide a first version with direct access only to selected storages that are covered by available libraries, and then add new libraries if requested by users...

The question is - do you think it's viable to add the FB libraries in such old versions to conda-forge? - if not I will try to enable building them statically during compilation of our packages...

bgruening commented 3 years ago

> The question is - do you think it's viable to add the FB libraries in such old versions to conda-forge? - if not I will try to enable building them statically during compilation of our packages...

Old libraries might indeed be a problem :( Not sure what can of worms this would open up.

@bkryza as a general question, is it possible to deactivate certain features at compile time? We could leave the Facebook stuff out of the first version and emit a run-time warning if people try to use it. Not ideal, but it moves us forward.

bkryza commented 3 years ago

@bgruening We can disable support for different storages using CMake flags. Unfortunately, the FB C++ libraries are critical, as our core async code is built around them, and their API changes quite often and we don't have time to update the code every time...

I will try to work around it this week by adding a CMake flag which will fetch and build them during compilation...
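The run-time warning idea raised earlier in the thread, applied to storage drivers that were switched off at build time, could be sketched on the Python side roughly as follows. This is only an illustration of the pattern: `ENABLED_BACKENDS`, `BackendUnavailableWarning` and `request_backend` are hypothetical names, and nothing here reflects how Oneclient or OnedataFS actually report disabled storages.

```python
import warnings

# Hypothetical: the set of storage backends compiled into this build.
ENABLED_BACKENDS = frozenset({"posix"})


class BackendUnavailableWarning(UserWarning):
    """Warning category for backends left out at compile time."""


def request_backend(name: str, enabled=ENABLED_BACKENDS) -> bool:
    """Return True if the backend is usable; otherwise warn and return False."""
    if name in enabled:
        return True
    warnings.warn(
        f"storage backend {name!r} was not included in this build",
        BackendUnavailableWarning,
        stacklevel=2,  # point the warning at the caller, not at this helper
    )
    return False
```

A dedicated warning category lets downstream projects filter or escalate these messages with the standard `warnings` machinery.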

bgruening commented 3 years ago

Oops, I see. Thanks a lot @bkryza!

bgruening commented 2 years ago

@bkryza any update here? We will ship the next Galaxy release unfortunately again without OneData support :(

bkryza commented 2 years ago

@bgruening (CC: @luman75) Actually there has been some progress: we've managed to get the latest stable version, 20.02.15, to build and install (but only for Python 3.9). Most of the dependencies now come from conda-forge, but a few are still needed from our onedata channel. To be sure, I've just tested it on a fresh Miniconda install and it works, assuming your .condarc looks like this:

channels:
  - conda-forge
  - onedata
  - defaults

then you can install fs.onedatafs and oneclient:

conda install fs.onedatafs=20.02.15
conda install oneclient=20.02.15
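Since the 20.02.15 build is reported above to work only with Python 3.9, a small guard can confirm the interpreter of the active environment before installing or importing the package. This is a minimal sketch; `SUPPORTED_PYTHON` and `interpreter_supported` are hypothetical names, and the supported version is taken from the comment above rather than from the package metadata.

```python
import sys

# Assumption from the comment above: only Python 3.9 is supported by this build.
SUPPORTED_PYTHON = (3, 9)


def interpreter_supported(version_info=None, supported=SUPPORTED_PYTHON) -> bool:
    """Return True if the major.minor version matches the supported one."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == tuple(supported)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:2])
    if interpreter_supported():
        print(f"Python {current} matches the supported version")
    else:
        print(f"Python {current} is not the version this build was tested with")
```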

You can see the deps for oneclient here:

and for fs.onedatafs here: