mottosso / bleeding-rez

Rez - Reproducible software environments for Windows, Linux and MacOS
GNU Lesser General Public License v3.0

Small correction in README about Conda behavior #100

Open darkvertex opened 3 years ago

darkvertex commented 3 years ago

Hey! 👋

I was reading the README and I found this bit slightly misleading:

Aside from those being strictly limited to Python packages and Rez being language agnostic, both venv and Conda couple the installation of packages with their environment, meaning installed packages cannot be shared with other environments.

I've used both the original Rez and Conda, and in the interest of accuracy I feel compelled to clarify that:

Conda is not strictly for Python packages.

Conda is language and OS agnostic, just like bleeding-rez.

Conda appears to couple installation with the environment but in fact uses dynamically managed hardlinks whenever possible.

To be clear, I'm not bashing Rez or bleeding-rez, just wish for correct comparisons. That is all. Hope you understand.

mottosso commented 3 years ago

Hey @darkvertex, been a while, hope all is well. :)

Conda is not strictly for Python packages.

Hadn't noticed Conda had grown beyond just Python, that's neat.

Conda is language and OS agnostic, just like bleeding-rez.

Would it be more accurate to say that it supports packages for "Python, R, Ruby, Lua, Scala, Java, JavaScript, C/C++, FORTRAN", but not e.g. Go or D or the remaining 97% of languages? Rez could handle those, because it's agnostic. But it still looks as though Conda needs explicit support for specific languages?

Conda appears to couple installation with the environment but in fact uses dynamically managed hardlinks whenever possible

This was the part that bit me the most when using Conda. Are you saying you can share packages across environments? To e.g. have a central repository of packages like Rez does, and spin up one environment with python-3 and six and another with python-2 and six, where six is reused?

Or do you mean it'll physically copy files into each environment (with or without hardlinks)? That's the part that threw me off completely. How can that work with large environments and large packages e.g. Maya along with Arnold and USD and 156 additional packages? Rez wouldn't need to reinstall or copy anything, which is what makes packages sharable across environments.

That doesn't seem like what Conda is intended for, or practical, which is why it appears more closely related to virtualenv. Wouldn't you agree?

darkvertex commented 3 years ago

Hey @darkvertex, been a while, hope all is well. :)

Hey! Wasn't sure if you'd remember me. :) All is good here. Hope yourself and your loved ones are well also!

Would it be more accurate to say that it supports packages for "Python, R, Ruby, Lua, Scala, Java, JavaScript, C/C++, FORTRAN", but not e.g. Go or D or the remaining 97% of languages? Rez could handle those, because it's agnostic. But it still looks as though Conda needs explicit support for specific languages?

Conda is a generic software packager and environment manager that supports Linux + Windows + Mac, so I wouldn't say it's specific to any one language. Conda itself (as in "the conda tool") is written mostly in Python, but it's for packaging stuff from any language.

While "building" a conda package can involve a compilation step, the resulting package is usually a specific set of artifacts (compiled executables, resource files, environment customizations and whatnot) of the build. This is then packaged for a specific platform (if it's say executables that can only run on a certain OS) or not, if it's something more generic like Python content.

In a way, it's a bit similar to how Python wheels work, in that if you have OS-specific things to do, you prepare them for each OS you wish to support and provide a wheel per variant. Python wheels do not self-compile; they are taken as-is. Conda packages are the same. You can make a fancy script to compile the package when building it, and depending on how you configured it, it may declare itself as agnostic or as having OS- or CPU-architecture-specific things. So if you were to, say, package a complex build like Qt, you might have a multiplatform build script that cross-compiles and produces, per Qt version, a conda pkg for Windows, one for OSX, and a statically compiled one for 64-bit Linux, for example.

Much like Python wheels, which can be configured to be "universal" (i.e. they work anywhere) vs OS / CPU architecture / Python version specific, conda packages can also be designated as generic enough to work anywhere, or as OS- or CPU-architecture-specific.
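To give a rough idea of what that looks like (the recipe below is entirely made up, and the conda build command comes from the separate conda-build package):

```
$ cat mytool/meta.yaml
package:
  name: mytool          # hypothetical package name
  version: "1.0"
build:
  noarch: python        # "works anywhere"; omit this line for an OS/arch-specific build
$ conda build mytool    # produces an artifact per platform you build on
```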

Conda appears to couple installation with the environment but in fact uses dynamically managed hardlinks whenever possible

This was the part that bit me the most when using Conda. Are you saying you can share packages across environments? To e.g. have a central repository of packages like Rez does, and spin up one environment with python-3 and six and another with python-2 and six, where six is reused?

In a Conda environment, after stuff installs, it's all local to the machine. What I'm saying is you can share packages across your conda environments and the data will only exist once on disk per package version. This is different from virtualenv, which will explicitly copy stuff to the site-packages of each venv, duplicating the data each time.
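Roughly, your python-3 / python-2 example would look like this (package and env names are just examples, and I'm assuming the default pkgs cache location):

```
$ conda create -n py3 python=3.8 six   # six is downloaded and extracted into the shared pkgs/ cache
$ conda create -n py2 python=2.7 six   # six is reused (hardlinked) from that same cache, no re-download
$ ls $(conda info --base)/pkgs | grep six   # the single on-disk copy both environments link to
```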

Or do you mean it'll physically copy files into each environment (with or without hardlinks)? That's the part that threw me off completely. How can that work with large environments and large packages e.g. Maya along with Arnold and USD and 156 additional packages? Rez wouldn't need to reinstall or copy anything, which is what makes packages sharable across environments.

That doesn't seem like what Conda is intended for, or practical, which is why it appears more closely related to virtualenv. Wouldn't you agree?

Conda is not designed to source stuff directly off of the network file share (except for optionally sourcing packages.) I consider this a bonus but some may see it differently and that's fine. When I was at my previous gig we used Rez and I had to prep a bunch of dummy packages for Maya and Nuke and Houdini that would deploy the software locally to the machine via rsync, and the rest of the Rez environment would be read straight off the network. It was the only way to have decent boot times.

With Conda this is not an issue as it is designed to locally-install any packages you use in your conda environments. My point about hardlinks was that Conda keeps a sort of "install cache" internal folder local to the machine and hardlinks from it to any environment that needs files for a specific version of a given package.

What this means in practice is that if, say, you had made a "maya-2020" conda package and used it in environment A, where it got sourced and installed, and then you made environment B and added it and some other packages, the second install of the maya-2020 pkg is relatively instant, as it just hardlinks from the "install cache" since it was already deployed on that system before. What I'm saying is there is only one copy of maya-2020 on the system, and environments A and B hardlink to the same content, thus saving space across (local) environments.
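If you're curious you can verify it with something like this (assuming a default ~/miniconda3 install and GNU stat; envs A and B are the ones from my example above):

```
$ stat -c %h ~/miniconda3/envs/A/bin/python   # hard-link count of env A's copy of the file
$ stat -c %h ~/miniconda3/envs/B/bin/python   # a count above 1 means it is shared with the pkgs/ cache, not duplicated
```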

Of course this behaviour has its downsides/inconveniences too, as you (or your conda-powered custom in-studio launcher app) are responsible for the environments and for cleaning caches (see conda clean --help). Another pro of Rez over Conda is how Rez can, out of the box, achieve a disposable temporary environment that lives for the duration of the shell (because you can't "save" a rez-env, it's just a long command line). With Conda you have to formally create a named environment and then you can ask it to have packages X, Y and Z.
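To put it concretely (package names are just examples; bleeding-rez spells the command rez env while legacy uses rez-env, and conda activate assumes your shell has been set up with conda init):

```
$ rez env python-3 six -- python -c "import six"   # the resolved environment lives only for this one command
$ conda create -n tools python=3 six               # Conda needs a named environment created up front...
$ conda activate tools                             # ...which you then activate,
$ conda clean --all                                # ...and whose caches you eventually prune yourself
```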

By the way, does your bleeding-rez variant formalize a way of local-deploying files of a Rez pkg for specific packages only? I didn't look very deeply but if it can do this out of the box, that's awesome.

Oh and another thing... I used Qt as an example earlier for a conda package, but actually you wouldn't have to prepare one because one already exists that is well maintained across OSs.

Legacy Rez (and maybe bleeding-rez? not sure) insists on having a network share where packages live and are sourced from at all times. The path must always exist. Considering the WFH world we now live in, I would love to see Rez's package-sourcing mechanisms abstracted to work with some kind of "filesystem plugin system" so that we could explore things like sourcing packages off of HTTP, blob storage (Amazon S3 / Google Cloud Storage), etc.

Actually, my biggest point of contention with Rez vs Conda, and the biggest pro for Conda, is that they have 12,563 open-source packages available on their third-party, community-driven package channel, conda-forge (the per-package build recipes are what Conda terminology calls "feedstocks"). This means that out of the box Conda can install a ton of useful things.
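e.g. something like this, with the package name purely as an example:

```
$ conda install -c conda-forge openexr   # prebuilt from conda-forge, no packaging work on your end
```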

I would looooooove for Rez (or bleeding-rez ;)) to formalize a way of exposing a third-party package repository over HTTPS, so you could install packages from the internet as easily as people can distribute stuff on PyPI for a straight pip install, or how you can pip-install directly from a git repo URL that has a pip-friendly setup.py file in it.

At my previous gig we had forked Rez internally, and someone on my old team had implemented a dynamic virtualenv manager so that when we ran rez-env we could pass along not just rez package names but pip package names too. It complicated things a bit, but it was immensely useful, because it meant you could easily autodeploy anything from PyPI that already existed in the open-source universe without having to repackage it as Rez packages (which we would then forget to keep up with / automate the deployment of). There was a rez-pip command in legacy Rez that would convert a pip package to a rez one, but it had a huge downside at the time: it didn't track dependencies, so you had to go and generate rez packages manually for all the dependencies, and it got really tiresome. (Was that ever fixed in legacy or bleeding-rez?)
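For reference, that workflow looked roughly like this (flags from memory, so double-check them against your Rez version; six is a convenient example precisely because it has no dependencies):

```
$ rez pip --install six    # converts the PyPI distribution into a rez package
$ rez env six -- python -c "import six; print(six.__version__)"
```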

ps: Sorry for the crazy-long response. lol

mottosso commented 3 years ago

What I'm saying is you can share packages across your conda environments and the data will only exist once on disk per package version.

Ah, I see. That must have been where I got things wrong. I could have sworn it created a unique (and huuge) environment each time I set one up; that was my main reason for never looking in that direction again.

the second install of the maya-2020 pkg is relatively instant, as it just hardlinks from the "install cache" since it was already deployed on that system before

Ah ok hold on now, so it does produce a copy of the environment?

I'm not particularly interested in whether the copies are real files or hard links; that's a looooot of files to copy. And when are they deleted? Also, what's "relatively"? Hardlinking 1,000 files (from e.g. a NumPy install) still means several seconds of your drive locked at 100%, and of Windows Defender scanning through each one at another 100% CPU. For every package. :OO I don't see how that is practical.

see conda clean --help

Yikes.

By the way, does your bleeding-rez variant formalize a way of local-deploying files of a Rez pkg for specific packages only? I didn't look very deeply but if it can do this out of the box, that's awesome.

bleeding-rez doesn't, but the rez-localz package does. Packages are self-contained, so it's literally a copy of one folder to another. What that package does is just copy the packages Rez resolves for you.

Legacy Rez (and maybe bleeding-rez? not sure) insists on having a network share where packages live and are sourced from at all times. The path must always exist.

No, that's not true. Rez, like Python's PYTHONPATH, searches for a package in a list of paths until it finds it. If a package is usually on \\network\package but that path isn't available, it'll search the next path, such as a local cache c:\mybackup\package. Both bleeding and vanilla Rez do that. rez-localz builds on that idea, copying things from one of those locations onto your local drive.
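For example, on Windows (paths made up; REZ_PACKAGES_PATH uses ; as the separator there, : on Linux/macOS):

```
rem tell Rez to look on the share first, then fall back to a local cache
set "REZ_PACKAGES_PATH=\\network\packages;c:\mybackup\packages"
rez env mypackage
```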

I would looooooove for Rez (or bleeding-rez ;)) to formalize a way of exposing a third-party package repository over HTTPS, so you could install packages from the internet as easily as people can distribute stuff on PyPI for a straight pip install, or how you can pip-install directly from a git repo URL that has a pip-friendly setup.py file in it.

I actually started having a look at this; there's a lot of promise there, and much low-hanging fruit. At the moment there are two steps: (1) clone a prepared package off of GitHub/GitLab, and (2) call build/install on it. Those two could be combined, along with just documenting that "hey, you should upload to GitHub". Then we could just put all of them in one big list, like Scoop and even Conda, and there's the centralised package manager. :D
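i.e. today the flow is roughly this (the repository URL is made up):

```
$ git clone https://github.com/someuser/rez-mypackage.git
$ cd rez-mypackage
$ rez build --install    # builds the package and installs it into your local package path
```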

ps: Sorry for the crazy-long response. lol

Hah, no problem. :) I enjoy thinking about this problem. In any case, you've got the better understanding of Conda. If you type something up, I'd be happy to replace my definition of it in the README.