mottosso / bleeding-rez

Rez - Reproducible software environments for Windows, Linux and MacOS
GNU Lesser General Public License v3.0

Alternative to Filesystem Package Repository #99

Open mstreatfield opened 3 years ago

mstreatfield commented 3 years ago

Hi there,

I'm getting back into rez after a few years away, and have been mulling over how packages are managed.

As I recall, rez (and bleeding-rez) require a large blob of storage to resolve the environment and then give access to artefacts (binaries) for the resolved packages. The filesystem acts as both a database of available packages and an artefact repository. To achieve this over multiple sites, the storage has to be available over the WAN or replicated across data-centres, and has to be POSIX compliant.

I'm curious whether there is scope to remove the dependency on the filesystem, and open up rez to working with other artefact repositories such as Artifactory or Cloudsmith.

This would allow package metadata to be kept in a database (or perhaps still a filesystem) and package artefacts to be kept in a hosted repository. It removes the dependency on the filesystem, opening up some more cloud-friendly ways of working. Rez has an open pull request to add support for a MongoDB repository which, although out-of-date against the current master, gives an idea of what part of this might look like.
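As a rough sketch of the decoupling I have in mind (purely illustrative Python; these classes and names are hypothetical, not rez's actual repository plugin API):

```python
# Illustrative sketch only -- NOT rez's repository plugin API.
# Shows the proposed split: package *metadata* served from a database,
# package *payloads* served from a hosted artefact repository.

from dataclasses import dataclass, field


@dataclass
class PackageMetadata:
    name: str
    version: str
    requires: list = field(default_factory=list)
    artifact_uri: str = ""  # where the payload lives, e.g. an Artifactory URL


class MetadataStore:
    """Stands in for a database (e.g. MongoDB) of released packages."""

    def __init__(self):
        self._packages = {}

    def publish(self, pkg: PackageMetadata):
        self._packages.setdefault(pkg.name, {})[pkg.version] = pkg

    def iter_versions(self, name: str):
        # A resolver only needs metadata -- no payload access required.
        return sorted(self._packages.get(name, {}).values(),
                      key=lambda p: p.version)


class ArtifactStore:
    """Stands in for a hosted repository (Artifactory, Cloudsmith, S3, ...)."""

    def fetch(self, pkg: PackageMetadata, dest_dir: str) -> str:
        # A real implementation would download and unpack pkg.artifact_uri.
        return f"{dest_dir}/{pkg.name}-{pkg.version}"


store = MetadataStore()
store.publish(PackageMetadata("maya", "2023.1", requires=["python-3.9"],
                              artifact_uri="https://example/repo/maya-2023.1.zip"))
latest = store.iter_versions("maya")[-1]
print(latest.version)  # prints 2023.1
```

The point is that an environment can be resolved from the metadata store alone, and payloads fetched only for the packages that end up in the resolve.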

Before I push this further, I wanted to see if any consideration had been given to this way of working, and whether it might be useful.

Mark.

p.s. I appreciate that this is not an issue as such; I could not find a more suitable place to kick off the conversation.

mottosso commented 3 years ago

Hi Mark,

I think what you're looking for is both a hot topic amongst Rez'ers and also something some people have gone ahead and implemented. I can't remember the details, but @instinct-vfx might know more, he's solving this problem at scale.

mstreatfield commented 3 years ago

Ah, OK. Yes, it would be great to hear what others are up to that overlaps with this. Is there a mailing list/gitter/etc. for bleeding-rez, or are we OK to have the conversation here?

mottosso commented 3 years ago

I'm not in the loop with Rez these days, but there was a Slack with the major users here: https://rez-talk.slack.com/. I'm actually not sure bleeding-rez is relevant anymore; it's quite possible nerdvegas/rez has caught up with Python 3 and Windows support by now, in which case it would make the most sense to push for this there. Alternatively, if it's too avant-garde for nerdvegas/rez, you'd be welcome to take the helm in this direction for bleeding-rez. @davidlatwe is more in the loop and might know more.

davidlatwe commented 3 years ago

Hi :)

I did make a MongoDB-based repository for storing Allzpark profiles; here's the source. But since profiles don't need variants, that repository plugin doesn't really save out variants.

And @mstreatfield, if you are looking for a Slack chat room invitation link, I can provide one for you.

mstreatfield commented 3 years ago

Thank you for the info and updates.

Yes, please, an invite might be handy. We're evaluating our options at the moment and so it would be useful to get back up-to-speed with the latest and greatest.

Would Slack be the right place to ask about the Python 3/Windows support? It sounds like it.

davidlatwe commented 3 years ago

Okay, the invitation has been sent to your email :)

Would Slack be the right place to ask about the Python 3/Windows support

Yep, I have seen people talking about that in the past couple of months.

mstreatfield commented 3 years ago

Received, thank you. I'll pick up the conversation there when I've had a chance to catch up on the threads that already exist.

instinct-vfx commented 3 years ago

I guess the main question here is what you want to achieve by putting package descriptions into a database. People have mentioned doing what David did and implementing Mongo and similar plugins, but it did not really help performance. If you run Rez with memcached it is usually quite quick (it caches package.py files, repositories and resolves).
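For reference, enabling memcached is a one-line setting (assuming a YAML user config at `~/.rezconfig`; the address shown is an example local daemon):

```yaml
# ~/.rezconfig -- point rez at one or more memcached daemons.
# Rez then caches package definitions, repository listings and resolves.
memcached_uri: ["127.0.0.1:11211"]
```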

On the other hand, having packages as completely self-contained "blobs" is a design decision that gives a lot of benefits. We ship packages around the globe, and not having to manage central database structures makes that a whole lot easier. I just pull the packages, put them in a zip and ship them.

There have been discussions about something more "appstore"-like, but that would not really be the repository so much as a source to pull packages from (or recipes to build them).

mstreatfield commented 3 years ago

It's not about performance; I've seen the difference memcached can make and am not concerned that we'll see a problem here. And I'm less interested in putting package descriptions in a database, that might just be a side-effect.

To resolve an environment rez needs to know about all of the software that has been released, and so to do this in a multi-site environment all packages must be everywhere - I'm not sure that is desirable. I'm curious if it's possible to resolve an environment based on the entire catalog and then localise any missing artefacts on demand.

At the same time, I'm curious whether there is scope for something like Artifactory or Cloudsmith to manage the packages that rez provides access to, and take advantage of some of the other features that those services offer.
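To make "localise on demand" concrete, the flow I'm imagining looks roughly like this (purely illustrative Python; the function and its arguments are hypothetical, not rez API):

```python
# Hypothetical sketch of resolve-then-localise: resolve against the full
# metadata catalog, then fetch only the payloads missing from local disk.
import os


def localise_missing(resolved_packages, local_root, fetch_payload):
    """resolved_packages: iterable of (name, version, artifact_uri) tuples
    produced by a metadata-only resolve. fetch_payload: a callable that
    downloads one artefact (e.g. from Artifactory) into the given path."""
    localised = []
    for name, version, uri in resolved_packages:
        dest = os.path.join(local_root, name, version)
        if not os.path.isdir(dest):   # payload not on this site yet
            fetch_payload(uri, dest)  # pull it over the WAN once
        localised.append(dest)
    return localised
```

A second resolve of the same environment would then find every payload already local and fetch nothing.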

mottosso commented 3 years ago

To resolve an environment rez needs to know about all of the software that has been released, and so to do this in a multi-site environment all packages must be everywhere

What if each site shared their package repository over e.g. samba? I would expect the first query to pay the price of latency/throughput, but subsequent queries to happen over memcached and be as fast as though all of it were local. And then you could use something like rez-localz to speed up only the packages that are actually used.

instinct-vfx commented 3 years ago

It's not about performance; I've seen the difference memcached can make and am not concerned that we'll see a problem here. And I'm less interested in putting package descriptions in a database, that might just be a side-effect.

To resolve an environment rez needs to know about all of the software that has been released, and so to do this in a multi-site environment all packages must be everywhere - I'm not sure that is desirable. I'm curious if it's possible to resolve an environment based on the entire catalog and then localise any missing artefacts on demand.

At the same time, I'm curious whether there is scope for something like Artifactory or Cloudsmith to manage the packages that rez provides access to and take advantage of some of the other features that those services offer.

I see. So if you can offer a file-based shared location, then local caching is available through the built-in payload caching (see https://github.com/nerdvegas/rez/wiki/Package-Caching). I can see how providing a shared file-based location might be an issue, so having something like Artifactory or S3 support might be beneficial. I would be specifically interested in having a special repository type that stores package payloads zipped and enforces local caching and adds support for zipped payloads to rez-package-cache.
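If I recall the wiki page above correctly, enabling payload caching is a single setting (YAML user config; the path is an example):

```yaml
# ~/.rezconfig -- enable package payload caching.
# Packages resolve from the shared repository but their payloads are
# copied to, and run from, this local cache directory.
cache_packages_path: /opt/rez/cache
```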

mstreatfield commented 3 years ago

What if each site shared their package repository over e.g. samba?

I considered this; ultimately my assumption was that NFS/SMB over the WAN was probably not going to be robust or secure enough to be feasible.

I'd seen rez-localz and hoped that might provide some help in this situation, and rez-package-cache looks like it might be applicable, too.

I would be specifically interested in having a special repository type that stores package payloads zipped and enforces local caching and adds support for zipped payloads to rez-package-cache.

In this description, are the package metadata (package.py) and the payload still part of the same repository, or separate? In my mind, decoupling them is required for some of this flexibility.