
themill / wiz

Environment management framework

GNU Lesser General Public License v3.0


Rez comparison #25

Open mottosso opened 4 years ago

mottosso commented 4 years ago

Hola! Was just tipped about this project, looks interesting!

Would it be possible to write a few lines about whether someone familiar with Rez should consider Wiz, and how it differs? I noticed reference to part of it in #19.

buddly27 commented 4 years ago

Yes, we do need to add a comparison section to the docs!

In short, some of the main differences are:

Rez manages data + environment while Wiz only manages the environment, so Wiz does not enforce a specific data structure. Wiz package definitions live in registries which do not contain the data. We have also built other tools to manage data and create Wiz definitions automatically. That means it is possible to integrate Wiz on top of Pip, Conda, Puppet or other in-house solutions. We are planning to open source the Python installer soon.
The Wiz package definition file format is serializable, which enhances portability. It also ensures that we are not necessarily bound to Python, as we might move to other languages at some point (e.g. Rust).
Wiz ensures deterministic results, as functions/callbacks cannot be set in the definition.
The Rez resolution algorithm requires a dependency tree without circular dependencies, while Wiz can handle circular dependencies. The Wiz resolution algorithm is based on Dijkstra's algorithm and is not bound to directed acyclic graphs. This should also be properly documented.

There is also a lot in common:

The core concept is similar (packages are not installed in containers).
The way of passing arguments to the command line tool is similar.
Wiz definitions can also define variants.
Both allow version dropdown, etc.

I’ll be looking into a more thorough comparison and will update you :)

Are there any specific features of Rez you would like to see a comparison of?

mottosso commented 4 years ago

This is great, thank you.

Let me have a think while I use Rez to see which parts I'm most interested in. Most pressingly is the authoring of new packages, which is such a pain with Rez. Partly because of the separation between what you build and what you publish; two hierarchies I need to keep track of. I understand why, and don't really see a way around it though. And partly because the syntax is Python but so alien (everything being hidden globals) that I keep forgetting how to use it, so JSON should help there. Preferably I'd be able to quickly author small projects from the command-line.

Speaking of JSON, I know Rez started with JSON as well, but eventually transitioned into Python. I can't pinpoint exactly why, but that could be something interesting to look into and see whether the problems solved by that decision are problems you are having or will be having as well. Sorry it's a bit rambly, too many things going on. xD But great summary!

buddly27 commented 4 years ago

Most pressingly is the authoring of new packages, which is such a pain with Rez. Partly because of the separation between what you build and what you publish; two hierarchies I need to keep track of.

Can you elaborate on that? Is that because you need to duplicate information about the package (e.g. requirements, description, etc.) between the package.py file for Rez and the setup.py file for Python? Or are you talking about the discrepancy between building the package and resolving the package?

I actually don't have a lot of experience with Rez beside the investigation we conducted a few years ago, so this is great information!

Speaking of JSON, I know Rez started with JSON as well, but eventually transitioned into Python. I can't pinpoint exactly why, but that could be something interesting to look into and see whether the problems solved by that decision are problems you are having or will be having as well

My gut feeling is that a Python file is handy for using complex logic when installing a package:

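def commands():
    # `env` is provided by Rez when it evaluates commands();
    # `something` and `something_else` stand in for arbitrary conditions
    if something:
        env.PYTHONPATH.append("foo")
    elif something_else:
        env.PYTHONPATH.append("bim")
    ...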

This could backfire, as people would tend to write unnecessarily complex logic which can break the deterministic nature of the tool. But we might indeed be missing some crucial point; it would be interesting to know more about this.

mottosso commented 4 years ago

Can you elaborate on that?

Sure, let me preface by saying that my use of Rez is not like most; I use it for development environments and project management. That is, managing the environment for projects (like shot length), in addition to what software goes into each project (like Maya). Most only use it for the latter.

Here's what I'm talking about.

1. I just installed Visual Studio 2019, and need some way of adding this to a Rez environment.
2. So I make a new folder with a package.py inside, a "package".
3. I then build and publish this package, all done.
4. Now I can rez env vs-2019 otherLibrary-1.
5. Profit

Here's what they look like on disk.

/source/vs-2019/package.py
~/packages/vs/2019/package.py

The latter being "built" from the former.

Sometimes, to "build" means to compile, which means the former is source code and the latter binary, in which case this separation makes sense. I keep the former under source control, and the latter inside of Rez's registry. But a lot of times, my source and build hierarchies are identical, like in the case above where the package consists solely of environment variables (e.g. putting cl.exe on PATH).

Half of my use of Rez is for just that: environment management. Maybe more. And for that, this separation is a major PITA.

I've been toying with some ideas to simplify that, like..

https://github.com/mottosso/bleeding-rez/issues/63

And..

https://github.com/mottosso/bleeding-rez/issues/72

But at the end of the day, hand-written shell scripts (e.g. PowerShell) are quicker to write and easier to remember. Which is a bummer, because they are also a pain.

Out of interest, should Wiz handle cases like these? And if so, what would that look like?

I actually have a lot more on this from a recent venture.

https://github.com/mottosso/rez-for-projects

That ultimately led to a GUI on top of Rez.

https://allzpark.com

Along the way I encountered another contender to Rez called Spack, which has some interesting (and likely familiar) ideas if you haven't had a look already. Not to mention Nix. That should keep you busy for a bit. :)

That also reminds me: apart from Rez, are there any other inspirations for Wiz?

I actually don't have a lot of experience with Rez beside the investigation we conducted a few years ago, so this is great information!

That actually brings me to another question; why Wiz? What made you start a new project, over using one that already exists? It's not exactly a crowded space, so alternatives are good. And from experience, the vast majority of studios also implement their own solution to this problem, despite having access to a free Rez. The difference here is you chose to open source it, which is great.

buddly27 commented 4 years ago

Sometimes, to "build" means to compile, which means the former is source code and the latter binary, in which case this separation makes sense. I keep the former under source control, and the latter inside of Rez's registry. But a lot of times, my source and build hierarchies are identical, like in the case above where the package consists solely of environment variables (e.g. putting cl.exe on PATH).

Yeah, this is one of the reasons we didn't want to deal with the building process, as it forces you to handle every possible building strategy. In your case you might find it easier to install VS manually and create a definition to use it within a custom environment. Or, if you are deploying it for multiple users, it might be worth considering a data management system like Puppet or Conda and writing a tool which automatically creates the definition when you install it, or links the definition to an existing VS install (GitLab CI, build script, etc.).

For instance, the Python installer we wrote is a lightweight extension of Pip which installs each package to a special folder structure instead of targeting a common site-packages and automatically extracts a definition by reading the setup.py. We can also embed a custom definition in the repo to add custom elements that Python cannot handle (e.g. dependency to a non-Python library) which is then expanded on with data from the setup.py. The same logic can be applied to Conda for non-Python packages, but that tool is still on our TODO list.
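To make the "extract a definition from setup.py, then expand an embedded custom definition" idea concrete, here is a minimal sketch. It is not the actual Wiz installer: the metadata is passed in as a plain dict (mimicking the keywords given to setup()), and the function name `extract_definition`, the definition keys (`identifier`, `version`, `requirements`, `environ`) and the `${PYTHONPATH}` expansion syntax are assumptions for illustration.

```python
import json


def extract_definition(metadata, install_location, custom=None):
    """Build a Wiz-style definition dict from Python package metadata.

    Hypothetical helper: `metadata` mimics the keywords passed to setup()
    (name, version, install_requires); `custom` is an optional definition
    embedded in the repository for things Python packaging cannot express.
    """
    definition = {
        "identifier": metadata["name"],
        "version": metadata["version"],
        "description": metadata.get("description", ""),
        # Python requirements are carried over unchanged
        "requirements": list(metadata.get("install_requires", [])),
        "environ": {
            # prepend the package's own install folder to PYTHONPATH
            "PYTHONPATH": install_location + ":${PYTHONPATH}",
        },
    }
    if custom:
        # expand the embedded definition with the extracted data:
        # extra requirements are appended, extra variables are merged in
        definition["requirements"] += custom.get("requirements", [])
        definition["environ"].update(custom.get("environ", {}))
    return definition


if __name__ == "__main__":
    metadata = {
        "name": "my-tool",
        "version": "0.1.0",
        "install_requires": ["requests >= 2, < 3"],
    }
    # e.g. a dependency on a non-Python library
    custom = {"requirements": ["openexr >= 2.2"]}
    definition = extract_definition(metadata, "/path/to/my-tool/0.1.0", custom)
    print(json.dumps(definition, indent=4))
```

Keeping the extraction a pure function of the metadata means the same helper could, in principle, sit behind a Conda-based installer as well.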

Out of interest, should Wiz handle cases like these? And if so, what would that look like?

We use Wiz for our development setup as well. As a very simple example, this is what a development registry can look like:

shot_environment.json -> env variables for shot length, etc.
maya.json -> Maya definition (licence, bin, LD_LIBRARY_PATH, etc.)
my-tool.json -> PYTHONPATH to the package's editable (pip install -e) install
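For a rough idea of what those three files might contain, the sketch below writes them into a registry directory. The keys (`identifier`, `version`, `command`, `environ`, `requirements`) are modeled on the Wiz definition format as we understand it and should be checked against the Wiz docs; every value (shot range variables, install paths, licence server, versions) is a made-up placeholder.

```python
import json
import os
import tempfile

# Hypothetical definitions; all paths, versions and variable values are placeholders.
DEFINITIONS = {
    "shot_environment.json": {
        "identifier": "shot-environment",
        "environ": {"SHOT_START": "1001", "SHOT_END": "1100"},
    },
    "maya.json": {
        "identifier": "maya",
        "version": "2018.0.0",
        # alias name -> executable to run
        "command": {"maya": "maya2018"},
        "environ": {
            "ADSKFLEX_LICENSE_FILE": "@licence.example.com",
            "PATH": "/usr/autodesk/maya2018/bin:${PATH}",
            "LD_LIBRARY_PATH": "/usr/autodesk/maya2018/lib:${LD_LIBRARY_PATH}",
        },
    },
    "my-tool.json": {
        "identifier": "my-tool",
        # points at an editable checkout, so source edits are picked up live
        "environ": {"PYTHONPATH": "/home/me/dev/my-tool/source:${PYTHONPATH}"},
        "requirements": ["maya"],
    },
}


def write_registry(path):
    """Write each definition as a JSON file in `path` and return the file names."""
    os.makedirs(path, exist_ok=True)
    for file_name, definition in DEFINITIONS.items():
        with open(os.path.join(path, file_name), "w") as stream:
            json.dump(definition, stream, indent=4)
    return sorted(os.listdir(path))


if __name__ == "__main__":
    registry = tempfile.mkdtemp()
    print(registry, write_registry(registry))
```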

We would then run:

wiz -add /path/to/registry use shot-environment "maya==2018.*" my-tool -- maya

(We can give you a more specific example if you want more details)

So yes, we are using Wiz for job setups and development for similar reasons you seem to be using Rez for.

And also reminds me, apart from Rez, are there any other inspirations for Wiz?

Yes, ION and Anaconda, but we didn't like the idea of using containers as we wanted to avoid duplicating data.

That actually brings me to another question; why Wiz?

While we absolutely needed a way to manage environments, we already had good processes in place for building packages, and migrating everything to Rez would have been painful. We wanted the benefits of working with Rez (no containers) while being able to leverage robust data management systems (Pip, Conda, Puppet, etc.) and having the flexibility to use different data management depending on the context (development and job setups).

And from experience, the vast majority of studios also implement their own solution to this problem

This is exactly where Wiz comes from :) But since we have it running in production, we figured that there might be interest outside of the Mill, which could help us improve it. I think there is a real benefit in seeing the environment management step as a separate layer that can be built upon; hopefully sharing the code can help find a holistic solution to these issues :)

Thanks a lot for all the links!

nerdvegas commented 4 years ago


Rez manages data + environment while Wiz only manages the environment, so Wiz does not enforce a specific data structure. Wiz package definitions live in registries which do not contain the data. We have built other tools to manage data and create Wiz definitions automatically too. That means that it is possible to integrate Wiz over Pip, Conda, Puppet or other in-house solutions. We are planning to open source the Python installer soon.

Wiz package definitions file format is serializable which enhances portability. It also ensures that we are not necessarily bound to Python as we might move other languages at some point (e.g. Rust).

Wiz ensure deterministic results as functions/callback cannot be set in the definition.

Rez resolution algorithm requires a dependency tree without circular dependencies while Wiz can handle circular dependencies. The resolution algorithm is based on the Dijkstra algorithm and is not bound to directed acyclic graphs. This should also be properly documented.

There is also a lot in common:

The core concept is similar (packages are not installed in containers).

The way of passing arguments to the command line tool is similar.

Wiz definitions can also define variants.

Both allows version dropdown Etc..

I’ll be looking in a more thorough comparison and update you :)

Are there any specific features of Rez you would like to see a comparison of?

Just to clarify, rez doesn't enforce management of the package data. A rez package definition can refer to a package payload in any location, or not at all. The rez-build tool does install into the package repo alongside the package definition, but that is a matter of convenience - you can build something else on top of the rez API if you want to do something different.

# in package.py
name = "my_pkg"
version = "1.0.0"

def commands():
    # `env` is provided by rez when it evaluates commands();
    # something else has installed the package here
    env.PYTHONPATH.append("/some/other/place")

# don't actually build anything on rez-build/rez-release, just install the pkg defn
# (False tells rez that no build step is needed)
build_command = False

One current limitation is the inability to define per-variant attributes beyond the requirements list (e.g. an explicit install path). That's something I'd like to address.

Some other points:

rez is deterministic unless a package author deliberately does something non-deterministic in their commands function. In practice I've never seen this (beyond obvious cases such as appending some user-specific plugin path, for eg)
rez package definitions are serializable; they have to be, because definitions are stored in its memcached server. Whilst its definition language is Python, that is unrelated to the types of packages you might manage with rez - in fact in the very early days, rez was used primarily to manage C++ packages. Python was chosen primarily because of the kinds of things we needed to do in a package's commands function beyond setting env-vars (such as sourcing scripts, defining aliases and so on). We required a non-shell-specific language to define this, and Python fit the bill. In rez v1, package definitions were YAML, and their commands were a list of bash commands.
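To show why a non-shell-specific language matters here, the toy sketch below records package actions once and renders them for bash or PowerShell. The `Commands` class and its method names are hypothetical; they only echo the spirit of rez's commands functions and are not rez's actual implementation.

```python
import shlex


def ps_quote(value):
    """Quote a string for PowerShell (single quotes, doubled inside)."""
    return "'" + value.replace("'", "''") + "'"


class Commands:
    """Record shell-agnostic environment actions, then render them per shell.

    Hypothetical sketch; not rez's API.
    """

    def __init__(self):
        self.actions = []

    def setenv(self, name, value):
        self.actions.append(("setenv", name, value))

    def prependenv(self, name, value):
        self.actions.append(("prependenv", name, value))

    def alias(self, name, command):
        self.actions.append(("alias", name, command))

    def to_bash(self):
        lines = []
        for kind, name, value in self.actions:
            if kind == "setenv":
                lines.append("export {}={}".format(name, shlex.quote(value)))
            elif kind == "prependenv":
                # add ":$NAME" only when NAME is already set and non-empty
                lines.append('export {0}={1}"${{{0}:+:${0}}}"'.format(name, shlex.quote(value)))
            elif kind == "alias":
                lines.append("alias {}={}".format(name, shlex.quote(value)))
        return "\n".join(lines)

    def to_powershell(self):
        lines = []
        for kind, name, value in self.actions:
            if kind == "setenv":
                lines.append("$env:{} = {}".format(name, ps_quote(value)))
            elif kind == "prependenv":
                lines.append(
                    "$env:{0} = {1} + [IO.Path]::PathSeparator + $env:{0}".format(name, ps_quote(value))
                )
            elif kind == "alias":
                # a function rather than Set-Alias, so arguments are forwarded
                lines.append("function {} {{ {} @args }}".format(name, value))
        return "\n".join(lines)


# The same package logic drives both shells.
cmds = Commands()
cmds.setenv("MAYA_VERSION", "2020")
cmds.prependenv("PATH", "/opt/maya/bin")
cmds.alias("mayapy3", "mayapy -X utf8")
print(cmds.to_bash())
print(cmds.to_powershell())
```

A static data format (YAML or JSON) can express the `setenv`/`prependenv` part easily; it is the sourcing and aliasing across shells that pushes towards a small language.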

At its core, rez is a bunch of package definitions in various repos, and a dependency resolver. Once resolved, each package then configures the resulting env, typically by setting/appending env vars. Is this not the same as Wiz? Everything else rez has (rez-build/release and associated tooling for the most part) is optional extra.

I'm interested in the resolver side of things also. Does Wiz give the same guarantees that rez does? Specifically, you're guaranteed to get the latest possible version of each package in the request, in requested order priority. I tried other resolver implementations in the past (specifically using boolean satisfiability - eg https://github.com/niklasso/minisat). Whilst good at finding all possible solves (of which there can be millions), it was not very good at finding the one you want - ie, the one with the highest package versions, in some deterministic way (and rez provides this determinism as mentioned - priority based on order of package request).
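A brute-force sketch of the guarantee described above, using a made-up two-package index: it enumerates every valid combination (as an all-solutions search would) and then ranks them so the first requested package gets the highest possible version, then the second, and so on. It is exponential and is not how rez or Wiz actually search; it only illustrates the ranking rule and why request order changes the answer.

```python
import itertools

# Hypothetical index: name -> {version: {dependency: allowed versions}}.
INDEX = {
    "foo": {3: {"bah": {1}}, 2: {"bah": {1, 2}}, 1: {}},
    "bah": {2: {}, 1: {}},
}


def valid(selection):
    """True if every selected package's requirements are met by the selection."""
    for name, version in selection.items():
        for dependency, allowed in INDEX[name][version].items():
            if selection.get(dependency) not in allowed:
                return False
    return True


def preferred_solve(request):
    """Return the valid solve maximizing versions in request-priority order."""
    candidates = [sorted(INDEX[name], reverse=True) for name in request]
    solves = [dict(zip(request, combo)) for combo in itertools.product(*candidates)]
    solves = [solve for solve in solves if valid(solve)]
    if not solves:
        return None
    # lexicographic comparison: the first requested package dominates
    return max(solves, key=lambda solve: tuple(solve[name] for name in request))


if __name__ == "__main__":
    print(preferred_solve(["foo", "bah"]))  # foo wins: foo 3 forces bah 1
    print(preferred_solve(["bah", "foo"]))  # bah wins: bah 2 caps foo at 2
```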

Thx A

mottosso commented 4 years ago

dependency resolver

I'm curious about this too. I very much underestimated the importance and complexity of this before taking Rez for a spin. On the other hand, although technically deterministic, it still surprises you on occasion because of how deep dependencies can get. Sometimes you know that a 4th-level dependency incompatibility won't become a problem in a specific circumstance, and then you're left unrolling the spaghetti.

For less-automated, hands-on, ad-hoc setups (like perhaps tens or hundreds of ad-projects), I can imagine a less complex solver being suitable so long as it's predictable. E.g. let the user ask for both Maya 2019 and 2020 and merge the results, or let indirect dependencies be incompatible with each other and transfer some of the burden to the developer/user in exchange for a more predictable resolver and iterative workflow (e.g. "I'll fix that warning later").

buddly27 commented 4 years ago

Thanks for your input, Allan!

I'm interested in the resolver side of thing also. Does Wiz give the same guarantees that rez does? Specifically, you're guaranteed to get the latest possible version of each package in the request, in requested order priority

It does. We will document the algorithm in more detail (#33), but here is how it works:

1. A graph is created from the initial package requests with all dependencies, including all versions and variants of each package. A weight corresponding to the order of the request is assigned to each node. A weight is also assigned to each dependent node following the requirement order.
2. A graph "combination" is generated with only one variant of each package. Variants included in the first combination are from the highest possible version, or they follow the order defined in a definition.
3. A shortest path algorithm (Dijkstra) is applied to the graph combination to determine package order using cumulated node weights. Conflicting nodes are treated in ascending order of distance from the root level of the graph to ensure that we don't waste time on conflicts that belong to a branch of the graph that will be pruned (breadth-first method). New packages can be added to the graph during the conflict resolution stage if necessary. If a new package brings several variants, the current combination is abandoned and a new one is created following the same strategy defined in (2).
4. When we reach a combination which only contains one version of each node, we sort the packages and extract the corresponding context.
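As a rough aid for step 3 only, here is a minimal Dijkstra sketch over a hypothetical, already-chosen combination. The node names and weights are invented (weight 1 for the first request or requirement, 2 for the second), and sorting packages by ascending cumulated distance, with ties broken by name, is an assumption about how the order is derived. The graph deliberately contains a circular dependency (my-tool and my-lib require each other) to show that Dijkstra's algorithm does not need a DAG.

```python
import heapq

# Hypothetical combination: node -> [(dependency, weight), ...].
GRAPH = {
    "root": [("maya", 1), ("my-tool", 2)],
    "maya": [("python", 1)],
    "my-tool": [("python", 1), ("my-lib", 2)],
    "my-lib": [("my-tool", 1)],  # circular dependency back to my-tool
    "python": [],
}


def shortest_distances(graph, source="root"):
    """Dijkstra's algorithm: cumulated weight of the cheapest path to each node."""
    distances = {source: 0}
    queue = [(0, source)]
    while queue:
        distance, node = heapq.heappop(queue)
        if distance > distances[node]:
            continue  # stale queue entry, a shorter path was already found
        for child, weight in graph[node]:
            candidate = distance + weight
            if candidate < distances.get(child, float("inf")):
                distances[child] = candidate
                heapq.heappush(queue, (candidate, child))
    return distances


def package_order(graph):
    """Sort packages by distance from root (ties broken by name)."""
    distances = shortest_distances(graph)
    distances.pop("root")
    return sorted(distances, key=lambda name: (distances[name], name))


if __name__ == "__main__":
    print(shortest_distances(GRAPH))
    print(package_order(GRAPH))
```

Because each node is settled at its shortest distance once, the cycle is simply never re-expanded at a lower cost, so it cannot make the traversal loop.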

You can find some examples of steps 1 to 3 in the benchmark tests.

I tried other resolver implementations in the past (specifically using boolean satisfiability - eg https://github.com/niklasso/minisat). Whilst good at finding all possible solves (of which there can be millions), it was not very good at finding the one you want - ie, the one with the highest package versions, in some deterministic way (and rez provides this determinism as mentioned - priority based on order of package request).

Thanks for the link! There are a few package resolvers out there which use SAT solvers, so this is definitely worth exploring further:

I also started to have a look at Pip's new dependency resolver as the problem they are trying to solve is very similar to ours.

I'm curious about this too. I very much underestimated the importance and complexity of this before taking Rez for a spin.

This is a very hard problem indeed! :)

buddly27 commented 4 years ago

Regarding your other points:

rez is deterministic unless a package author deliberately does something non-deterministic in their commands function. In practice I've never seen this (beyond obvious cases such as appending some user-specific plugin path, for eg)

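This is precisely what we wanted to prevent, though. It is sometimes tempting to solve a problem with a simple command, but this can end up being a nightmare to debug (e.g. this package: https://github.com/predat/rez-packages/blob/ae72b8619b519ff7ae026397ab68e9892a92f441/softs/houdini/16.5.439/package.py).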

rez package definition is serializable, it has to be because definitions are stored into its memcached server

OK, I see what you mean. Is that the serialization logic? https://github.com/nerdvegas/rez/blob/663efc277924bdb353c85869585132f4191b703e/src/rez/serialise.py#L284

It seems much harder to do than using a data serialization format (https://en.wikipedia.org/wiki/Comparison_of_data-serialization_formats), though. What was the blocker with YAML? The commands, I suppose?

nerdvegas commented 4 years ago

Hey Jeremy,

Thanks for the info!

RE serialization:

That code link you gave is something a little different - this is for processing packages that haven't been built yet; there are constructs (such as the 'early' decorator) that don't exist in installed packages, because they only make sense pre-build. Basically, there is not a serialise format per se - the package.py contents are the format, and the API is used to get a package definition to/from Python source. Wrt YAML, it actually is still supported, but some newer features aren't. I've opted to deprecate it as there's no pressing need and it's just more code to maintain - everyone uses package.py.

RE solver:

That sounds really interesting. The biggest issue I ran into with SAT was the inability to apply weights to the solutions, hence needing to search the entire solution space to find the right solve. One question though, specifically about -

"""A graph is created from initial package requests with all dependencies, including all versions https://wiz.readthedocs.io/en/stable/definition.html#version and variants https://wiz.readthedocs.io/en/stable/definition.html#variants of each package"""

We (Method) have enough packages that this could easily pull in 1000's or 10,000s of packages, which would mean constructing a pretty hefty initial graph. Have you run into issues with how long it takes to process this phase?

Cheers Allan

On Wed, Sep 2, 2020 at 4:56 AM Jeremy Retailleau notifications@github.com wrote:


buddly27 commented 4 years ago

We (Method) have enough packages that this could easily pull in 1000's or 10,000s of packages, which would mean constructing a pretty hefty initial graph. Have you run into issues with how long it takes to process this phase?

We did run into performance issues during the definition discovery phase, which were mostly solved in v3.1.0 (https://wiz.readthedocs.io/en/stable/release/release_notes.html#release-3.1.0).

The benchmark gives satisfying results for up to 4500 definitions:

> pytest ./test/benchmark/test_definitions_discover.py
------------------------------------------------------------------------------------------------- benchmark: 5 tests ------------------------------------------------------------------------------------------------
Name (time in ms)                                    Min                 Max                Mean             StdDev              Median                IQR            Outliers      OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_discover_1500_definitions                   73.5521 (1.0)       91.4437 (1.0)       77.4378 (1.0)       6.3268 (1.0)       74.2542 (1.0)       2.9440 (3.51)          2;2  12.9136 (1.0)          11           1
test_discover_3000_definitions                  151.4398 (2.06)     188.0195 (2.06)     158.2435 (2.04)     14.5964 (2.31)     152.4630 (2.05)      0.8385 (1.0)           1;1   6.3194 (0.49)          6           1
test_discover_4500_definitions_linux_only       228.8429 (3.11)     283.2860 (3.10)     242.3035 (3.13)     23.0151 (3.64)     232.2764 (3.13)     15.8498 (18.90)         1;1   4.1271 (0.32)          5           1
test_discover_4500_definitions                  230.3487 (3.13)     253.9806 (2.78)     236.5502 (3.05)      9.8483 (1.56)     233.2503 (3.14)      7.8511 (9.36)          1;1   4.2274 (0.33)          5           1
test_discover_4500_definitions_windows_only     239.0745 (3.25)     370.6112 (4.05)     292.9962 (3.78)     48.4912 (7.66)     281.2205 (3.79)     50.1262 (59.78)         2;0   3.4130 (0.26)          5           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

With the test tweaked to load up to 10,500 definitions, discovery still takes under a second:

------------------------------------------------------------------------------------------------- benchmark: 5 tests ------------------------------------------------------------------------------------------------
Name (time in ms)                                     Min                 Max                Mean             StdDev              Median                IQR            Outliers     OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_discover_3500_definitions                   157.8479 (1.0)      177.3744 (1.0)      163.5049 (1.0)       7.1500 (1.0)      161.2196 (1.0)       4.7130 (1.0)           1;1  6.1160 (1.0)           6           1
test_discover_7000_definitions                   331.6141 (2.10)     369.6768 (2.08)     344.8822 (2.11)     17.0816 (2.39)     334.5079 (2.07)     26.8802 (5.70)          1;0  2.8995 (0.47)          5           1
test_discover_10500_definitions_windows_only     494.6193 (3.13)     543.4300 (3.06)     520.6159 (3.18)     19.7367 (2.76)     519.6463 (3.22)     32.1102 (6.81)          2;0  1.9208 (0.31)          5           1
test_discover_10500_definitions                  497.8384 (3.15)     571.8083 (3.22)     533.4982 (3.26)     26.8244 (3.75)     537.1576 (3.33)     29.3926 (6.24)          2;0  1.8744 (0.31)          5           1
test_discover_10500_definitions_linux_only       498.1399 (3.16)     540.1189 (3.05)     524.9916 (3.21)     17.2282 (2.41)     532.1038 (3.30)     24.5732 (5.21)          1;0  1.9048 (0.31)          5           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

We didn't implement any serious caching logic yet so there is still room for improvement!

nerdvegas commented 4 years ago

Hey Jeremy,

I think I'll have to wait for that algorithm description, as there are definitely concepts I don't follow yet. For example, you've said you build a graph connecting all packages/variants with their requirements, but even for a modest resolve, that would be a million+ edges. Ie, I'm assuming that if foo-1.2.3 requires bah>1.2, that might be 20 outgoing edges (if there are 20 bah versions > 1.2). Also, regarding your point (2) (A graph "combination" is generated with only one variant of each package), clearly any combination of variants from packages could be the correct resolve, so I don't yet know what happens if these latest variants conflict.

In any case, I look forward to finding out more. If the functionality here is equivalent to what rez is doing, it could make sense to port it. Have you considered separating the solver out into its own project? General dependency resolvers in python aren't much of a thing and I'm sure there would be applications for it outside of package management.

Thanks A

On Wed, Sep 2, 2020 at 12:09 PM Jeremy Retailleau notifications@github.com wrote:


buddly27 commented 4 years ago

Sorry, I misunderstood your question: loading 10,000 definitions from the registries takes around half a second, but that does not include adding all these nodes to the graph! The graph is only built from the dependency tree of the initial package requests. Actually, I'm not sure we have ever had a request that ended up building a graph with more than 500 nodes, but this would be an interesting metric to track.

I pushed a small benchmark to the dev branch to see how it behaves with 100, 1000, 5000 and 10,000 nodes: https://github.com/themill/wiz/blob/dev/test/benchmark/test_graph_construction.py

These are my results:

> pytest ./test/benchmark/test_graph_construction.py
----------------------------------------------------------------------------------------- benchmark: 4 tests ----------------------------------------------------------------------------------------
Name (time in ms)            Min                   Max                  Mean             StdDev                Median                 IQR            Outliers       OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_100_nodes            7.6129 (1.0)         21.2669 (1.0)          8.6532 (1.0)       2.6118 (1.0)          8.0060 (1.0)        0.7610 (1.0)           1;1  115.5637 (1.0)          26           1
test_1000_nodes          84.9140 (11.15)      124.0661 (5.83)        98.9154 (11.43)    18.8515 (7.22)        85.8061 (10.72)     31.5028 (41.39)         1;0   10.1096 (0.09)          5           1
test_5000_nodes         740.1121 (97.22)      818.2340 (38.47)      771.1314 (89.11)    28.9191 (11.07)      764.4799 (95.49)     28.7578 (37.79)         2;0    1.2968 (0.01)          5           1
test_10000_nodes      2,017.0588 (264.95)   2,194.4909 (103.19)   2,102.8371 (243.01)   75.7333 (29.00)    2,068.1510 (258.33)   123.4869 (162.26)        2;0    0.4755 (0.00)          5           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

I.e., I'm assuming that if foo-1.2.3 requires bah>1.2, that might be 20 outgoing edges (if there are 20 bah versions > 1.2)

In this scenario, only the latest version of "bah" will be added to the graph (along with all its variants if necessary). But if another package requests another version of bah (e.g. "bah==3.0.0"), that version will also be added to the graph, and it will be treated as a conflict that needs to be resolved. That's what I meant when I said that all versions are added to the graph; sorry for the confusion.

ROOT
 |- foo==1.2.3
 |   `- (bah>1.2) bah==3.5.0
 `- bar==2.1.7
      `- (bah==3.0.0) bah==3.0.0
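To make the behaviour described above concrete, here is a minimal sketch (not Wiz's actual resolver) of how such a graph could be expanded: each requirement pulls in only the latest version that satisfies it, and any package that ends up in the graph with more than one version is flagged as a conflict to solve later. The registry contents, the tuple-based versions and the `build_graph`/`conflicts` helpers are hypothetical, and variants are ignored.

```python
import operator
from collections import defaultdict, deque

# Hypothetical requirement operators; real version specifiers are richer.
OPS = {"==": operator.eq, ">": operator.gt, ">=": operator.ge, "<": operator.lt}

# Hypothetical registry: name -> {version tuple: [(name, op, version), ...]}
REGISTRY = {
    "foo": {(1, 2, 3): [("bah", ">", (1, 2))]},
    "bar": {(2, 1, 7): [("bah", "==", (3, 0, 0))]},
    "bah": {(1, 0, 0): [], (3, 0, 0): [], (3, 5, 0): []},
}


def latest_matching(name, op, version):
    """Return the highest registered version of `name` satisfying `op version`."""
    candidates = [v for v in REGISTRY[name] if OPS[op](v, version)]
    if not candidates:
        raise LookupError(f"no version of {name} matches {op} {version}")
    return max(candidates)


def build_graph(requests):
    """Breadth-first expansion from ROOT; returns the set of nodes and the edges."""
    root = ("ROOT", ())
    nodes, edges = set(), []
    queue = deque((root, request) for request in requests)
    while queue:
        parent, (name, op, version) = queue.popleft()
        node = (name, latest_matching(name, op, version))
        edges.append((parent, node))
        if node not in nodes:  # expand each (name, version) node only once
            nodes.add(node)
            queue.extend((node, req) for req in REGISTRY[name][node[1]])
    return nodes, edges


def conflicts(nodes):
    """Map each package present in several versions to its sorted versions."""
    versions = defaultdict(set)
    for name, version in nodes:
        versions[name].add(version)
    return {name: sorted(v) for name, v in versions.items() if len(v) > 1}


nodes, edges = build_graph([("foo", "==", (1, 2, 3)), ("bar", "==", (2, 1, 7))])
print(conflicts(nodes))  # {'bah': [(3, 0, 0), (3, 5, 0)]}
```

Because each (name, version) node is expanded only once, a circular requirement simply adds an edge back to an existing node instead of looping forever.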

Have you considered separating the solver out into its own project? General dependency resolvers in Python aren't much of a thing and I'm sure there would be applications for it outside of package management.

That would be an interesting idea. The main blocker at the moment would be time and resources, but once we're done open sourcing this framework, we can probably give it a try!

buddly27 commented 4 years ago

@mottosso We just made our Python installer public: https://github.com/themill/qip

You might find it useful to kickstart small Python projects without too much overhead. This part might particularly interest you: https://qip.readthedocs.io/en/stable/development.html

Let us know what you think about it!

mottosso commented 4 years ago

Thanks, a wrapper for pip is a good idea and something I'm familiar with. :)

https://github.com/mottosso/rez-pipz

It does the same job: it calls pip to install someplace, then creates a Rez package around it for use in an environment. Rez even has something similar built in.
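The wrapper strategy both tools share (run pip into a dedicated location, then write a definition describing that location) might look roughly like the sketch below. It assumes one install directory per package and a minimal JSON definition; the directory layout, the use of `--no-deps` (dependencies installed as separate packages), the definition keys other than `command`, and the helper names are all hypothetical rather than the actual rez-pipz or Qip format. The `PYTHONPATH` value uses a POSIX `:` separator.

```python
import json
import subprocess
import sys
from pathlib import Path


def install_command(requirement, target):
    """Build a pip command that installs `requirement` into `target` only."""
    # Assumption: dependencies are installed separately, hence --no-deps.
    return [sys.executable, "-m", "pip", "install", "--no-deps",
            "--target", str(target), requirement]


def make_definition(name, version, target, commands=None):
    """Hypothetical minimal definition exposing `target` on PYTHONPATH."""
    return {
        "identifier": name.lower(),
        "version": version,
        "command": commands or {},
        "environ": {"PYTHONPATH": f"{target}:${{PYTHONPATH}}"},
    }


def install(name, version, root):
    """Install one package with pip and write its definition next to it."""
    target = Path(root) / f"{name}-{version}"
    subprocess.run(install_command(f"{name}=={version}", target), check=True)
    definition = make_definition(name, version, target)
    path = Path(root) / f"{name.lower()}-{version}.json"
    path.write_text(json.dumps(definition, indent=4))
    return path
```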

The main hurdles I found were:

How do you deal with executables? E.g. some packages provide command-line binaries that are generally compiled during install. Pipz solved this with an executable side-car file.
Is Wiz case-sensitive? It might sound like a small issue, but since pip doesn't mind case, some packages refer to other packages with different capitalization; when you then search Wiz for qt.py and only have Qt.py installed, you come up empty. Pipz hasn't really solved that.
How do you determine dependencies from pip, without first downloading the packages? Pipz hasn't solved this either.

Other than that, this is limited to Python packages, which isn't necessarily an issue. I found that you could easily extend the concept to additional package managers, like rez-scoopz for Windows packages, and had some ideas for wrapping things like yum and apt over time as well.

So overall, I think this is the right track!

buddly27 commented 4 years ago

Yeah it's pretty much the same strategy we adopted :)

How do you deal with executables

We actually don't use the automatically generated scripts; instead, we create aliases from entry points using the python -m command. For instance, the definition for pyblish-base will have:

"command": {
    "pyblish": "python -m pyblish.cli"
}

So you can simply run it with the wiz run pyblish command.
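A sketch of how such aliases could be derived from console_scripts entry points written in the standard `name = module:function` form. Note that `python -m module` runs the module as `__main__` and ignores the function name, so this only works for modules that call their entry function themselves. The helper name is hypothetical; on Python 3.10+, installed entry points could be gathered with `importlib.metadata.entry_points(group="console_scripts")`, whose items expose `.name` and `.module`.

```python
def command_aliases(entry_points):
    """Map console_scripts specs like "pyblish = pyblish.cli:main" to aliases.

    Only the module part is kept: `python -m` executes the module as
    __main__ and never calls the named function directly.
    """
    aliases = {}
    for spec in entry_points:
        name, _, target = (part.strip() for part in spec.partition("="))
        module = target.split(":", 1)[0].strip()
        aliases[name] = f"python -m {module}"
    return aliases


print(command_aliases(["pyblish = pyblish.cli:main"]))
# {'pyblish': 'python -m pyblish.cli'}
```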

For compiled packages, it will try to compile them during install like pip does, but we have set up a Devpi index over PyPI to make sure that we get wheels instead of tar files.

Is Wiz case-sensitive?

Wiz is, but Qip isn't. For consistency, all definitions created by Qip use lowercase identifiers, so we don't really run into this issue:

>>> qip install Qt.py
info: Requested 'Qt.py'
info:   Installed 'Qt.py-1.3.0'.
info:   Wiz definition created for 'Qt.py-1.3.0'.
info: Packages installed: Qt.py-1.3.0
info: Package output directory: '/tmp/qip/packages'
info: Definition output directory: '/tmp/qip/definitions'

>>> wiz -add /tmp/qip/definitions use qt.py -- python
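The lowercase convention amounts to normalizing identifiers on both sides of the lookup, so that requests for `Qt.py` and `qt.py` hit the same definition. A minimal sketch; the registry dict and the `find_definition` helper are hypothetical, and Qip's own code may differ.

```python
def normalize(identifier):
    """Lowercase an identifier, mirroring Qip's lowercase definition names."""
    return identifier.lower()


def find_definition(registry, request):
    """Look up a definition case-insensitively in an {identifier: data} mapping."""
    index = {normalize(key): value for key, value in registry.items()}
    try:
        return index[normalize(request)]
    except KeyError:
        raise LookupError(f"no definition found for {request!r}") from None


registry = {"qt.py": {"version": "1.3.0"}}
print(find_definition(registry, "Qt.py"))  # {'version': '1.3.0'}
```

A stricter option is the PEP 503 normalization pip applies to project names (lowercase, and collapse runs of `-`, `_` and `.` into a single `-`), although that would turn `Qt.py` into `qt-py` rather than `qt.py`.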

How do you determine dependencies from pip, without first downloading the packages? Pipz hasn't solved this either.

Yeah, we haven't really solved that one either: we download the package and then extract the dependencies using pkg_resources.

It might be worth submitting an issue to pip to provide this feature at some point.
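Once a package has been downloaded as a wheel (for instance with `pip download --no-deps`), its dependencies can also be read without installing it: a wheel is a zip archive, and its `*.dist-info/METADATA` file lists them in `Requires-Dist` fields. The sketch below is a stdlib-only alternative to the pkg_resources approach mentioned above; it still needs the download, and the helper name is hypothetical.

```python
import zipfile
from email.parser import Parser


def wheel_requirements(wheel_path):
    """Return the Requires-Dist entries declared in a wheel's METADATA."""
    with zipfile.ZipFile(wheel_path) as archive:
        metadata_name = next(
            (name for name in archive.namelist()
             if name.endswith(".dist-info/METADATA")),
            None,
        )
        if metadata_name is None:
            raise ValueError(f"{wheel_path} has no .dist-info/METADATA")
        text = archive.read(metadata_name).decode("utf-8")
    # METADATA uses RFC 822 style headers, so the email parser can read it.
    metadata = Parser().parsestr(text)
    return metadata.get_all("Requires-Dist") or []
```

Each returned entry is a requirement string that may carry version specifiers and environment markers, so it still needs parsing before it can be turned into a definition requirement.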

Other than that, this is limited to Python packages, which isn't necessarily an issue. I found that you could easily extend the concept to additional package managers, like rez-scoopz for Windows packages, and had some ideas for wrapping things like yum and apt over time as well.

I didn't know about Scoop, very interesting! We are currently working on a wrapper around Conda which will provide non-Python libraries and the ability to set up our own channel.

mottosso commented 4 years ago

instead we create aliases from entry points

Ah, yes. You've got the luxury of not supporting Windows. :)

buddly27 commented 4 years ago

Ah, yes. You've got the luxury of not supporting Windows. :)

Not sure I understand: do you mean python -m foo doesn't work on Windows???

mottosso commented 4 years ago

No, it does. When I first read your message, I thought you were referring to making a chmod +x myexe file with a #! /usr/bin/python line in it, calling -m foo, to replace the binaries that are normally generated by pip on Linux. On Windows, the generated files are compiled .exe binaries, which are trickier to reproduce, but it's possible if you have a look at how rez-pipz does it.

But what you actually meant was:

wiz run myexe

Which is interesting! It would solve that issue, on every platform. It does make executables longer to type, and is that something you call from within an environment, or before you activate an environment?
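For comparison, the Linux-style wrapper first discussed here (an executable file with a #! /usr/bin/python line that runs a module the way -m does) could be generated along these lines. This is a POSIX-only sketch: `runpy.run_module(..., run_name="__main__", alter_sys=True)` stands in for `python -m`, the default interpreter path is taken from the comment above, and the helper name is hypothetical. It does not attempt to reproduce the .exe launchers pip generates on Windows.

```python
import os
import stat
from pathlib import Path


def write_wrapper(directory, name, module, interpreter="/usr/bin/python"):
    """Write an executable script that runs `module` like `python -m` would."""
    path = Path(directory) / name
    path.write_text(
        f"#!{interpreter}\n"
        "import runpy\n"
        f"runpy.run_module({module!r}, run_name='__main__', alter_sys=True)\n"
    )
    # Equivalent of `chmod +x`: add execute permission for user, group, other.
    mode = path.stat().st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return path
```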

buddly27 commented 4 years ago

Which is interesting! It would solve that issue, on every platform. It does make executables longer to type, and is that something you call from within an environment, or before you activate an environment?

We can call aliases via the wiz use arguments:

> wiz use ipython -- ipython
info: Start command: python -m IPython

And also from a spawned environment, as we create a temporary RC file to define the aliases:

> wiz use ipython 
info: Spawn shell: /bin/bash
$ alias ipython
alias ipython='python -m IPython'
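A sketch of the temporary RC file approach, assuming bash: the aliases are written to a temporary file that is passed to an interactive shell with `--rcfile`, and `shlex.quote` produces the single-quoted form shown above. The helper names are hypothetical and this is not Wiz's actual implementation.

```python
import os
import shlex
import subprocess
import tempfile


def write_alias_rc(aliases):
    """Write `alias name='command'` lines to a temporary bash RC file."""
    with tempfile.NamedTemporaryFile("w", suffix=".bashrc", delete=False) as handle:
        for name, command in sorted(aliases.items()):
            handle.write(f"alias {name}={shlex.quote(command)}\n")
    return handle.name


def spawn_shell(aliases, environ):
    """Start an interactive bash that loads the aliases (blocks until exit)."""
    rc_path = write_alias_rc(aliases)
    try:
        return subprocess.call(["bash", "--rcfile", rc_path, "-i"], env=environ)
    finally:
        os.remove(rc_path)  # the RC file is only needed while the shell runs
```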

This strategy works for most cases, but not all of them. Funnily enough, I was reminded of that when trying to demonstrate it with pyblish! Running python -m pyblish.cli will not work, as pyblish/cli.py is not executable, so we would have to execute it with python -c instead:

>>> wiz use pyblish-base -- python -c 'from pyblish.cli import main; main()' --help

More on this issue here: https://github.com/themill/qip/issues/7
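When a module does not run its entry function under `python -m`, the alias can call the function explicitly instead, as in the pyblish example above. A sketch that builds that `python -c` form from an entry-point target, with `shlex.quote` protecting the snippet for the shell; the helper name is hypothetical, and it only handles plain function names (not `module:Class.method` targets).

```python
import shlex


def python_c_command(target):
    """Turn "package.module:function" into a `python -c` command string."""
    module, _, function = target.partition(":")
    if not function:
        raise ValueError(f"expected 'module:function', got {target!r}")
    snippet = f"from {module} import {function}; {function}()"
    return f"python -c {shlex.quote(snippet)}"


print(python_c_command("pyblish.cli:main"))
# python -c 'from pyblish.cli import main; main()'
```

Extra arguments such as `--help` can still be appended after the quoted snippet, since `python -c` passes them through in `sys.argv`.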

Also Wiz doesn't work on Windows yet, but we should have this covered soon-ish (https://github.com/themill/wiz/issues/14).