pubgrub-rs / pubgrub

PubGrub version solving algorithm implemented in Rust
https://pubgrub-rs.github.io/pubgrub/pubgrub/
Mozilla Public License 2.0
337 stars 29 forks source link

Pass an iterator to `add_incompatibility_from_dependencies` #226

Closed konstin closed 3 weeks ago

konstin commented 3 weeks ago

Previously, add_incompatibility_from_dependencies was hardcoded to take a hash map. By relaxing this to impl IntoIterator<_>, we avoid converting from uv's ordered Vec<(_, _)> to a HashMap<_, _> and avoid cloning.

This intentionally does not change the public api of the DependencyProvider, only the internal interface that we are using in uv.

konstin commented 3 weeks ago

Is a significantly more complicated signature. This is a large next step toward DependencyProvider becoming an advanced user only interface. Although that may be an inevitable consequence of having production users.

We can also keep the DependencyProvider more constraint, we only call add_incompatibility_from_dependencies.

If it's relevant, i can do some benchmarking with this changed and with it rolled back in uv.

It is not obvious that this is the "right" abstraction for the situation. Would something else be more ergonomic for users? Would something else be more compatible with future improvements iterator?

I tried to abstract it into a trait and other tricks, but (my) rust is too limited to figure out something better.

konstin commented 3 weeks ago

I've reverted the biggest change to the dependency provider interface, we can also merge Dependencies<DependencyConstraints<String, SemVS>, Self::M> into a single type again when we're aligned on this PR in general.

Eh2406 commented 3 weeks ago

It is not obvious that this is the "right" abstraction for the situation. Would something else be more ergonomic for users? Would something else be more compatible with future improvements iterator?

I tried to abstract it into a trait and other tricks, but (my) rust is too limited to figure out something better.

Having ask you to make this PR and then had second thoughts while reviewing it with the second thoughts being abstract "does there exist" questions I owed you my full time and attention today. I went to a lot of harebrained ideas, most of which obviously wouldn't work without even looking at an editor. Eventually I had the idea of a DependencyAdder which was a wrapper around &mut state for the basic operations we want to stabilize. (Add one dependency, and iterator of dependencies, Mark this version is unavailable. That's all for now but we could add more later.) This didn't work because of type shenanigans. Then I thought about generalizing it so that it invoked a user provided callbacks for each stabilized method. Halfway through implement it, I realized I was hand rolling a vtable. So I tried making a trait. The result is https://github.com/pubgrub-rs/pubgrub/compare/dev...Eh2406:pubgrub:add_deps_take_2

What are people's thoughts? Should I open a PR for us to discuss that proposal?

mpizenberg commented 3 weeks ago
struct DependencyAdder<'c, DP: DependencyProvider> {
    state: &'c mut State<DP>,
    package: &'c DP::P,
    version: &'c DP::V,
    err: Option<PubGrubError<DP>>,
}

Isn’t the fact that this struct holds a mut ref to the state going to be super annoying for implementations with async concurrency? Or is this structure always only short lived? (Sorry if my question doesn’t make sense)

Waiting on Konstin feedback on this.

Otherwise, I don’t remember exactly why we did not use impl IntoIterator. I’m pretty sure we tried. Was it because of the simplicity tradeoff? or because we did it and benchmarked no perf improvement? or maybe because it was not well supported by the compiler at the time (we had a few decisions related to compiler support I recall)?

konstin commented 3 weeks ago

The &mut data passing makes DependencyProvider a lot more complex. I do like the add_incompatibility_from_dependency a lot though, i have to check if that may be all we need, though i probably won't get to it today.

Eh2406 commented 3 weeks ago

Isn’t the fact that this struct holds a mut ref to the state going to be super annoying for implementations with async concurrency? Or is this structure always only short lived?

It is short lived. So I don't expect trouble. But I don't maintain a async api, so not 100%. If it makes problems for UV, they can use Mutex<&mut State> as they have their own version of resolve already.

Otherwise, I don’t remember exactly why we did not use impl IntoIterator. ... maybe because it was not well supported by the compiler at the time (we had a few decisions related to compiler support I recall)?

impl trait in trait only recently stabilized. So that is why it did not work when we last tried it. If you like the ergonomics of that API, we should merge this PR as originally proposed.

mpizenberg commented 3 weeks ago

I think the lib is already too complex to be used without looking at docs. If we want people making useful things with pubgrub they should implement quite a lot themselve already so I don’t think this change would add too much barrier to entry.

Eh2406 commented 3 weeks ago

I'm having a lot of thoughts, but I'm not sure what they are. I'm going to start writing in the hope that I can figure it out by the time I get to the end. If this ends up being incoherent I'm sorry.

There bunch of things I want the get_dependencies API to have. But the world is not perfect and were going to have to compromise on some of them.

  1. "simplicity", this is hard to measure is something like how much esoteric features of rust do you need to understand to use the API for basic functionality. But on the other hand it's also how well does it use the functionality of the language to make normal-looking code.
  2. "compatibility", it would be nice if code that existed on previous version still worked on the new API.
  3. "minimal cloning", it should not require constructing a data structure (or cloning an existing one) to work with.
  4. "duplicate", it should be possible to have two dependency edges on the same package. ( This is required because it actually exists in Python and Rust, and the error messages are significantly better if we express this duality directly to PubGrub instead of calling intersection beforehand.)
  5. "future-proof", there are many kinds of edges we have recently added or might want to add in the future and they should not require breaking changes to the API. Examples include: "unavailable with metadata" (already merged), "dependencies with metadata", "constraints", "constraints with metadata", "week dependencies". let's not argue about whether any of these are good ideas in this thread, but the list is long enough that at some point we will be convinced to add at least some of these.
  6. "unrepresentability", it only makes limited sense to return a list of dependencies AND that a package is unavailable. Our API should not let you represent nonsensical things.

So let's start by evaluate the existing API: ) -> Result<Dependencies<DP::P, DP::VS, DP::M>, DP::Err>.

  1. "simplicity": Definitely the simplest option, but already has a lot going on. Especially if you elaborate all the type aliases.
  2. "compatibility": definitionally.
  3. "minimal cloning": No. You need to return an owned map.
  4. "duplicate": No. It is a map.
  5. "future-proof": Not really. We can add new variance to Dependencies (if we market nonexhaustive), but that's a really awkward way to add a new kind of edge. The new syntax would need to duplicate all the existing functionality, and add more functionality, and then we need to document why you would use one or the other.
  6. "unrepresentability": Yes.

Next let's look at the original proposal from this PR. ) -> Result<Dependencies<impl IntoIterator<Item = (Self::P, Self::VS)> + Clone, Self::M>, Self::Err,>.

  1. "simplicity": That is very long to read, and it uses impl Trait, and Item = (I don't even remember the name of that syntax).
  2. "compatibility": Yes. A Map is a impl IntoIterator<Item = (Self::P, Self::VS)> + Clone
  3. "minimal cloning": If you have a ordinary & then yes. (.iter().map...) If you have a Rc or RefCell then no.
  4. "duplicate": Yes.
  5. "future-proof": Not really. Basically the same situation as we had before.
  6. "unrepresentability": Yes.

What if we add a new Enum for "future-proof". Basically combining this PR with #124. Something like ) -> Result<Dependencies<impl IntoIterator<Item = Requirement<Self::P, Self::VS>> + Clone, Self::M>, Self::Err,> where Requirement is Required | Constrained.

  1. "simplicity": All the same problems as before but even longer.
  2. "compatibility": No.
  3. "minimal cloning": Same as before. If you have a ordinary & then yes. (.iter().map...) If you have a Rc or RefCell then no.
  4. "duplicate": Yes.
  5. "future-proof": Yes, we can add new kinds of edges to Requirement without breaking things. As long as what we are adding are "edges" were good.
  6. "unrepresentability": Yes.

Having two different Enum is really stretching our simplicity budget. Perhaps we can combine them? Something like ) -> Result<impl IntoIterator<Item = Requirement<Self::P, Self::VS, Self::M>> + Clone, Self::Err,> where Requirement is Required | Constrained | Unavailable.

  1. "simplicity": Not great, but slightly better than before.
  2. "compatibility": No.
  3. "minimal cloning": Same as before. If you have a ordinary & then yes. (.iter().map...) If you have a Rc or RefCell then no.
  4. "duplicate": Yes.
  5. "future-proof": Yes, we can add new kinds of edges to Requirement without breaking things. As long as what we are adding are "edges" were good.
  6. "unrepresentability": No. This is sad but I don't know how important it is in reality.

A note on the + Clone, it's only really required for checking for self dependencies. We may be able to avoid it with .inspect or some such. But I don't think it changes any of the fundamental argument.

Now let's evaluate my harebrained idea from yesterday, add_deps_take_2. dependency_adder: &mut DependencyVisitor<Self::P, Self::VS, Self::M>>) -> Result<(), Self::Err>

  1. "simplicity": On one hand this is a lot shorter and does not require anything fancy. On the other hand the "mutable argument you need to interact with in this method call and then return ()" is not a normal Rust pattern.
  2. "compatibility": Not even close.
  3. "minimal cloning": Yes. this should be zero overhead to interact with no matter what your data pattern is. (Except for possibly asink.)
  4. "duplicate": Yes.
  5. "future-proof": Yes. We can easily add arbitrary methods to DependencyVisitor without breaking anyone.
  6. "unrepresentability": No.

So I've gotten to the end, and I still don't know what I was trying to say. It seems to have turned into a brainstorming about #148. One take away is that eventually we will probably grow a SimplifiedDependencyProvider with an api like 0.2 that can be used if the full complexity of DependencyProvider is not needed. In the time I spent writing this Matthieu commented that the additional API complexity is not a critical blocker. So we should probably merge this PR with its original configuration. But we should do so recognizing that there are probably other changes that will be needed. This is not a "one-way door", it is totally okay to merge this and then make more changes.

mpizenberg commented 3 weeks ago

One take away is that eventually we will probably grow a SimplifiedDependencyProvider with an api like 0.2

I’d think the same, so as long as it’s possible to rewrite a v0.2-like api from v0.3 I don’t think your point (2) will be an issue.

What if we add a new Enum for "future-proof". Basically combining this PR with https://github.com/pubgrub-rs/pubgrub/pull/124. Something like ) -> Result<Dependencies<impl IntoIterator<Item = Requirement<Self::P, Self::VS>> + Clone, Self::M>, Self::Err,> where Requirement is Required | Constrained.

I remember we discussed a lot about this. My point of view is that we should base this type of changes on the theory (incompatibilities). I think this type of change, enabling dependencies to provide more fine-grain type of incompatibilities, instead of just the shape {a, not b} should be approached holistically, not iteratively with enums variants appended one after the other.

But also, I’m also not worried about compatibility, because just like it’s possible to make v0.2-like wrappers when v0.3 is released, it can be exactly the same for v0.4 because this is a strict superset of capabilities.

Eh2406 commented 3 weeks ago

I think this type of change, enabling dependencies to provide more fine-grain type of incompatibilities, instead of just the shape {a, not b} should be approached holistically, not iteratively with enums variants appended one after the other.

I don't quite follow. Trying to figure out what you had in mind. If we want to provide direct access to the solver, that will probably involve public access to "state" which is probably worth doing but its own headache. Even if we did that, we would still want DependencyProvider to be quite flexible. So I don't think it helps me answer the question what API DependencyProvider should offer. Taking your comment in different direction, any approach that succeeds at "future-proof" can always add a direct "raw incompatibility" version (either in a non-variant "Raw" or a .add_raw_incompatibility() depending). Which seems like even more reason to figure out the future proofing story. Even if we provided such a raw API, I would want to have some amount of API that makes it clear what is "a pattern we know works" and what is "feel free to try it and see what happens". Perhaps by having named variance or named callbacks or by having the constructors of incompatibilities for the known patterns. So those are some thoughts I'm having. What did you have in mind by "approached holistically"?

mpizenberg commented 3 weeks ago

What I mean by "holistically" is that all possibilities should be evaluated and added to the api at the same time. There is a finite number of incompatibility types that can be considered. {a}, {-a}, {a, b}, {a, -b} etc. For all the incompatibilities with 1 element and two elements it’s important to give appropriate names. And for incompatibilities with more than two, I imagine it we can customize some patterns, like "only one positive" or "only one negative", otherwise fallback to asking people to enter raw incompatibilities.

What I’d like to avoid is saying ok now we enable {a, b} and {a, -b}, let’s see in the future if we need something else. Instead, it would make sense to just verify that with the past two years of optimizations we didn’t break a fundamental property that would prevent pubgrub from working with any incompatibility provided. And if ok, add these apis in a cohesive fashion.

Let me know if I’ve expressed my opinion better this time.

Eh2406 commented 3 weeks ago

That makes a lot of sense. Let's get a full list of reasonable possibilities and name/document what they might be useful for before adding flexible APIs.

In office hours @konstin convinced me that avoiding the hash map construction is only an important performance optimization when the index data has been prefetched. If the data actually needs to be loaded from somewhere (whether that's network or disk) reading the data will be so much more expensive then constructing the map to make the optimization irrelevant. Ironically, the only place the optimization is visible is while doing benchmarking.

UV also rools its own resolve loop, and does not use dependency provider. so we agreed that the best next step was to change the API for State and leave the API the same for dependency provider. we will discuss changes to dependency provider in follow-up PR's and after incorporating your holistic approach.

And while I was writing that comment it looks like the new revision is in.

konstin commented 3 weeks ago

To summarize, in the initial version of this PR we were changing DependencyProvider::get_dependencies from returning a hashmap to return an impl IntoIterator<_> to avoid the conversion cost from our dependencies Vec to the pubgrub's hashmap. But in uv, we're not actually using DependencyProvider, we're driving our own solver loop and call add_incompatibility_from_dependencies directly. In the merged version, we only change this internal api to take an iterator instead of a hashmap.

Personally, i don't think the current signature of DependencyProvider::get_dependencies is fine. I'd prioritize releasing 0.3 and the new community engagement that comes with it. While the hashmap clone shows up in benchmarks, in practice you have to be far gone into microoptimizations for this to matter, i'm not even sure if it's noticeable in uv (and if you're that complex and that far, you want to run the solver loop yourself anyway, which is an entirely different interface). It would help a lot to get the changes we already made (associated type, better error handling, both the error type and the custom metadata, the massive speedups, etc.) out so we can get users reporting on how well the apis works for them.