huggingface / candle

Minimalist ML framework for Rust

Math is difficult to implement #691

Open timokoesters opened 10 months ago

timokoesters commented 10 months ago

In Python it's possible to write just this: `3 * x**2 - 4*x + 5`

With Candle it currently looks like this, which is difficult to read and write: `Ok((&(&(&x.powf(2.0)? * 3.0)? - &x * 4.0)? + 5.0)?)`
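
For reference, here is that expression as a compiling sketch (assuming candle's std-ops overloads for `f64` scalars and a CPU device):

```rust
use candle_core::{Device, Result, Tensor};

// Evaluate 3x^2 - 4x + 5 with the current API; every op returns a Result.
fn poly(x: &Tensor) -> Result<Tensor> {
    let x2 = x.sqr()?;   // x^2
    let a = (x2 * 3.0)?; // 3x^2 (the scalar shortcuts take f64)
    let b = (x * 4.0)?;  // 4x
    (a - b)? + 5.0       // 3x^2 - 4x + 5
}

fn main() -> Result<()> {
    let x = Tensor::new(&[0f32, 1., 2.], &Device::Cpu)?;
    println!("{}", poly(&x)?); // prints something like [5., 4., 9.]
    Ok(())
}
```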

  1. Why does everything return a `Result`? I assume the reason is that the GPU connection might fail, but in that case maybe it would be okay to let the whole application panic?
  2. Why is `Tensor` not `Copy`? It's an `Arc`, so it would be okay in theory.
  3. A few more traits could be implemented to make it look like this: `3.0 * x * x - 4.0 * x + 5.0`

I wrote an example here: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=e5abf73c9e38fb6cef2c17b5a1898a29

LaurentMazare commented 10 months ago

Yes, the syntax is cumbersome; we have some ideas that should slightly improve it, though it will not be as simple as the version you suggest. To answer your points:

  1. All the ops may indeed fail; the most common thing is not really the GPU going down but more the GPU running out of memory, using an invalid shape, etc.
  2. I don't think `Tensor` could be `Copy`: `Copy` implies that you can just copy the immediate memory representation (see here), which isn't the case for `Arc`. Hence `Tensor` is only `Clone`, and the clone op is pretty inexpensive.

timokoesters commented 10 months ago

One idea to work around the `Result`s is to `impl Add<Result> for Tensor` to allow chaining the results automatically, but I don't know if that is a great idea.

And I think instead of deriving `Copy`, you could impl the operations both on the values and on the references (or `AsRef`?).
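
A toy sketch of that pattern (stand-in type, not candle's actual code): implement the op for both the value and the reference, with a `Result` accepted on the right-hand side so chains need no intermediate `?`:

```rust
use std::ops::Add;

#[derive(Debug, Clone)]
struct T(f64); // toy stand-in for Tensor
type Res = Result<T, String>;

// Op on the owned value, chaining a Result on the right.
impl Add<Res> for T {
    type Output = Res;
    fn add(self, rhs: Res) -> Res {
        Ok(T(self.0 + rhs?.0))
    }
}

// Same op on the reference, so `&t + res` works too.
impl Add<Res> for &T {
    type Output = Res;
    fn add(self, rhs: Res) -> Res {
        Ok(T(self.0 + rhs?.0))
    }
}
```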

LaurentMazare commented 10 months ago

Both of these are actually already done :)

timokoesters commented 10 months ago

Oh that's great, maybe I was using an old version, or it only worked one way.

timokoesters commented 10 months ago

It looks like `Tensor + Result` is implemented, but not `Result + Tensor` or `Result + Result`. It also doesn't work for methods like `broadcast_div`.

I have another related problem: I'm unable to write a rusty variant of modifying a single index, like `var[2, 3] += 1`. Is there a method that can help me with that?

LaurentMazare commented 10 months ago

Re `Result + Result`: I think this is because of Rust's foreign-trait (orphan rule) limitation, so there is not much we can do here. Re assigning single values in place: we don't really have an op to do this, and that's on purpose, as such ops tend to be really inefficient, so we prefer favoring more functional approaches and avoiding accessing data by index. I don't think this has been an issue for any of the models we've been looking at so far, but if there are specific models for which this would be a limitation, I'm certainly curious to hear about them.
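
A toy illustration of that limitation (stand-in types, not candle's code): an impl with a local type in its header is fine, while the `Result + Result` impl is rejected because `Add` and `Result` are both defined in std:

```rust
use std::ops::Add;

struct Tensor; // local stand-in type
type Res = Result<Tensor, String>;

// Fine: `Tensor` is local to this crate.
impl Add<Res> for Tensor {
    type Output = Res;
    fn add(self, rhs: Res) -> Res {
        rhs.map(|_| Tensor)
    }
}

// Rejected with E0117 if uncommented: neither `Add` nor `Result` is
// local, so only std itself could write this impl.
// impl Add<Res> for Res {
//     type Output = Res;
//     fn add(self, rhs: Res) -> Res { rhs }
// }
```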

timokoesters commented 10 months ago

Sorry, you are correct on `Result + Result`, but the other combinations do work, as you can see in my example. One idea to allow even `Result + Result` is to have all operations return `Tensor`, but hide the `Result` internally and only expose it with some getter methods.
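
A sketch of that idea with a scalar stand-in (a hypothetical design, not candle's API): the error rides along inside the value and a getter surfaces it at the end:

```rust
#[derive(Clone)]
struct LazyTensor(Result<f64, String>); // f64 stands in for tensor storage

impl std::ops::Add for LazyTensor {
    type Output = LazyTensor;
    fn add(self, rhs: LazyTensor) -> LazyTensor {
        LazyTensor(match (self.0, rhs.0) {
            (Ok(a), Ok(b)) => Ok(a + b),
            (Err(e), _) | (_, Err(e)) => Err(e), // the first error propagates
        })
    }
}

impl LazyTensor {
    // The only place the user ever sees a Result.
    fn value(self) -> Result<f64, String> {
        self.0
    }
}
```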

I am redoing "Neural Networks: Zero to Hero" with Candle, and one of the first steps is building a character-bigram count matrix (https://youtu.be/PaCmpygFfXo?feature=shared&t=574), which adds 1 to specific entries in the matrix. Another use case would be implementing one-hot encoding.
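
For that specific case, a host-side workaround sketch (assuming the counts can be accumulated on the CPU and uploaded once):

```rust
use candle_core::{Device, Result, Tensor};

// Build the 27x27 bigram count matrix in a plain Vec, then turn it into
// a Tensor, instead of doing `counts[i, j] += 1` on the tensor itself.
fn bigram_counts(pairs: &[(usize, usize)]) -> Result<Tensor> {
    let mut counts = vec![0f32; 27 * 27];
    for &(i, j) in pairs {
        counts[i * 27 + j] += 1.0;
    }
    Tensor::from_vec(counts, (27, 27), &Device::Cpu)
}
```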

Anyway, thanks for your quick answers!

timokoesters commented 10 months ago

> the most common thing is not really the GPU going down but more the GPU running out of memory, using an invalid shape

These sound like unrecoverable errors, so it could be okay to panic in these cases?

chronicl commented 10 months ago

Making tensors `Copy` could be achieved by having some global state hold all the information of the tensors and making the tensors themselves IDs for retrieving that information. This is commonly done in Rust whenever you want a type to be `Copy` for ergonomic reasons, for example in leptos's reactive system. Of course, this comes with all the caveats of dealing with global state.
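
Roughly this pattern (a toy sketch, nothing like candle's internals): the handle is `Copy` because it is just an index into the global storage:

```rust
use std::cell::RefCell;

thread_local! {
    // "Global" (here thread-local) storage owning the actual data.
    static STORE: RefCell<Vec<Vec<f32>>> = RefCell::new(Vec::new());
}

// Copy is fine: the handle is just an index, not the data itself.
#[derive(Clone, Copy)]
struct TensorId(usize);

impl TensorId {
    fn new(data: Vec<f32>) -> Self {
        STORE.with(|s| {
            let mut store = s.borrow_mut();
            store.push(data);
            TensorId(store.len() - 1)
        })
    }

    fn with_data<R>(self, f: impl FnOnce(&[f32]) -> R) -> R {
        STORE.with(|s| f(&s.borrow()[self.0]))
    }
}
```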

I do like this idea:

> hide the Result internally and only expose it with some getter methods.

This would allow for more ergonomic math and let users choose when they want to return early due to an error (if at all). I believe Rust's `#[track_caller]` can be used to track the location where the error actually happened, so we wouldn't lose that information.
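
A sketch of how that could look (toy error type; `#[track_caller]` and `Location::caller` are stable std features):

```rust
use std::panic::Location;

#[derive(Debug)]
struct OpError {
    msg: &'static str,
    at: &'static Location<'static>, // file:line of the user's call site
}

#[track_caller]
fn checked_sqrt(x: f64) -> Result<f64, OpError> {
    if x < 0.0 {
        // Thanks to #[track_caller], caller() reports where checked_sqrt
        // was invoked, not this line.
        return Err(OpError { msg: "negative input", at: Location::caller() });
    }
    Ok(x.sqrt())
}
```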

LaurentMazare commented 10 months ago

> the most common thing is not really the GPU going down but more the GPU running out of memory, using an invalid shape

> These sound like unrecoverable errors, so it could be okay to panic in these cases?

I don't really agree with this. Imagine you have a server that runs inference and somehow your GPU runs out of memory: I feel that you would want some proper error handling at this point, so that you can stop the current computation but not the whole server (and catching the panic to do this would sound sketchy).
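
A sketch of that serving loop (`run_inference` is a hypothetical stand-in for a real forward pass):

```rust
use candle_core::{Result, Tensor};

// Any op inside may fail (OOM, bad shape, ...) and becomes a plain Result.
fn run_inference(input: &Tensor) -> Result<Tensor> {
    input.sqr()
}

// One failed request is reported and dropped; the server keeps going.
// A panic here would instead take down every in-flight request.
fn serve(requests: Vec<Tensor>) {
    for input in requests {
        match run_inference(&input) {
            Ok(out) => println!("ok: {out}"),
            Err(e) => eprintln!("request failed, continuing: {e}"),
        }
    }
}
```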

Narsil commented 10 months ago

100% on this. Having implemented a few servers, I can say panicking is not great!

cchance27 commented 7 months ago

Trying my hand at math with Candle for the first time, and I just ran into something... is `Add` really only available for `f64`? Not `u32`, etc.?

LaurentMazare commented 7 months ago

> Trying my hand at math with Candle for the first time, and I just ran into something... is `Add` really only available for `f64`? Not `u32`, etc.?

Right, it's just a convenient shortcut. If you want to add a `u32`, you can do something like `tensor.broadcast_add(&Tensor::new(42u32, tensor.device())?)?`.
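
For example, a runnable sketch of that shortcut (CPU device for simplicity):

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let t = Tensor::new(&[1u32, 2, 3], &Device::Cpu)?;
    // The std-ops shortcut only accepts f64 scalars, so for u32 wrap the
    // scalar in a 0-dim tensor of the same dtype and broadcast it.
    let t = t.broadcast_add(&Tensor::new(42u32, t.device())?)?;
    println!("{t}"); // prints something like [43, 44, 45]
    Ok(())
}
```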

cchance27 commented 7 months ago

Not gonna lie, I'm sorta an idiot: I figured I could use `broadcast_add`, but then was like "wtf, it takes a tensor, not a `u32`..." I'm so confused... completely forgetting I could just create a single-value tensor lol

asukaminato0721 commented 1 month ago

Oh, I currently use `tensor.affine(1., 42.)?` lol. (`affine(mul, add)` computes `mul * x + add`, so this adds 42 elementwise.)