raskr / rust-autograd

Tensors and differentiable operations (like TensorFlow) in Rust
MIT License

Newbie-friendly documentation would be a huge benefit #60

Open elidupree opened 1 year ago

elidupree commented 1 year ago

I'm currently working on some ML projects where autograd looks like the ideal tool for the job. Unfortunately, I had a bit of trouble figuring out the basics of autograd, because of the somewhat terse documentation.

At the moment, the main content of the docs begins with

Here we are just computing partial derivatives of z = 2x^2 + 3y + 1.

followed by a block of code which does exactly that. This is helpful as an example, but what's missing is a conceptual overview: What's a placeholder? What's a tensor? What does ag::run mean? What's feeding?
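For context, here is roughly what that opening example looks like, annotated with my current understanding (paraphrased from the 2.0 docs, so details may differ between versions):

```rust
use autograd as ag;
use ag::tensor_ops as T;

ag::run(|ctx: &mut ag::Context<_>| {
    // Placeholders: symbolic inputs whose values are supplied ("fed") later.
    let x = ctx.placeholder("x", &[]);
    let y = ctx.placeholder("y", &[]);

    // A symbolic tensor built from them: z = 2x^2 + 3y + 1.
    let z = 2. * x * x + 3. * y + 1.;

    // dz/dy is the constant 3, so it can be evaluated without feeding anything.
    let gy = &T::grad(&[z], &[y])[0];
    println!("{:?}", gy.eval(ctx)); // => Ok(3.)

    // dz/dx = 4x depends on x, so x has to be fed before evaluation.
    let gx = &T::grad(&[z], &[x])[0];
    let x_value = ag::ndarray::arr0(2.);
    println!("{:?}", ctx.evaluator().push(&gx).feed(x, x_value.view()).run()[0]); // => Ok(8.)
});
```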

I was able to figure these things out by digging through the rest of the docs, but it wasn't easy, and it would be much more difficult for someone who was less familiar with ML concepts than me (I am perhaps in the middle of the spectrum). And considering how much work has gone into the implementation, making it more accessible would be very valuable!

Many of the individual functions could also benefit from more-detailed documentation. As an example, the documentation for tensor_ops::grad() currently says:

Symbolic gradient tensors of xs in the same order as xs’s

  • ys - Targets of differentiation that are arbitrary shapes.
  • xs - Tensors with which differentiate ys.

See the more useful helper: crate::optimizers::grad_helper()

The function returns a vector of Tensors, but it could be much clearer about what the returned values are, and what sizes they are. (One for each of xs, I assume? But what shape are they if ys contains more than one tensor?)

(This is only an example; almost every function has a similar level of detail at present.)

raskr commented 1 year ago

@elidupree I'm sorry, I made the implicit assumption that people interested in this lib are already familiar with libraries like TensorFlow or Theano, so it isn't newcomer-friendly. (The basic concepts and behavior are the same as in those libs, though.)

but it could be much clearer about what the returned values are, and what sizes they are.

The returned tensors have the same shapes as the corresponding xs. That follows from the mathematical definition of the gradient.

But what shape are they if ys contains more than one tensor?

So the gradients' shapes are not related to the shapes of ys.
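For example, something like this (a rough sketch against the 2.0-rc API; exact op names and signatures may differ slightly):

```rust
use autograd as ag;
use ag::tensor_ops as T;

ag::run(|ctx: &mut ag::Context<_>| {
    // Two differentiation targets with different shapes.
    let x = ctx.placeholder("x", &[2, 3]);
    let w = ctx.placeholder("w", &[3]);

    // A scalar objective built from both.
    let z = T::reduce_sum(x * x, &[0, 1], false) + T::reduce_sum(w, &[0], false);

    // grad returns one tensor per x, each shaped like the x it corresponds to:
    // grads[0] has shape [2, 3] (like x), grads[1] has shape [3] (like w).
    let grads = T::grad(&[z], &[x, w]);

    let x_val = ag::ndarray::Array::from_shape_vec((2, 3), vec![1., 2., 3., 4., 5., 6.]).unwrap();
    let w_val = ag::ndarray::arr1(&[1., 2., 3.]);
    let results = ctx
        .evaluator()
        .push(&grads[0])
        .push(&grads[1])
        .feed(x, x_val.view())
        .feed(w, w_val.view())
        .run();
    println!("{:?}", results[0]); // dz/dx = 2x, shape [2, 3]
    println!("{:?}", results[1]); // dz/dw = all ones, shape [3]
});
```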

elidupree commented 1 year ago

Thanks for the reply! I recognize that writing comprehensive documentation is hard work. But I figured I should leave an issue, because ultimately, documentation is a critical part of any library.

The returned tensors have the same shapes as the corresponding xs. That follows from the mathematical definition of the gradient.

Sorry – and I don't want to get too bogged down in this specific example – but there's one thing I'm still having trouble understanding. The gradient of a single-output function is shaped exactly like the inputs, of course. But if you pass multiple ys (or even a single y with more than 0 dimensions), doesn't that mean you are differentiating multiple output values? (Mathematically, that should increase the dimensionality of the result.) Or if it's not that, what does it mean instead? (Is it just taking the partials relative to the sum of the ys, or something?)

raskr commented 1 year ago

I recognize that writing comprehensive documentation is hard work. But I figured I should leave an issue, because ultimately, documentation is a critical part of any library.

Agreed. I'll think about it again.

Is it just taking the partials relative to the sum of the ys

As for tensor_ops::grad(), yes. Sorry for the poor documentation. If you want to assign arbitrary gradients to each y, tensor_ops::grad_with_default() is available. cf. https://docs.rs/autograd/2.0.0-rc3/src/autograd/tensor_ops/mod.rs.html#130
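Roughly like this (a sketch against the 2.0-rc API; I'm assuming grad_with_default() takes (ys, xs, ys_grads) as in the linked source, so please double-check the signatures there):

```rust
use autograd as ag;
use ag::tensor_ops as T;

ag::run(|ctx: &mut ag::Context<_>| {
    let x = ctx.placeholder("x", &[3]);

    // A non-scalar y: one output element per input element.
    let y = x * x;

    // T::grad backpropagates from (effectively) the sum of ys, so this is 2x.
    let g_sum = &T::grad(&[y], &[x])[0];

    // grad_with_default lets you supply the initial gradient of each y yourself
    // (comparable to `grad_ys` in TensorFlow's tf.gradients). Feeding a seed of
    // all 0.5 below gives 0.5 * 2x = x.
    let seed = ctx.placeholder("seed", &[3]);
    let g_seeded = &T::grad_with_default(&[y], &[x], &[seed])[0];

    let x_val = ag::ndarray::arr1(&[1., 2., 3.]);
    let seed_val = ag::ndarray::arr1(&[0.5, 0.5, 0.5]);
    let results = ctx
        .evaluator()
        .push(g_sum)
        .push(g_seeded)
        .feed(x, x_val.view())
        .feed(seed, seed_val.view())
        .run();
    println!("{:?}", results[0]); // => 2x = [2, 4, 6]
    println!("{:?}", results[1]); // => 0.5 * 2x = [1, 2, 3]
});
```

In other words, grad() behaves like differentiating the sum of the ys, while grad_with_default() lets you choose the per-y weights (i.e. a vector-Jacobian product).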