Open ifsheldon opened 3 years ago
@ifsheldon
As always, appreciate the quality of your input. Some of these are great points. FYI, we are heavily refactoring the docs in https://github.com/taichi-dev/docs.taichi.graphics.
To give some quick answers:
So, to declare and structure multiple fields, we can do that in 3 ways (purely the basic, purely `place`, mixing the basic and `place`).
Yes. And by default you just need to use the basic version. The advanced `ti.root` stuff is mostly for defining the sparse hierarchical SNodes.
So, we also have 3 ways to specify that we want gradients on multiple fields.
I think there are only two? Also, the docs mention that if you use `ti.root.lazy_grad()`, you don't have to repeatedly define `needs_grad` on each field. (That said, it's hard to say whether this is a good API. IMHO, explicitly calling out which fields you need gradients for could make the code more readable.)
Clearly you are putting a nontrivial amount of effort into using Taichi, which is invaluable to us & the community :-) Just so you know, we have a Slack channel to discuss all sorts of things related to Taichi. Would you prefer sending us an email (listed in https://taichi.graphics/contact/) so that we can invite you? That way it will be much faster and more effective for your development & for sharing your feedback.
Enjoy your weekend!
I think there are only two?
Well, you can do it with only `needs_grad`, only `ti.root.lazy_grad()`, or a mix of both. When my collaborator wrote the code, he actually tried writing:
```python
volume = ti.field(ti.f32, needs_grad=True)
ti.root.dense(ti.ijk, volume_resolution).dense(ti.ijk, (4, 4, 4)).place(volume)
ti.root.dense(ti.ijk, volume_resolution).dense(ti.ijk, (4, 4, 4)).place(volume.grad)
ti.root.lazy_grad()
```
so he seemed to have made many duplicate configurations here. And I couldn't tell, without testing, whether this would break autodiff, because such an edge case is not mentioned anywhere. I still don't know what happens under the hood in this case, but our tests empirically tell us that it doesn't matter.
The advanced `ti.root` stuff is mostly for defining the sparse hierarchical SNodes.
Well, since one of the target fields of Taichi is differentiable rendering, we will need dense hierarchical SNodes (for better cache hit rates) and autodiff together.
What I tried to say is: please consider all possible combinations of your APIs, check their inter-compatibility, and tell us users about the (in)compatibilities in your documentation.
Another idea: I think it's possible to implement a real `lazy_grad()` after the release of #2501? @k-ye
Proposed feature
More details, clarifications, examples and guidelines on Differentiable Programming.
Context
In the process of developing a differentiable Direct Volume Rendering (DVR) renderer, I realized that the information from the documentation, the paper, and the DiffTaichi repo is still far from enough. So, based on my pitfalls, I'd like to point out several things that should be detailed in the documentation to help Taichi users efficiently develop their differentiable applications.
Feature details
Guidelines for developing differentiable applications
As Taichi's autodiff enforces many constraints that lead to code, data-structure, and algorithmic changes, there should be some suggestions on how to develop a differentiable application from the very first step. Some users, like me, tend to focus only on the forward pass of an application, naively transferring their development workflow from PyTorch or TensorFlow.
But such a development style does not work when developing differentiable applications with Taichi. Therefore, you should explicitly point out this problem by giving some suggestions. I can see that in some issues you suggest users develop their applications step by step, only adding new features once autodiff works correctly in the backward pass. I think you should solidify such suggestions in your documentation.
Clarifications on APIs inter-compatibility
To get optimized performance, we users are advised to customize the data layout via the `.place()` APIs of `ti.root`, but you didn't mention how to properly mix the `.place()` APIs with autodiff. To make things even more complex, let's first see how we can declare and structure multiple fields, and then how we can specify that we need gradients.
To declare and structure a field, we can:

1. use the basic API, like `vec_field = ti.Vector.field(3, ti.f32, shape=3)`
2. use `ti.root`'s `.place()` API

So, to declare and structure multiple fields, we can do that in 3 ways (purely the basic, purely `place`, mixing the basic and `place`). Now, to specify we want gradients, we can:

1. use `needs_grad=True`
2. use `ti.root.lazy_grad()`
So, we also have 3 ways to specify that we want gradients on multiple fields. Therefore, you can see we now have 3*3 ways to structure fields and set up gradients. Here come the questions:
1. If a field is declared with `needs_grad=True`, do I need to `place` its `grad` as well?
2. If I rely on `ti.root.lazy_grad()`, do I need to `place` its `grad` as well?

Examples of tracing gradients of recursive formulations
We sometimes have recursive formulations in our programs, for example, the Exponentially Weighted Moving Average (EWMA). How to work around the Data Access Rule of autodiff is tricky. I call my workaround Explicit Taping. For the discussion, please see this issue (#2425).
The examples in that issue are parallelized differentiable EWMA calculators, and I think the two examples (the wrong one and the correct one) serve illustration purposes well.
Also, I have encountered an issue in which gradients were not traced: I used a field whose values are both read and written, and those values were used to control the kernel's code flow. This is also caused by a violation of the Data Access Rule: although the values that need gradients are not overwritten, the values used for control flow are overwritten, which also leads to gradient-tracing failures.
A simple example is
Comments
It's understandable that Taichi's documentation is not yet detailed enough, but I think you can do better at explaining and giving more guidance in the Differentiable Programming part.
I can definitely make a PR to improve the documentation of Differentiable Programming, but I think you may have more to add, and my comments above are a good draft for you to complete the documentation improvement.