Open DoeringChristian opened 1 year ago
Hi @DoeringChristian ,
Variable indices are still stored using uint32_t
. Moreover, the indices passed to ad_add_edge_
are indices of variables in the AD graph.
Could you please provide a minimal reproducer script so we can take a look at this issue on our side?
Hi @Speierers, unfortunately i cannot reproduce the error with a simpler setup at the moment. I'll try some things to be sure that this is not an issue in my code. This could take a while though since the error only appears after 910 steps and each of them takes about 3min.
@DoeringChristian did you find an alternate reason for this ? I've noticed the same issue that shows up only after a large number of iterations.
No unfortunately I haven't het gotten around to testing this yet.
I believe this will be fixed in the upcoming version of DrJit, where the numbering scheme has been changed.
Thanks! Will watch out for that drjit update.
I notice the above numbering issue when the iterations are high even when I have a sequence of fairly small optimization iterations. What is the "correct" way to run something like this, is there a way to "reset" any graph numbering since each internal loop is independent. This would fail with the error mentioned above at some value of loop count but enough of them run okay. For e.g.:
for i in range(loop_count): #loop count ~ 200
#setup parameters
#differentiating over 3 textures ~150^2 pixels
opt = mi.ad.Adam(..)
for it in range(opt_iteration_count): #opt_iteration_count ~30
#render, backward, step
#update parameters
#this probably doesn't help, but delete and garbage collect
del opt
gc.collect()
@merlinND any suggestions on the above?
Hello @vid8687,
Until the release of the next version, I believe that the simplest way would be to start iterations of your outer for
loop in different processes.
I am not aware of a way to reset the variable counter in the current version (@njroussel, please let us know if you know a way).
Hi, I think I have found a bug related to the recent change allowing more than 4 billion variables. When optimizing a large scene I get the error after 910 optimization steps.
This seems pretty ironic since 2148013987 and 2148014840 are both integers in the python sense. In the ad_add_edge interface the arguments are however defined as int32_t C++ types here or here. I'm not quite sure which definition is the correct one for this issue but I imagine that since the 4 billion variables update not all bindings where updated to use int64_t types. Currently I don't have the time for it but otherwise I would try to fix it myself since it seems to easy at first glance. Thanks for your Help.