mitsuba-renderer / drjit

Dr.Jit — A Just-In-Time-Compiler for Differentiable Rendering
BSD 3-Clause "New" or "Revised" License
593 stars 43 forks source link

[:bug: bug report] Incompatible function arguments (probably related to int32 variable indexing) #108

Open DoeringChristian opened 1 year ago

DoeringChristian commented 1 year ago

Hi, I think I have found a bug related to the recent change allowing more than 4 billion variables. When optimizing a large scene I get the error after 910 optimization steps.

File "/home/doeringc/workspace/python/zed/ddtar/zed/test01/out/final/../../difr.py", line 95, in step
    img = mi.render(
  File "/home/doeringc/workspace/cpp/mitsuba3-test/build/python/mitsuba/python/util.py", line 518, in render
    return dr.custom(_RenderOp, scene, sensor, params, integrator,
  File "/home/doeringc/workspace/cpp/mitsuba3-test/build/python/drjit/router.py", line 5613, in custom
    Type.ad_add_edge_(index, tmp_in.index_ad)
TypeError: ad_add_edge_(): incompatible function arguments. The following argument types are supported:
    1. (src_index: int, dst_index: int, cb: handle = None) -> None

Invoked with: 2148013987, 2148014840

This seems pretty ironic since 2148013987 and 2148014840 are both integers in the python sense. In the ad_add_edge interface the arguments are however defined as int32_t C++ types here or here. I'm not quite sure which definition is the correct one for this issue but I imagine that since the 4 billion variables update not all bindings where updated to use int64_t types. Currently I don't have the time for it but otherwise I would try to fix it myself since it seems to easy at first glance. Thanks for your Help.

Speierers commented 1 year ago

Hi @DoeringChristian ,

Variable indices are still stored using uint32_t. Moreover, the indices passed to ad_add_edge_ are indices of variables in the AD graph.

Could you please provide a minimal reproducer script so we can take a look at this issue on our side?

DoeringChristian commented 1 year ago

Hi @Speierers, unfortunately i cannot reproduce the error with a simpler setup at the moment. I'll try some things to be sure that this is not an issue in my code. This could take a while though since the error only appears after 910 steps and each of them takes about 3min.

vid8687 commented 5 months ago

@DoeringChristian did you find an alternate reason for this ? I've noticed the same issue that shows up only after a large number of iterations.

DoeringChristian commented 5 months ago

No unfortunately I haven't het gotten around to testing this yet.

merlinND commented 5 months ago

I believe this will be fixed in the upcoming version of DrJit, where the numbering scheme has been changed.

vid8687 commented 5 months ago

Thanks! Will watch out for that drjit update.

vid8687 commented 5 months ago

I notice the above numbering issue when the iterations are high even when I have a sequence of fairly small optimization iterations. What is the "correct" way to run something like this, is there a way to "reset" any graph numbering since each internal loop is independent. This would fail with the error mentioned above at some value of loop count but enough of them run okay. For e.g.:

for i in range(loop_count): #loop count ~ 200
    #setup parameters
    #differentiating over 3 textures ~150^2 pixels
    opt = mi.ad.Adam(..)
    for it in range(opt_iteration_count): #opt_iteration_count ~30
        #render, backward, step
        #update parameters
    #this probably doesn't help, but delete and garbage collect
    del opt 
    gc.collect() 
vid8687 commented 5 months ago

@merlinND any suggestions on the above?

merlinND commented 5 months ago

Hello @vid8687,

Until the release of the next version, I believe that the simplest way would be to start iterations of your outer for loop in different processes. I am not aware of a way to reset the variable counter in the current version (@njroussel, please let us know if you know a way).