mitsuba-renderer / mitsuba3

Mitsuba 3: A Retargetable Forward and Inverse Renderer
https://www.mitsuba-renderer.org/
Other
2.08k stars 243 forks source link

Failed to parse input PTX string When EmitterPtr called my custom function in PathIntegrator::sample #633

Open GensokyoLover opened 1 year ago

GensokyoLover commented 1 year ago

Summarize

I want to add a function in Emitter to be called when rendering to calculate the contribution of each Emitter. When called by Emitter* in SamplingIntegrator::render_sample function, it succeeded. But it failed in path::sample function when called by EmittPtr. I am wondering why it failed. This has bothered me for a long time. Thank you so much for your time.

System information:

OS: win10 CPU: i5-9400F GPU: nvidia 1660S Python: 3.9 NVidia driver: 528.33 CUDA: 12.0

Dr.Jit: 0.4.1 Mitsuba: 3.2.1 Is custom build? False Compiled with: vs2022 x64 Variants: scalar_rgb cuda

reproduce

https://github.com/GensokyoLover/mitsuba_issue

command

simple.xml -m cuda_rgb

function and varible added in Class Emitter

void put(const Point2f &pos, const Float *values,
        Mask active = true) const {

MI_MASKED_FUNCTION(ProfilerPhase::EmitPut, active)
m_block->put(pos, values, active);
}
mutable ref<ImageBlock<Float,Spectrum>> m_block;
DRJIT_VCALL_METHOD(put)

where the function is used

     while (loop(active)) {
            /* dr::Loop implicitly masks all code in the loop using the 'active'
               flag, so there is no need to pass it to every function */

            SurfaceInteraction3f si =
                scene->ray_intersect(ray,
                                     /* ray_flags = */ +RayFlags::All,
                                     /* coherent = */ dr::eq(depth, 0u));

            // ---------------------- Direct emission ----------------------

            /* dr::any_or() checks for active entries in the provided boolean
               array. JIT/Megakernel modes can't do this test efficiently as
               each Monte Carlo sample runs independently. In this case,
               dr::any_or<..>() returns the template argument (true) which means
               that the 'if' statement is always conservatively taken. */
            if (dr::any_or<true>(dr::neq(si.emitter(scene), nullptr))) {
                DirectionSample3f ds(scene, si, prev_si);
                Float em_pdf = 0.f;
                Mask emit_active = dr::neq(si.emitter(scene), nullptr);
                if (dr::any_or<true>(!prev_bsdf_delta))
                    em_pdf = scene->pdf_emitter_direction(prev_si, ds,
                                                          !prev_bsdf_delta);

                // Compute MIS weight for emitter sample from previous bounce
                Float mis_bsdf = mis_weight(prev_bsdf_pdf, em_pdf);

                // Accumulate, being careful with polarization (see spec_fma)
                Spectrum this_result = result;
                result = spec_fma(
                    throughput,
                    ds.emitter->eval(si, prev_bsdf_pdf > 0.f) * mis_bsdf,
                    result);
                this_result          = result - this_result;
                EmitterPtr emitptr  = si.emitter(scene);

                Color3f rgb;
                rgb = this_result;
                std::cout << typeid(emitptr).name() << std::endl;
                aov[0] = rgb.x();
                aov[1] = rgb.y();
                aov[2] = rgb.z();
                aov[3] = rgb.x();
                emitptr->put(pos, aov,emit_active);

            }

            // Continue tracing the path at this point?
            Bool active_next = (depth + 1 < m_max_depth) && si.is_valid();

            if (dr::none_or<false>(active_next))
                break; // early exit for scalar mode

            BSDFPtr bsdf = si.bsdf(ray);

            // ---------------------- Emitter sampling ----------------------

            // Perform emitter sampling?
            Mask active_em = active_next && has_flag(bsdf->flags(), BSDFFlags::Smooth);

            DirectionSample3f ds = dr::zeros<DirectionSample3f>();
            Spectrum em_weight = dr::zeros<Spectrum>();
            Vector3f wo = dr::zeros<Vector3f>();

            if (dr::any_or<true>(active_em)) {
                // Sample the emitter
                std::tie(ds, em_weight) = scene->sample_emitter_direction(
                    si, sampler->next_2d(), true, active_em);
                active_em &= dr::neq(ds.pdf, 0.f);

                /* Given the detached emitter sample, recompute its contribution
                   with AD to enable light source optimization. */
                if (dr::grad_enabled(si.p)) {
                    ds.d = dr::normalize(ds.p - si.p);
                    Spectrum em_val = scene->eval_emitter_direction(si, ds, active_em);
                    em_weight = dr::select(dr::neq(ds.pdf, 0), em_val / ds.pdf, 0);
                }

                wo = si.to_local(ds.d);
            }

            // ------ Evaluate BSDF * cos(theta) and sample direction -------

            Float sample_1 = sampler->next_1d();
            Point2f sample_2 = sampler->next_2d();

            auto [bsdf_val, bsdf_pdf, bsdf_sample, bsdf_weight]
                = bsdf->eval_pdf_sample(bsdf_ctx, si, wo, sample_1, sample_2);

            // --------------- Emitter sampling contribution ----------------

            if (dr::any_or<true>(active_em)) {
                bsdf_val = si.to_world_mueller(bsdf_val, -wo, si.wi);

                // Compute the MIS weight
                Float mis_em =
                    dr::select(ds.delta, 1.f, mis_weight(ds.pdf, bsdf_pdf));

                // Accumulate, being careful with polarization (see spec_fma)
                Spectrum this_result = result;
                result[active_em] = spec_fma(
                    throughput, bsdf_val * em_weight * mis_em, result);
                EmitterPtr emitptr = si.emitter(scene);
                this_result        = result - this_result;
                Color3f rgb;
                rgb = this_result;
                std::cout << typeid(emitptr).name() << std::endl;
                aov[0] = rgb.x();
                aov[1] = rgb.y();
                aov[2] = rgb.z();
                aov[3] = rgb.x();
                emitptr->put(pos, aov, active_em);
            }

            // ---------------------- BSDF sampling ----------------------

            bsdf_weight = si.to_world_mueller(bsdf_weight, -bsdf_sample.wo, si.wi);

            ray = si.spawn_ray(si.to_world(bsdf_sample.wo));

            /* When the path tracer is differentiated, we must be careful that
               the generated Monte Carlo samples are detached (i.e. don't track
               derivatives) to avoid bias resulting from the combination of moving
               samples and discontinuous visibility. We need to re-evaluate the
               BSDF differentiably with the detached sample in that case. */
            if (dr::grad_enabled(ray)) {
                ray = dr::detach<true>(ray);

                // Recompute 'wo' to propagate derivatives to cosine term
                Vector3f wo_2 = si.to_local(ray.d);
                auto [bsdf_val_2, bsdf_pdf_2] = bsdf->eval_pdf(bsdf_ctx, si, wo_2, active);
                bsdf_weight[bsdf_pdf_2 > 0.f] = bsdf_val_2 / dr::detach(bsdf_pdf_2);
            }

            // ------ Update loop variables based on current interaction ------

            throughput *= bsdf_weight;
            eta *= bsdf_sample.eta;
            valid_ray |= active && si.is_valid() &&
                         !has_flag(bsdf_sample.sampled_type, BSDFFlags::Null);

            // Information about the current vertex needed by the next iteration
            prev_si = si;
            prev_bsdf_pdf = bsdf_sample.pdf;
            prev_bsdf_delta = has_flag(bsdf_sample.sampled_type, BSDFFlags::Delta);

            // -------------------- Stopping criterion ---------------------

            dr::masked(depth, si.is_valid()) += 1;

            Float throughput_max = dr::max(unpolarized_spectrum(throughput));

            Float rr_prob = dr::minimum(throughput_max * dr::sqr(eta), .95f);
            Mask rr_active = depth >= m_rr_depth,
                 rr_continue = sampler->next_1d() < rr_prob;

            /* Differentiable variants of the renderer require the the russian
               roulette sampling weight to be detached to avoid bias. This is a
               no-op in non-differentiable variants. */
            throughput[rr_active] *= dr::rcp(dr::detach(rr_prob));

            active = active_next && (!rr_active || rr_continue) &&
                     dr::neq(throughput_max, 0.f);
        }
njroussel commented 1 year ago

I'm confused as to what you're trying to do. I'm not entirely sure this is even supported.

GensokyoLover commented 1 year ago

I'm trying to calculate the radiance of each Emitter to result image. So when Loop calculating Emitter sampling contribution, i called Emitter->put(pos,aov) to store the radiance in corresponding location of emitter's imageblock

GensokyoLover commented 1 year ago

and for this reason I init the emitter's imageblock in integrator.cpp and add a varible(const Vector2f& pos) in Integrator::sample method comparing to the old version

GensokyoLover commented 1 year ago

Hello,I further tested and found that the reason for this error is the atomic addition operation in ImageBlock. When I remove dr::scatter_reduce in ImageBlock::accum, the program can be compiled by drjit. So does mitsuba support atomic operations in Loop? Thank you for your time

GensokyoLover commented 1 year ago

Hello,I further tested and found that the reason for this error is the atomic addition operation in ImageBlock. When I remove dr::scatter_reduce in ImageBlock::accum, the program can be compiled by drjit. So does mitsuba support atomic operations in Loop? Thank you for your time

------------------ 原始邮件 ------------------ 发件人: "Nicolas @.>; 发送时间: 2023年4月3日(星期一) 晚上10:08 收件人: @.>; 抄送: @.>; @.>; 主题: Re: [mitsuba-renderer/mitsuba3] Failed to parse input PTX string When EmitterPtr called my custom function in PathIntegrator::sample (Issue #633)

I'm confused as to what you're trying to do. I'm not entirely sure this is even supported.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.Message ID: @.***>

njroussel commented 1 year ago

It is supported. The issue here, I believe, is that the individual Imageblock objects of each emitter should be part of the loop's state variables. Since you need to read and write to them at every loop iteration (scatter_reduce operation) they should be state variables.

This is not an obvious fix, especially in your current setup where the emitters hold the ImageBlock objects. Then again, it shouldn't even be the ImageBlock but rather the underyling tensor (ImageBlock::tensor()) that should be passed to the loop. Obviously the number of these tensors dynamically varies with your scene, so you can't pass them all through the Loop constructor. You'll have to use Loop::put().

Honestly, I'm not quite sure this is all supported. You might just want to loop over all your emitters and render with only one of them "turned on" at each iteration.

GensokyoLover commented 1 year ago

I tried and still failed. First time,I tried to put ImageBlock::tensor() but received a error--"tensor dont have index_ptr". So i turned to put ImageBlock::tensor()::data() and this time mitsuba reported "Caught a critical exception: jit_var_loop_init(): loop state variables have an inconsistent size (480000 vs 15360000)!" in jitc_var_loop_init fuction. Finally i tried to remove the jitc_raise method in that function and received another jit_raise "Caught a critical exception: jit_var_scatter(): cannot scatter to a placeholder variable!". Your Advice is pretty good to achieve the single light film,but our scenes has a lot of light sources. So we are trying to render all of the lights at once.

GensokyoLover commented 1 year ago

The jit_raise about inconsistent size seems only allow the data structure whose size is equal to the number of total samples.

njroussel commented 1 year ago

Oh, I hadn't thought about the size conflict initially, sorry.

I think this is not something we can support currently in recorded loops, it breaks the parallelism model. You can use a normal cpp loop instead of the DrJit loop in the sample method. This should work, albeit with a performance cost.