tub-rip / event_based_optical_flow

The official implementation of "Secrets of Event-based Optical Flow" (ECCV2022 Oral and IEEE T-PAMI 2024)
GNU General Public License v3.0

Some questions in the paper #5

Closed yuyangpoi closed 1 year ago

yuyangpoi commented 1 year ago

Very nice work. I have a question about Section 3.3 of the paper: how is equation (7) obtained?

shiba24 commented 1 year ago

Thank you for the interest. The idea behind the equation is written in the paper:

Just like the brightness constancy assumption states that brightness is constant along the true motion curves in image space, we assume the flow is constant along its streamlines ... Differentiating in time and applying the chain rule gives a system of partial differential equations (PDEs)

Otherwise, could you elaborate on the question a little more?

DachunKai commented 1 year ago

I have the same question. At the top of page 7, the paper says "v = dx/dt is the flow." But the preceding part states that v(x(t), t) is the flow and that it is constant. So do we get dx/dt = v(x(t), t)? When I insert this into equation (7), the result is confusing.
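The two statements can be reconciled with a short derivation. This is only a sketch: it assumes equation (7) is the chain-rule result of the assumption quoted from the paper (the equation itself is not reproduced in this thread), with x(t) denoting a streamline and v evaluated along it.

```latex
\begin{align}
  \mathbf{v}(\mathbf{x}(t), t) &= \text{const} \\
  \frac{d}{dt}\,\mathbf{v}(\mathbf{x}(t), t)
    = \frac{\partial \mathbf{v}}{\partial \mathbf{x}}\,\frac{d\mathbf{x}}{dt}
    + \frac{\partial \mathbf{v}}{\partial t} &= \mathbf{0} \\
  \frac{\partial \mathbf{v}}{\partial t} + (\mathbf{v}\cdot\nabla)\,\mathbf{v} &= \mathbf{0}
\end{align}
```

Here "constant" means constant along each streamline, not the same value everywhere, so v can still vary with x and t. The third line follows from substituting dx/dt = v into the second, and it has the form of the inviscid Burgers' equation.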

shiba24 commented 1 year ago

@DachunKai hi, thank you for your interest too. What exactly is your question, if I may ask?

DachunKai commented 1 year ago

@shiba24 Hi, I'm interested in the difference between RGB-based optical flow (such as PWC-Net or RAFT) and event-based optical flow. Given N_{e} events, what is the format of the event-based flow, for example the shape of the tensor? Can you explain it? Thanks

shiba24 commented 1 year ago

@DachunKai This part returns the space-time optical flow in the paper: https://github.com/tub-rip/event_based_optical_flow/blob/main/src/solver/patch_contrast_pyramid.py#L494-L515

The flow is a voxel-like tensor of shape (n_bin, 2, H, W), where n_bin is the number of bins used to quantize time.

Each event has its own timestamp t_k, so we want to use the flow value v(x, t) at that timestamp, not just v(x) (that is, always using the flow at, say, t_0). Equation (7) gives the condition that such a flow v(x, t) should satisfy. But equation (7) is not easy to solve analytically: event timestamps are (almost) continuous, and handling them would require an analytical solution for v(x, t), which we don't have.

So we need a numerical approach, such as an upwind scheme or Burgers' equation. It quantizes the time interval [t_0, t_{N_e}] into n_bin bins, and that gives the shape of the flow. Every event (with its almost continuous timestamp) is assigned to the corresponding bin of the quantized time and is then warped using that flow value. The warp implementation is here: https://github.com/tub-rip/event_based_optical_flow/blob/main/src/warp.py#L315
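To make the time quantization concrete, here is a minimal sketch of a first-order upwind scheme. It is not the repository's solver: it uses the 1D inviscid Burgers' equation ∂v/∂t + v ∂v/∂x = 0 as a stand-in for equation (7), and the names `upwind_step` and `propagate_flow` are hypothetical. Starting from the flow at t_0, it produces one flow slice per time bin.

```python
import numpy as np


def upwind_step(v, dt, dx=1.0):
    """One first-order upwind step of dv/dt + v * dv/dx = 0 (1D stand-in)."""
    vp = np.pad(v, 1, mode="edge")       # replicate edge values at the boundary
    backward = vp[1:-1] - vp[:-2]        # v[i] - v[i-1]
    forward = vp[2:] - vp[1:-1]          # v[i+1] - v[i]
    # Take the difference from the side the flow comes from.
    dv = np.where(v > 0, backward, forward)
    return v - (dt / dx) * v * dv


def propagate_flow(v0, n_bin, dt, dx=1.0):
    """Stack of n_bin flow slices; slice 0 is the boundary condition v0."""
    slices = [np.asarray(v0, dtype=float)]
    for _ in range(n_bin - 1):
        slices.append(upwind_step(slices[-1], dt, dx))
    return np.stack(slices)              # shape (n_bin, W)


# Usage: a smooth 1D flow propagated over 5 time bins.
# The step satisfies the CFL condition max|v| * dt / dx <= 1.
v0 = np.linspace(0.0, 1.0, 16)
flow_1d = propagate_flow(v0, n_bin=5, dt=0.5)
```

The 2D case adds a second flow component and a second spatial axis, which is where the (n_bin, 2, H, W) shape comes from.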

Feel free to ask, if you have further questions.

DachunKai commented 1 year ago

@shiba24 Thanks for your detailed instructions! I can understand your space-time optical flow v(x, t) now!

In conclusion, you first convert the N_e events to an n_bin voxel, and the voxel shape is (n_bin, H, W). As in rpg_e2vid, we usually use voxels to represent events in deep learning. But in Section 3.4, paragraph 2 of your paper, on the multi-scale approach, you say that your tile-based approach works directly on raw events. I'm confused about which representation you actually use.

To estimate optical flow, the CM framework computes an image of warped events (IWE), assuming the same linear velocity for all events. So your assumption is that each voxel bin has its own velocity, am I right? And where is your code that computes the IWE? Can you direct me to it?

Where do you think such event voxel optical flow can be applied? As far as I know, it is designed to get the IWE, i.e. aligned events for downstream tasks.

shiba24 commented 1 year ago

@DachunKai

> In conclusion, you first convert the N_e events to an n_bin voxel, and the voxel shape is (n_bin, H, W).

No, that is NOT correct. Can you take a look at the code that I specified (the warp part) by yourself? Events are events, the warp is always on the raw timestamps (not the quantized timestamps). No conversion of events into voxel - we don't need it. And the tile-based approach works on raw events.
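As an illustration of this point, here is a minimal NumPy sketch. It is not the code in warp.py: the function name `warp_events`, the (N, 4) event layout [x, y, t, p], the pixels-per-second flow unit, and the nearest-pixel lookup are all assumptions. The quantized time is used only to pick a flow slice, while the displacement uses the raw timestamp of each event.

```python
import numpy as np


def warp_events(events, flow, t_ref=None):
    """Warp raw events with a time-binned flow.

    events: (N, 4) float array of [x, y, t, p] (assumed layout).
    flow:   (n_bin, 2, H, W) array in pixels per second;
            channel 0 is the x component, channel 1 the y component.
    """
    n_bin, _, height, width = flow.shape
    x, y, t = events[:, 0], events[:, 1], events[:, 2]
    t_start, t_end = t.min(), t.max()
    if t_ref is None:
        t_ref = t_start

    # Quantized time is used ONLY to look up the flow slice.
    span = max(t_end - t_start, 1e-12)
    bin_idx = np.clip(((t - t_start) / span * n_bin).astype(int), 0, n_bin - 1)

    # Nearest-pixel lookup of the flow at each event location.
    xi = np.clip(np.round(x).astype(int), 0, width - 1)
    yi = np.clip(np.round(y).astype(int), 0, height - 1)
    vx = flow[bin_idx, 0, yi, xi]
    vy = flow[bin_idx, 1, yi, xi]

    # The displacement uses the raw, unquantized timestamps.
    dt = t - t_ref
    return np.stack([x - dt * vx, y - dt * vy, t, events[:, 3]], axis=1)


# Usage: three events on an edge moving at 10 px/s along x.
events = np.array([[5.0, 3.0, 0.0, 1.0],
                   [10.0, 3.0, 0.5, 1.0],
                   [15.0, 3.0, 1.0, -1.0]])
flow = np.zeros((4, 2, 8, 20))
flow[:, 0] = 10.0
warped = warp_events(events, flow)
```

The events array keeps its (N, 4) shape throughout; only the flow carries the n_bin axis.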

> To estimate optical flow, the CM framework computes an image of warped events (IWE), assuming the same linear velocity for all events. So your assumption is that each voxel bin has its own velocity, am I right?

No, that is wrong.

shiba24 commented 1 year ago

Don't confuse the shape of the events with the shape of the optical flow: time quantization applies only to the optical flow, not to the events. Also, we estimate optical flow, which is different from "same linear velocity of all events". The CM framework itself does not assume it. Maybe you can read some other CMax-based papers. For example, https://www.mdpi.com/1424-8220/22/14/5190, https://ieeexplore.ieee.org/document/7805257.
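For readers new to contrast maximization, the following is a minimal sketch of how an image of warped events (IWE) can be built and scored. The helper names are hypothetical, nearest-pixel voting stands in for smoother voting schemes, and variance is used here as one common contrast measure, which is not necessarily the objective used in this repository. A single constant flow is used only to keep the example short.

```python
import numpy as np


def image_of_warped_events(xw, yw, height, width):
    """Accumulate warped event coordinates into a count image (the IWE)."""
    iwe = np.zeros((height, width))
    xi = np.round(xw).astype(int)
    yi = np.round(yw).astype(int)
    inside = (xi >= 0) & (xi < width) & (yi >= 0) & (yi < height)
    # np.add.at handles several events landing on the same pixel.
    np.add.at(iwe, (yi[inside], xi[inside]), 1.0)
    return iwe


def contrast(iwe):
    """Variance of the IWE: higher when events are better aligned."""
    return float(iwe.var())


# Usage: 11 events from an edge moving at 10 px/s along x.
t = np.linspace(0.0, 1.0, 11)
x = 5.0 + 10.0 * t
y = np.full_like(t, 3.0)

iwe_raw = image_of_warped_events(x, y, height=8, width=20)                # no warp
iwe_aligned = image_of_warped_events(x - 10.0 * t, y, height=8, width=20)  # correct flow
sharper = contrast(iwe_aligned) > contrast(iwe_raw)
```

With the correct flow, all events collapse onto one pixel, so the aligned IWE has higher variance than the unwarped one.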

shiba24 commented 1 year ago

@yuyangpoi is your original question resolved? If so, I'd like to close this issue.

yuyangpoi commented 1 year ago

sure, thanks.