microsoft / NUWA

A unified 3D Transformer Pipeline for visual synthesis

Question about paper #8

Open diaodeyi opened 2 years ago

diaodeyi commented 2 years ago

As the paper says in the appendix: "For example, for long videos or high-resolution frames with large h, w, s, usually (e^h)(e^w)(e^s) < (h + w + s)". Is there any situation in which (e^h)(e^w)(e^s) < (h + w + s)?
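The sketch below is a brute-force numerical check of the question. It assumes, as this thread does, that e^h, e^w and e^s mean e raised to the powers h, w and s, and that h, w, s are positive integer sizes. The helper names `exp_product`, `counterexamples` and `min_gap` are hypothetical, and the search bound of 16 is arbitrary.

```python
import math
from itertools import product


def exp_product(h: int, w: int, s: int) -> float:
    """(e^h)(e^w)(e^s), reading each superscript as a literal exponent."""
    return math.exp(h) * math.exp(w) * math.exp(s)


def counterexamples(max_dim: int) -> list[tuple[int, int, int]]:
    """Every (h, w, s) in 1..max_dim with (e^h)(e^w)(e^s) < h + w + s."""
    return [
        (h, w, s)
        for h, w, s in product(range(1, max_dim + 1), repeat=3)
        if exp_product(h, w, s) < h + w + s
    ]


def min_gap(max_dim: int) -> float:
    """Smallest value of (e^h)(e^w)(e^s) - (h + w + s) over the searched grid."""
    return min(
        exp_product(h, w, s) - (h + w + s)
        for h, w, s in product(range(1, max_dim + 1), repeat=3)
    )


if __name__ == "__main__":
    print(counterexamples(16))  # [] : no size satisfies the inequality
    print(min_gap(16))          # about 17.09, reached at h = w = s = 1
```

Under this literal reading, the search finds no triple that satisfies the inequality. The gap is smallest at the smallest sizes and only grows as h, w and s get larger.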

diaodeyi commented 2 years ago

By the way, I want to know why Equation 7 in the paper uses e^x rather than just x in the definition. That way, the complexity of 3DNA would be (hws)(h+w+s).

[Screenshot 2022-04-13 at 10.43.01 AM]

diaodeyi commented 2 years ago

Here h, w, s are the ranges, aren't they? Then wouldn't e raised to the power h, w or s already far exceed the attention range?

woctezuma commented 2 years ago

Indeed, this is a weird statement. By rewriting the product of exponential factors as a single exponential of the variable x=h+w+s:

(e^h)(e^w)(e^s) = e^(h+w+s)

We are left with a classical inequality between x and e^x, and it is easy to prove or verify that x < e^x for every real x.
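Here is one short way to see that inequality. This is a standard calculus argument, not one taken from the paper. It studies f(x) = e^x - x and then substitutes x = h + w + s.

```latex
\begin{aligned}
f(x) &= e^{x} - x, \qquad f'(x) = e^{x} - 1, \\
f'(x) &< 0 \text{ for } x < 0, \qquad f'(x) > 0 \text{ for } x > 0, \\
\min_{x \in \mathbb{R}} f(x) &= f(0) = 1 > 0
  \;\Longrightarrow\; e^{x} > x \quad \forall x \in \mathbb{R}, \\
e^{h} e^{w} e^{s} &= e^{h+w+s} > h + w + s .
\end{aligned}
```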

diaodeyi commented 2 years ago

Yeah, so how could x > e^x hold, i.e. (h + w + s) > e^h e^w e^s, as the paper says?