Great question! To answer that, let us take a simpler example. Say we are given several pairs of points $(x_k, y_k)$. If we were asked to find a continuous function that relates $y_k$ to $x_k$, we could perform polynomial fitting of the form $y = \sum_{q=0}^{Q} a_q x^q$. Here, the learnable parameters are $\theta = \{a_0, a_1, \cdots, a_Q\}$. However, we can also learn a neural network representation, $y = \mathcal{N}_\theta(x)$, where $\mathcal{N}$ is a multilayer perceptron (MLP) and $\theta$ are its parameters.
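As an illustration (a minimal sketch, not from the MINER codebase; the toy data, polynomial degree, and MLP sizes are all assumptions), the two options look like this in code:

```python
import numpy as np
import torch

# Toy 1D regression data: pairs (x_k, y_k) sampled from some unknown function
x = np.linspace(-1, 1, 256)
y = np.sin(4 * np.pi * x)

# Option 1: polynomial fit -- the learnable parameters are the coefficients a_q
coeffs = np.polyfit(x, y, deg=9)   # theta = {a_0, ..., a_Q} with Q = 9
y_poly = np.polyval(coeffs, x)

# Option 2: neural network fit -- the learnable parameters are the MLP weights
mlp = torch.nn.Sequential(
    torch.nn.Linear(1, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 1),
)
opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)
xt = torch.tensor(x, dtype=torch.float32).unsqueeze(1)
yt = torch.tensor(y, dtype=torch.float32).unsqueeze(1)
for _ in range(2000):
    opt.zero_grad()
    loss = ((mlp(xt) - yt) ** 2).mean()
    loss.backward()
    opt.step()
```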
In a similar manner, images can also be represented by a continuous function. In that case we have $i_{(x,y)} = \mathcal{N}_\theta(x, y)$, where $i_{(x,y)}$ is the intensity at the coordinate $(x, y)$. This is an implicit representation with a neural network, and hence the name "implicit neural representation (INR)".
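As a concrete sketch of this idea (this is not the MINER code; the image, network width, and training schedule below are placeholders), fitting an INR to a small image can be written as:

```python
import torch

# Toy grayscale "image" of size H x W (in practice, load a real image here)
H, W = 64, 64
image = torch.rand(H, W)

# Coordinate grid in [-1, 1]^2: one (x, y) input per pixel
ys, xs = torch.meshgrid(
    torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij"
)
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)   # (H*W, 2)
intensities = image.reshape(-1, 1)                      # (H*W, 1)

# The INR: i(x, y) = N_theta(x, y)
inr = torch.nn.Sequential(
    torch.nn.Linear(2, 128), torch.nn.ReLU(),
    torch.nn.Linear(128, 128), torch.nn.ReLU(),
    torch.nn.Linear(128, 1),
)
opt = torch.optim.Adam(inr.parameters(), lr=1e-3)
for _ in range(1000):
    opt.zero_grad()
    loss = ((inr(coords) - intensities) ** 2).mean()
    loss.backward()
    opt.step()

# After training, the network *is* the image: query it at any coordinate
reconstruction = inr(coords).reshape(H, W)
```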
If the INR has sufficient capacity, the image is represented very accurately. MLPs equipped with traditional nonlinearities such as ReLU, however, do not have sufficient capacity (think of them as similar to a low-order polynomial). In contrast, MLPs with sinusoidal nonlinearities, such as SIREN, perform well but do not scale to very large signals (think gigapixel images). This is where MINER comes into the picture: it partitions the space into disjoint blocks and uses a small MLP for each block -- somewhat like fitting splines instead of a single polynomial. Hence, the capacity of a network architecture can be evaluated well by fitting an image.
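To make the SIREN comparison concrete, here is a minimal sinusoidal layer in the spirit of the SIREN paper (the $\omega_0 = 30$ frequency scale and the initialization follow that paper; this is a sketch, not MINER's block-wise implementation):

```python
import math
import torch

class SineLayer(torch.nn.Module):
    """Linear layer followed by sin(), in the spirit of SIREN."""
    def __init__(self, in_features, out_features, omega_0=30.0, is_first=False):
        super().__init__()
        self.omega_0 = omega_0
        self.linear = torch.nn.Linear(in_features, out_features)
        # SIREN's initialization keeps activations well-distributed with depth
        with torch.no_grad():
            if is_first:
                bound = 1.0 / in_features
            else:
                bound = math.sqrt(6.0 / in_features) / omega_0
            self.linear.weight.uniform_(-bound, bound)

    def forward(self, x):
        return torch.sin(self.omega_0 * self.linear(x))
```

Swapping the ReLU layers in the INR sketch above for `SineLayer` typically improves the fit dramatically. MINER then goes one step further: instead of one large network, it trains many small networks like this, one per spatial block, so very large signals remain tractable.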
Hope that helps!
Dear vishwa,
Thank you so much for your detailed response! Your explanation about implicit neural representations (INR) and the concept of fitting an image using MINER was incredibly helpful. I truly appreciate the time and effort you put into explaining the topic.
Your analogy with polynomial fitting and the introduction of a multilayer perceptron (MLP) to represent continuous functions provided me with a clearer understanding of the concept. Additionally, your explanation about the limitations of traditional nonlinearities like ReLU and the advantages of alternatives like SIREN was enlightening.
The idea of partitioning the space into disjoint blocks and using small MLPs for each block, akin to splines, in order to handle very large signals is fascinating. It's remarkable how MINER expands the capacity of network architectures and allows for accurate image representation.
Once again, thank you for sharing your knowledge and expertise. Your response has been instrumental in enhancing my understanding of the topic. I'm grateful for your generous help.
Best regards,
netmaker
Sorry to bother you. May I ask where to download the gigapixel image?
Sorry again for the stupid question!
Hello, thanks for your excellent work! I am an undergraduate student, and I am learning about INRs (Implicit Neural Representations) now. I found the image fitting task in the experiments section of a lot of INR papers, so I searched for this task on Google but found nothing. Could you explain the meaning of this experiment? It would help me a lot. I wonder: is it a way to show the superiority of an INR, or is it a challenge for INRs? Why set up this experiment? Best wishes. Looking forward to your reply.