naver / dust3r

DUSt3R: Geometric 3D Vision Made Easy
https://dust3r.europe.naverlabs.com/

Try this to increase resolution w/o finetuning (Instruction) #62

Open KyunHwan opened 3 months ago

KyunHwan commented 3 months ago

With the default setup, large input images are resized to 512 x 384 (using DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth), but I wanted higher-resolution results (1024 x 768). Following "Extending Context Window of Large Language Models via Position Interpolation" by Meta, I made only two changes: I changed the default image_size value from 512 to 1024 in demo.py, and I multiplied the variable t in the get_cos_sin method of RoPE2D (croco/models/pos_embed.py) by (512/1024). This gave pretty good results, though finetuning would most likely improve them further.

KyunHwan commented 3 months ago

I also tested another resolution (1536 x 1152), multiplying t by (512/1536), and the results were reasonable. So far this works well for objects that have a "good number" of features.
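For readers following along, here is a small sketch of the arithmetic behind the two factors mentioned above. The helper names are hypothetical, and the patch size of 16 is an assumption about the ViT-Large checkpoint rather than something stated in this thread. The point is that, after scaling, the largest position index stays below the number of patches along the long side at the training resolution.

```python
def rope_scale(train_size: int, target_size: int) -> float:
    """Factor to multiply t by, e.g. 512/1024 (hypothetical helper)."""
    return train_size / target_size


def max_scaled_position(train_size: int, target_size: int, patch: int = 16) -> float:
    """Largest position index along the long side after scaling.

    patch=16 is an assumption about the model's patch size.
    """
    n_patches = target_size // patch
    return (n_patches - 1) * rope_scale(train_size, target_size)


if __name__ == "__main__":
    trained_patches = 512 // 16  # patches along the long side at training size
    for target in (1024, 1536):
        print(target, rope_scale(512, target),
              max_scaled_position(512, target), "<", trained_patches)
```

Under these assumptions, both 1024 and 1536 keep every scaled position inside the range the model saw during training, which is the idea behind position interpolation.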

hdzys commented 3 months ago

How to modify the parameter t?

def get_cos_sin(self, D, seq_len, device, dtype):
    if (D,seq_len,device,dtype) not in self.cache:
        inv_freq = 1.0 / (self.base ** (torch.arange(0, D, 2).float().to(device) / D))
        t = torch.arange(seq_len, device=device, dtype=inv_freq.dtype)
        freqs = torch.einsum("i,j->ij", t, inv_freq).to(dtype)
        freqs = torch.cat((freqs, freqs), dim=-1)
        cos = freqs.cos() # (Seq, Dim)
        sin = freqs.sin()
        self.cache[D,seq_len,device,dtype] = (cos,sin)
    return self.cache[D,seq_len,device,dtype]

KyunHwan commented 3 months ago

If you're going from 512 to 1024, you would scale t like this:

    t = torch.arange(seq_len, device=device, dtype=inv_freq.dtype) * (512/1024)
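For context, here is a minimal standalone sketch of what the modified computation might look like. It is not the repository's code: the function name, the train_size and target_size parameters, and the default base=100.0 are assumptions made for illustration, and the cache from the original method is left out for brevity.

```python
import torch


def get_cos_sin_interpolated(D, seq_len, device, dtype,
                             base=100.0, train_size=512, target_size=1024):
    """Sketch of get_cos_sin with position interpolation (hypothetical helper).

    base, train_size and target_size are illustrative assumptions.
    """
    inv_freq = 1.0 / (base ** (torch.arange(0, D, 2).float().to(device) / D))
    # Compress positions so they stay within the range seen at train_size.
    scale = train_size / target_size
    t = torch.arange(seq_len, device=device, dtype=inv_freq.dtype) * scale
    freqs = torch.einsum("i,j->ij", t, inv_freq).to(dtype)
    freqs = torch.cat((freqs, freqs), dim=-1)
    return freqs.cos(), freqs.sin()  # each (Seq, Dim)


if __name__ == "__main__":
    cos, sin = get_cos_sin_interpolated(32, 64, "cpu", torch.float32)
    print(cos.shape, sin.shape)
```

With a factor of 0.5, every second row of the result lines up with the unscaled embedding for the original 512 setting, and the rows in between fall at half-integer positions.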

hdzys commented 3 months ago

@KyunHwan thanks