DUSt3R: Geometric 3D Vision Made Easy
RuntimeError: tokens must have 4 dimensions #150

Open CrispyFeSo4 opened 1 month ago

CrispyFeSo4 commented 1 month ago

File "/home/dust3r/croco/models/curope/", line 22, in forward
    _kernels.rope_2d( tokens, positions, base, F0 )
RuntimeError: tokens must have 4 dimensions

I encountered this error. Can anyone tell me how to fix it? Thanks!

richiebailey74 commented 1 month ago

I am also getting this error. To give more context, I am running this using the file and am connecting to a T4 GPU using Colab. Would love any insight into how to get around this error.

yocabon commented 1 month ago

Hi, under what circumstances does it crash? What code are you running? Is curope compiled with the same CUDA version as PyTorch? Does it work if you remove curope (it's optional)?
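One way to gather the information asked for above is to compare the CUDA version PyTorch was built with against the local `nvcc`, and to check whether a compiled `curope` module is importable at all. The following is only a minimal sketch and not part of DUSt3R: the helper name `collect_build_info` is hypothetical, and it assumes `nvcc` is on `PATH` whenever a CUDA toolkit is installed.

```python
import importlib.util
import re
import shutil
import subprocess


def collect_build_info():
    """Hypothetical helper: report versions relevant to a curope build mismatch."""
    info = {"torch": None, "torch_cuda": None, "nvcc_cuda": None, "curope_found": False}

    # CUDA version PyTorch was built against (None for CPU-only builds).
    if importlib.util.find_spec("torch") is not None:
        import torch
        info["torch"] = torch.__version__
        info["torch_cuda"] = torch.version.cuda

    # CUDA toolkit version that nvcc would use to compile the extension.
    nvcc = shutil.which("nvcc")
    if nvcc is not None:
        out = subprocess.run([nvcc, "--version"], capture_output=True, text=True).stdout
        match = re.search(r"release (\d+\.\d+)", out)
        if match:
            info["nvcc_cuda"] = match.group(1)

    # Whether a module named curope can be found on the current import path.
    info["curope_found"] = importlib.util.find_spec("curope") is not None
    return info


if __name__ == "__main__":
    for key, value in collect_build_info().items():
        print(f"{key}: {value}")
```

If `torch_cuda` and `nvcc_cuda` differ, a mismatched build is a plausible suspect, and rebuilding the extension with a matching toolkit is a reasonable first thing to try.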

dream-in-night commented 3 weeks ago

Recompiling fixes it. Go into croco/models/curope and run:

python setup.py build_ext --inplace


/*
  Copyright (C) 2022-present Naver Corporation. All rights reserved.
  Licensed under CC BY-NC-SA 4.0 (non-commercial use only).
*/

#include <torch/extension.h>

using namespace std;
// forward declaration
void rope_2d_cuda( torch::Tensor tokens, const torch::Tensor pos, const float base, const float fwd );

void rope_2d_cpu( torch::Tensor tokens, const torch::Tensor positions, const float base, const float fwd )
{
    const int B = tokens.size(0);
    const int N = tokens.size(1);
    const int H = tokens.size(2);
    const int D = tokens.size(3) / 4;

    auto tok = tokens.accessor<float, 4>();
    auto pos = positions.accessor<int64_t, 3>();

    for (int b = 0; b < B; b++) {
      for (int x = 0; x < 2; x++) { // y and then x (2d)
        for (int n = 0; n < N; n++) {

            // grab the token position
            const int p = pos[b][n][x];

            for (int h = 0; h < H; h++) {
                for (int d = 0; d < D; d++) {
                    // grab the two values
                    float u = tok[b][n][h][d+0+x*2*D];
                    float v = tok[b][n][h][d+D+x*2*D];

                    // grab the cos,sin
                    const float inv_freq = fwd * p / powf(base, d/float(D));
                    float c = cosf(inv_freq);
                    float s = sinf(inv_freq);

                    // write the result
                    tok[b][n][h][d+0+x*2*D] = u*c - v*s;
                    tok[b][n][h][d+D+x*2*D] = v*c + u*s;
                }
            }
        }
      }
    }
}

void rope_2d( torch::Tensor tokens,     // B,N,H,D
        const torch::Tensor positions,  // B,N,2
        const float base, 
        const float fwd )
{
    // std::cout << "rope_2d: " << tokens.dim() << std::endl; // prints "rope_2d: 4"
    TORCH_CHECK(tokens.dim() == 4, "tokens must have 4 dimensions");
    TORCH_CHECK(positions.dim() == 3, "positions must have 3 dimensions");
    TORCH_CHECK(tokens.size(0) == positions.size(0), "batch size differs between tokens & positions");
    TORCH_CHECK(tokens.size(1) == positions.size(1), "seq_length differs between tokens & positions");
    TORCH_CHECK(positions.size(2) == 2, "positions.shape[2] must be equal to 2");
    TORCH_CHECK(tokens.is_cuda() == positions.is_cuda(), "tokens and positions are not on the same device" );

    if (tokens.is_cuda())
        rope_2d_cuda( tokens, positions, base, fwd );
    else
        rope_2d_cpu( tokens, positions, base, fwd );
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  m.def("rope_2d", &rope_2d, "RoPE 2d forward/backward");
}

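The printed value here really is 4, so the tensor does have 4 dimensions, yet the check still raised the error. After I added the cout and recompiled, it unexpectedly passed. Once it is compiled, you can test it like this: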

import torch
import curope as _kernels

# print(f'tokens.shape, positions.shape: {tokens.shape, positions.shape}') # (torch.Size([32, 196, 16, 64]), torch.Size([32, 196, 2]))
tokens = torch.randn(32, 196, 16, 64).cuda()
positions = torch.randn(32, 196, 2).long().cuda()
base = 100.0
F0 = 1.0
# rope_2d modifies tokens in place; with a correctly built extension this call should not raise
_kernels.rope_2d( tokens, positions, base, F0 )
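To sanity-check what the kernel is supposed to compute without depending on the compiled extension, the loops of `rope_2d_cpu` shown above can be mirrored in plain Python. This is only a sketch: `rope_2d_reference` is a hypothetical name, it works on nested lists rather than tensors, and it uses double precision, so it should agree with the float32 kernel only up to rounding.

```python
import math


def rope_2d_reference(tokens, positions, base, fwd):
    """Pure-Python mirror of the rope_2d_cpu loops, operating in place on nested lists.

    tokens:    nested list of floats with shape [B][N][H][4*D]
    positions: nested list of ints with shape [B][N][2]
    """
    B, N, H = len(tokens), len(tokens[0]), len(tokens[0][0])
    D = len(tokens[0][0][0]) // 4

    for b in range(B):
        # x = 0 rotates the first half of the channels with positions[b][n][0],
        # x = 1 rotates the second half with positions[b][n][1].
        for x in range(2):
            for n in range(N):
                p = positions[b][n][x]
                for h in range(H):
                    row = tokens[b][n][h]
                    for d in range(D):
                        i = d + x * 2 * D
                        j = d + D + x * 2 * D
                        u, v = row[i], row[j]
                        angle = fwd * p / (base ** (d / D))
                        c, s = math.cos(angle), math.sin(angle)
                        row[i] = u * c - v * s
                        row[j] = v * c + u * s
    return tokens


if __name__ == "__main__":
    toks = [[[[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]]]]  # B=1, N=1, H=1, 4*D=8
    pos = [[[3, 5]]]
    rope_2d_reference(toks, pos, 100.0, 1.0)   # rotate
    rope_2d_reference(toks, pos, 100.0, -1.0)  # rotate back
    print(toks[0][0][0])
```

Because each pair `(u, v)` is rotated by an angle proportional to `fwd`, calling the function again with `fwd` negated undoes the rotation, which gives a simple round-trip check.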