vislearn / dsacstar

DSAC* for Visual Camera Re-Localization (RGB or RGB-D)
BSD 3-Clause "New" or "Revised" License

data preparation on my data #25

Closed · MisEty closed this issue 10 months ago

MisEty commented 1 year ago

Hello, thanks for your previous reply, but I still can't run DSAC* on my data. I have checked and think there might be a problem in my data preparation phase:

  1. I want to run RGB-D relocalization on Kinect v2 data. According to the description of the data structure, I need to provide rgb, calibration, poses and eyes. I have the camera focal length, but I noticed that the input images will be resized to 480 px height while my depth images are only 424 px high. How should I set the camera focal length? Should I use the original focal length, or should it be adjusted to the corresponding focal length after the resize?
  2. Similarly, the eye files should contain the three-dimensional coordinates obtained by back-projecting the depth map with the camera intrinsic matrix. How should I handle this difference in image resolution?

Update 4.19: I edited my data as follows: 1. resize the images to 480 px height, 2. adjust the camera intrinsics accordingly and compute the eye files. DSAC* works well in my simple test: I only use 50 reference images for training and one of them as the query. However, when I train on all the reference images, the e2e network does not converge. Best regards, Peng
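For reference, this is roughly how I now scale the intrinsics and build the eye files (a sketch with placeholder Kinect v2 values and file names; the exact units and file format of the eye files should be double-checked against the repo's data-structure description):

```python
import numpy as np
import torch
import cv2  # any image library works; cv2 is used here only for the resize

# Assumptions: Kinect v2 depth is 512x424 px in millimetres, target height is 480 px.
ORIG_H, TARGET_H = 424, 480
scale = TARGET_H / ORIG_H              # resize factor applied to the images

fx_orig, fy_orig = 365.0, 365.0        # placeholder focal lengths (px)
cx_orig, cy_orig = 256.0, 212.0        # placeholder principal point (px)

# Intrinsics scale with the image: multiply by the same resize factor.
fx, fy = fx_orig * scale, fy_orig * scale
cx, cy = cx_orig * scale, cy_orig * scale

def depth_to_eye(depth_mm: np.ndarray) -> torch.Tensor:
    """Back-project an (already resized) depth map to camera-space coordinates.

    Returns a 3xHxW tensor in metres; invalid depth (0) stays at the origin.
    """
    h, w = depth_mm.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_mm.astype(np.float32) / 1000.0          # mm -> m
    x = (u - cx) / fx * z
    y = (v - cy) / fy * z
    return torch.from_numpy(np.stack([x, y, z], axis=0))

depth = cv2.imread("depth/frame-000000.png", cv2.IMREAD_UNCHANGED)
depth = cv2.resize(depth, None, fx=scale, fy=scale,
                   interpolation=cv2.INTER_NEAREST)   # nearest keeps depth values valid
torch.save(depth_to_eye(depth), "eye/frame-000000.dat")
```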

monamourvert commented 1 year ago

Hi, could you explain more about how the e2e network fails to converge? I am also running DSAC* on my own dataset, but I just realised that the first-stage (train_init) loss is not that good (the loss stays between about 8 and 20). And when I move to the e2e training stage, it gets worse.

I wonder what is wrong with my dataset. Is it a focal-length problem or a pose problem? I still have no idea.

MisEty commented 1 year ago

> Hi, could you explain more about how the e2e network fails to converge? I am also running DSAC* on my own dataset, but I just realised that the first-stage (train_init) loss is not that good (the loss stays between about 8 and 20). And when I move to the e2e training stage, it gets worse.
>
> I wonder what is wrong with my dataset. Is it a focal-length problem or a pose problem? I still have no idea.

The results I got are similar to yours. I think you can do a simple test: only use part of the reference images to train init and e2e, and check the result. On my dataset this result is not bad.

monamourvert commented 1 year ago

Would you like to share your result? Does it reach good accuracy? The translation error on my dataset is far too big (almost 14 m).

MisEty commented 1 year ago

In my simple test I used 50 reference images for training and the same images as queries; the translation/rotation error is about 5 cm / 5 deg. When I use all reference images (about 5000), the translation error is about 20 cm.

ebrach commented 1 year ago

Hard to comment without knowing more about your datasets. But @MisEty, your results sound reasonable. 20 cm suggests the system is working in general. Is the scene quite large? That could explain the larger positional error.

@monamourvert: I would suggest testing the model after init and before e2e training, on the training set (i.e. using training images as test images). If you see large errors, then the initialisation was not good enough to bring the network to a point where end-to-end training can take over. End-to-end training can be seen as refinement of an already decent network towards maximum accuracy.
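If helpful, a minimal way to quantify the per-frame error against the ground-truth pose files (a sketch assuming 4x4 pose matrices as in the 7Scenes-style convention, with translation in metres; the file name is just an example):

```python
import numpy as np

def pose_error(pose_est: np.ndarray, pose_gt: np.ndarray):
    """Translation (m) and rotation (deg) error between two 4x4 pose matrices."""
    t_err = np.linalg.norm(pose_est[:3, 3] - pose_gt[:3, 3])
    # Angle of the relative rotation between estimate and ground truth.
    r_rel = pose_est[:3, :3].T @ pose_gt[:3, :3]
    cos_angle = np.clip((np.trace(r_rel) - 1.0) / 2.0, -1.0, 1.0)
    r_err = np.degrees(np.arccos(cos_angle))
    return t_err, r_err

# Example with a hypothetical pose file from the training split:
pose_gt = np.loadtxt("train/poses/frame-000000.pose.txt")
print(pose_error(pose_gt, pose_gt))  # (0.0, 0.0) by construction
```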

There are multiple potential reasons why the training fails. The scene could be too large or too difficult (e.g. because of repeated structures), or the dataset convention could be wrong.

A friendly pointer to ACE, the successor of DSAC*. ACE can help in two ways:

Best, Eric