Or maybe I could ask this question in another way: is it OK if I extract the pose of each db image directly from _aachen_cvpr2018db.nvm? The file looks like this:

NVM_V3
4328
db/1045.jpg 1084.59000000 -0.030550400000 -0.212047000000 0.120580000000 0.969311000000 738.038000000000 -6.174860000000 -41.837100000000 0.081098900000 0
......
I am not sure I understand your question. What are you trying to achieve?
I want to extract the pose of every db image, so I can evaluate the geometric relationship between them.
That's what I have done: for every db image (4328 in total) I extracted the file name (db/1045.jpg in the example above), the quaternion WXYZ (-0.030550400000 -0.212047000000 0.120580000000 0.969311000000) and the position XYZ (-6.174860000000 -41.837100000000 0.081098900000) directly from _aachen_cvpr2018db.nvm. Then I found that some images that are geometrically close (small WXYZ and XYZ distance) still look very different, which is why I think they were not captured at the same location (for example the pairs db/1045.jpg & db/2506.jpg, or db/1135.jpg & db/3355.jpg).
Here is my assumption: several small models were built from different locations, and the pose of each image was normalized (referenced) to its own submodel's reference point, so the XYZ values do not get too large. Images 1 and 2 might then come from submodels A and B respectively: their pose values could be very close, but because they are referenced to different reference points, the two images can still look completely different.
My suggestion: add a random translation to every submodel, in order to avoid overlap between images from different submodels.
There seems to be a problem in the way that you are reading out the positions. For db/1045.jpg, the position of the camera center in model coordinates is (738.038, -6.17486, -41.8371), as stored in the .nvm file. Maybe you are confusing position and translation?
The image pairs that you list are taken roughly 50 meters apart.
As described in the paper, we constructed a single reference model. No submodels were built and then aligned.
Actually, looking at your XYZ coordinates, it seems that you are reading in the position incorrectly. The last entry (0.081098900000) is the radial distortion term, not the Z coordinate. You are missing the X coordinate and just storing Y, Z, radial distortion.
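For anyone reading the poses out programmatically, here is a minimal sketch of parsing one camera line under the layout described above (filename, focal length, quaternion WXYZ, camera center XYZ, radial distortion, trailing zero). This is only an illustration of the indexing, not code from the dataset; the names NvmCamera and parse_nvm_camera_line are made up here.

```python
# Minimal sketch: parse one camera line from an NVM_V3 model file.
# Assumed layout, per the discussion above:
#   <filename> <focal> <qw> <qx> <qy> <qz> <cx> <cy> <cz> <radial distortion> 0

from dataclasses import dataclass
from typing import Tuple

@dataclass
class NvmCamera:
    name: str                  # e.g. "db/1045.jpg"
    focal: float               # focal length in pixels
    q_wxyz: Tuple[float, ...]  # rotation world -> camera, unit quaternion (w, x, y, z)
    center: Tuple[float, ...]  # camera center in model/world coordinates (x, y, z)
    radial: float              # radial distortion term

def parse_nvm_camera_line(line: str) -> NvmCamera:
    tok = line.split()
    return NvmCamera(
        name=tok[0],
        focal=float(tok[1]),
        q_wxyz=tuple(float(v) for v in tok[2:6]),
        center=tuple(float(v) for v in tok[6:9]),  # NOT tok[7:10]: tok[9] is the distortion term
        radial=float(tok[9]),
    )

# Example with the db/1045.jpg line quoted above:
cam = parse_nvm_camera_line(
    "db/1045.jpg 1084.59 -0.0305504 -0.212047 0.12058 0.969311 "
    "738.038 -6.17486 -41.8371 0.0810989 0"
)
print(cam.center)  # (738.038, -6.17486, -41.8371)
```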
OK, I see. I thought the camera center meant the horizontal pixel center of each image and that the following three values were XYZ. Now I see I was wrong, thank you.
Two more questions:
Could you please also offer daytime query-db image pairs, just like the ones you offered for nighttime, i.e. 20 db candidates for every query image?
When I upload the result to your benchmark, should it be two .txt files (day and night separately) with a format like this:

filename_without_path W A B C X Y Z
IMG_20161227_173116.jpg 0.0702079 0.835457 -0.0640913 -0.541271 -329.332 127.496 654.254
......

Is that right?
Regarding your question:
name.jpg qw qx qy qz tx ty tz
Here, name.jpg corresponds to the filename of the image, without any directory names, qw qx qy qz represents the rotation from world to camera coordinates as a unit quaternion, and tx ty tz is the camera translation (not the camera position).

Does this answer your question?
I think I understand most of your answer, thank you. But I am confused by 'tx ty tz is the camera translation (not the camera position)': I thought tx ty tz is the camera translation from world to camera coordinates, and that this is also the camera position in world coordinates, is that right? Or do you mean that T is the translation from world to camera coordinates, t is the camera position in world coordinates, and R is the rotation from world to camera coordinates, so that T = Rt?
But I still think T = t, since both of them are in the world coordinate system.
This is from the readme file that comes with the Aachen dataset:

The different types of models store poses in different formats. The NVM models store the pose as a rotation R and a camera position c, and the camera translation can be computed as t = -(R * c). The .info files store the rotation (as a 3x3 matrix) from the world coordinate system to the camera coordinate system as well as the camera translation. Thus, they store a pose as [R|t]. We strongly recommend that you familiarize yourself with the file format of the models that you plan to use.
Does this answer your question?
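For concreteness, here is a small sketch of that conversion in Python, assuming the predicted pose is a unit quaternion (w, x, y, z) rotating world to camera coordinates plus a camera center c in world coordinates. The helper names and the example center values are placeholders, not part of the dataset tooling or real predictions.

```python
# Sketch: turn a predicted (rotation quaternion, camera center) pair into the
# "name.jpg qw qx qy qz tx ty tz" submission line, using t = -(R * c).

import numpy as np

def quat_to_rotmat(q):
    """3x3 rotation matrix from a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def submission_line(name, q_wxyz, center_world):
    R = quat_to_rotmat(q_wxyz)
    c = np.asarray(center_world, dtype=float)
    t = -R @ c                      # camera translation, not the camera position
    qw, qx, qy, qz = q_wxyz
    return f"{name} {qw} {qx} {qy} {qz} {t[0]} {t[1]} {t[2]}"

# Hypothetical query pose (the center values here are placeholders):
print(submission_line("IMG_20161227_173116.jpg",
                      (0.0702079, 0.835457, -0.0640913, -0.541271),
                      (10.0, 2.0, -5.0)))
```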
Yes, very clear. Thank you.
So I will use the R, c from the NVM file to train my model, predict R, c for every query image, compute the translation with t = -(R * c), and then upload the final result in a .txt file.
Next time I will first look for my questions in the dataset's readme file, thanks again.
Hello, I did some tests to compute the intersection between two cameras' frustums, in order to define their 'spatial similarity'. I found that in the db dataset (4328 images from _aachen_cvpr2018db.nvm), some images have very close poses but are clearly at different locations (after checking their appearance), for example db/1045.jpg & db/2506.jpg, or db/1135.jpg & db/3355.jpg. I am wondering whether there are several subsets (sub-models) of images in this dataset, and whether some db images from different subsets have overlapping absolute poses, is that right? If so, when we predict the absolute pose for one query image, could it be referenced to many different subsets? Thank you in advance.
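For reference, a minimal sketch of a cheap 'spatial similarity' check between two db cameras, used here as a simpler stand-in for full frustum intersection: the distance between the camera centers plus the angle between the viewing directions. It assumes the quaternion and camera center are read correctly from the NVM file as discussed above; the function names are illustrative.

```python
# Sketch: compare two cameras by center distance and viewing-direction angle.
# Inputs are the values as stored in the NVM file: a world-to-camera unit
# quaternion (w, x, y, z) and a camera center in model coordinates.

import numpy as np

def viewing_direction(q_wxyz):
    """Optical axis (+z of the camera) in world coordinates, i.e. the third
    row of the world-to-camera rotation matrix, written out from the quaternion."""
    w, x, y, z = q_wxyz
    return np.array([2*(x*z - w*y), 2*(y*z + w*x), 1 - 2*(x*x + y*y)])

def spatial_similarity(q1, c1, q2, c2):
    """Return (center distance in model units, angle between optical axes in degrees)."""
    dist = float(np.linalg.norm(np.asarray(c1, dtype=float) - np.asarray(c2, dtype=float)))
    cos_a = float(np.clip(np.dot(viewing_direction(q1), viewing_direction(q2)), -1.0, 1.0))
    return dist, float(np.degrees(np.arccos(cos_a)))
```

A large center distance or a large angle explains why two images whose raw pose numbers look similar in magnitude can still show completely different scenes.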