Scaling with monocular SLAM - camera/map point positions to real world positions

Hi, I want to get the position of the camera and the map points in real-world meters, relative to the initial location of the camera. I'm currently using a monocular sensor, and I know the real-world scale cannot be recovered from it because the map scale is initialized arbitrarily. I do plan to add an IMU at some point, so I can use monocular-IMU and get accurate positions. But for now, for testing purposes, I want to use scene information to transform SLAM-world positions into real-world positions relative to the camera. I was thinking of two ways to get that information from the scene, but TBH I'm not sure my reasoning is correct.

Moving the camera a known {n} meters along a certain axis, then dividing {n} by the distance moved along that axis in SLAM-world coordinates. I.e., let's say I've moved 60 cm along the x axis in the real world, and in the SLAM world the position changed from 0.000 to 0.150. Since I want to work in real-world meters, I should divide 0.60 / 0.150 and get a scale factor of 4.
Or using an object of known real-world size: measure the distance between the map points on its borders, then divide the known size by the measured distance.

Both ways should eventually give me the scale factor between real meters and the scale chosen for the map. Am I looking at this correctly? Should the ratio between SLAM-world and real-world positions be a constant scale, or should I look for a transformation matrix describing that transformation? Any help would be much appreciated.
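To make the two ideas concrete, here is a minimal sketch of what I have in mind. Both approaches reduce to the same ratio: a known real-world distance divided by the distance between two SLAM-world positions. The function names (`scale_from_distance`, `to_metres`) are my own and not part of ORB_SLAM2, and the sketch assumes a single constant scale factor, which ignores any scale drift that may build up over a longer trajectory.

```python
import numpy as np


def scale_from_distance(slam_a, slam_b, real_distance_m):
    """Metres per SLAM unit, from two SLAM-world positions whose
    real-world separation is known.

    Works for both ideas: two camera positions with a measured
    displacement, or two map points on the borders of a known object.
    """
    slam_a = np.asarray(slam_a, dtype=float)
    slam_b = np.asarray(slam_b, dtype=float)
    slam_distance = np.linalg.norm(slam_b - slam_a)
    if slam_distance == 0.0:
        raise ValueError("SLAM positions coincide; cannot estimate a scale")
    return real_distance_m / slam_distance


def to_metres(slam_points, scale, origin=(0.0, 0.0, 0.0)):
    """Express SLAM-world points in metres relative to `origin`
    (assumed here to be the initial camera position in SLAM units)."""
    points = np.asarray(slam_points, dtype=float)
    return scale * (points - np.asarray(origin, dtype=float))


# Example from above: 60 cm of real motion, 0.000 -> 0.150 in SLAM units.
scale = scale_from_distance([0.0, 0.0, 0.0], [0.150, 0.0, 0.0], 0.60)
print(scale)  # approximately 4.0

# Apply the same factor to any other SLAM-world positions.
print(to_metres([[0.150, 0.0, 0.0], [0.0, 0.3, 0.1]], scale))
```

Using the full Euclidean distance instead of a single axis means the camera does not have to be moved exactly along the x axis of the SLAM frame for the ratio to hold.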

raulmur / ORB_SLAM2

Scaling with monocular slam - camera/map points positions to real world positions #942