kylemcdonald / LightLeaks

An immersive installation built from a pile of mirror balls and a few projectors.

Auto calibrate from iPhone #6

Open HalfdanJ opened 9 years ago

HalfdanJ commented 9 years ago

What is needed?

...

kylemcdonald commented 9 years ago

standard manchester encoding is also an option instead of bmc. especially because "bright" and "dark" are well defined in our situation, we can make decoding a little easier on ourselves since the polarity is not in question.
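
as a sketch of what that buys us: with a known bright/dark convention, decoding is just comparing the two frames of each pair, with no phase or polarity recovery needed. here is a minimal toy version in python (one intensity sample per frame per pixel; the 1 = dark-then-bright convention is an assumption, not something we've settled on):

```python
import numpy as np

def manchester_encode(bits):
    """encode each bit as a two-frame pair.
    assumed convention: 1 -> dark then bright, 0 -> bright then dark."""
    frames = []
    for b in bits:
        frames += [0, 1] if b else [1, 0]
    return np.array(frames, dtype=np.uint8)

def manchester_decode(frames):
    """decode by comparing the two frames of each pair; because bright and
    dark are unambiguous, the polarity never has to be guessed."""
    pairs = frames.reshape(-1, 2)
    return (pairs[:, 1] > pairs[:, 0]).astype(np.uint8)

bits = np.array([1, 0, 1, 1, 0], dtype=np.uint8)
assert np.array_equal(manchester_decode(manchester_encode(bits)), bits)
```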

overall, the process looks like this:

  1. deform the image based on tracked keypoints. (tracking and deformation can be done on GPU; a minimal warping sketch follows this list)
  2. look at each pixel over time to decode it (could possibly be done on GPU, hard to say)
  3. collect all decoded pixels from all images and run bundle adjustment (e.g., http://www.uco.es/investiga/grupos/ava/node/39). this must be done on CPU.
  4. construct the xyzMap directly from the 3d estimates of the projected points
  5. additional setting for establishing orientation and center of the space (orientation could come from phone, then center can be derived)
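
a rough sketch of step 1 in python/opencv, using a homography as the deformation model. that model is an assumption: a homography only holds for small motions or nearly planar views, so the real thing might need a denser mesh warp, and it would run on the GPU rather than like this.

```python
import cv2
import numpy as np

def stabilize_to_reference(reference_gray, current_gray, current_color):
    """warp the current frame into the reference view so each projector pixel
    stays at a fixed image location while the phone drifts slightly."""
    orb = cv2.ORB_create(2000)
    k0, d0 = orb.detectAndCompute(reference_gray, None)
    k1, d1 = orb.detectAndCompute(current_gray, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(d0, d1)   # queryIdx -> reference, trainIdx -> current

    src = np.float32([k1[m.trainIdx].pt for m in matches])   # current frame
    dst = np.float32([k0[m.queryIdx].pt for m in matches])   # reference frame
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)

    h, w = reference_gray.shape
    return cv2.warpPerspective(current_color, H, (w, h))
```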

for the bundle adjustment, some tools let you set the camera parameters as fixed for all images. we'd want that.
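
for illustration, here is roughly what "camera parameters fixed, everything else free" looks like if we did it ourselves with scipy instead of a dedicated ba library like the one linked above. the data layout (a 6-dof pose per image, a shared intrinsics matrix K, and a list of (camera, point, observation) tuples) is made up for the sketch:

```python
import numpy as np
import cv2
from scipy.optimize import least_squares

def residuals(params, K, observations, n_cams, n_pts):
    """observations: list of (cam_idx, pt_idx, observed_xy). K is held fixed
    for every image; only the poses and 3d points are optimized."""
    poses = params[:n_cams * 6].reshape(n_cams, 6)    # rvec | tvec per image
    points = params[n_cams * 6:].reshape(n_pts, 3)    # current 3d estimates
    res = []
    for cam_idx, pt_idx, xy in observations:
        rvec, tvec = poses[cam_idx, :3], poses[cam_idx, 3:]
        proj, _ = cv2.projectPoints(points[pt_idx].reshape(1, 3),
                                    rvec, tvec, K, None)
        res.append(proj.ravel() - xy)
    return np.concatenate(res)

def bundle_adjust(K, init_poses, init_points, observations):
    x0 = np.hstack([init_poses.ravel(), init_points.ravel()])
    sol = least_squares(residuals, x0, method='trf',
                        args=(K, observations, len(init_poses), len(init_points)))
    n = len(init_poses) * 6
    return sol.x[:n].reshape(-1, 6), sol.x[n:].reshape(-1, 3)
```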

there needs to be a line drawn somewhere between on-phone and on-computer processing:

  1. do everything on the phone (probably impossible).
  2. do image warping and pixel decoding on the phone and send keypoints back to computer.
  3. send thresholded video back to computer (lower bandwidth)
  4. send raw video to computer (higher bandwidth)
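
some back-of-the-envelope numbers for options 3 and 4, assuming the phone captures at 1920x1080 / 30 fps (the actual capture resolution is an assumption):

```python
# raw 8-bit rgb vs 1-bit thresholded video, before any compression
w, h, fps = 1920, 1080, 30

raw = w * h * 3 * fps          # bytes per second of raw rgb
thresholded = w * h / 8 * fps  # bytes per second at 1 bit per pixel

print(f"raw video:         {raw / 1e6:.0f} MB/s")
print(f"thresholded video: {thresholded / 1e6:.1f} MB/s")
# ~187 MB/s vs ~7.8 MB/s, about a 24x difference
```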

the biggest problem in my opinion is the capturing / decoding process. one weird but minimum viable option would be using FM with the a and b components of Lab color: the oscillation frequency of a would indicate the horizontal position, and the oscillation frequency of b would indicate the vertical position. i've done this kind of FM demodulation before for 3d scanning and it generally works. https://www.flickr.com/photos/kylemcdonald/5659471322/in/album-72157613657773217/

for N pixels of resolution you would need N frames to get a full cycle, e.g. 1920 frames for a single 1920x1080 projector (one minute of video at 30 fps). but you don't need to record a full cycle to estimate the frequency, you just gain accuracy the longer your recording is.
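
a toy version of the per-pixel frequency decoding, using one channel and a synthetic signal. the recording here is a few times longer than the column count so every frequency stays below nyquist; the exact framing is an assumption of the sketch, not a worked-out design:

```python
import numpy as np

def encode_fm(width, n_frames):
    """column x of the a channel oscillates at (x + 1) cycles over the whole
    recording, so frequency encodes horizontal position."""
    t = np.arange(n_frames)
    x = np.arange(width)
    return np.sin(2 * np.pi * (x[None, :] + 1) * t[:, None] / n_frames)

def decode_fm(pixel_over_time):
    """estimate one pixel's dominant frequency; longer recordings give finer
    frequency bins, which is the accuracy-vs-time tradeoff mentioned above."""
    spectrum = np.abs(np.fft.rfft(pixel_over_time - pixel_over_time.mean()))
    peak_bin = np.argmax(spectrum[1:]) + 1   # skip the dc bin
    return peak_bin - 1                      # back to a column index

signal = encode_fm(width=32, n_frames=128)   # shape: (frames, columns)
assert decode_fm(signal[:, 17]) == 17
```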

for the graycode scan, each axis has 10-12 images depending on the resolution. if we are using differential graycode, this is doubled, but i'm not convinced that will be the bottleneck here. so let's say up to 25 images for a full scan. each bit is then encoded as two frames for manchester encoding, making it 50 frames, or 2-3 seconds per scan at 30 fps. then we need to add a marker that indicates the start/end of the code. this could be a single flash of color, a sequence of colors, or some full-screen stroboscopic projection. this is not something that needs to be calibrated on a per-pixel basis, since it happens across the whole image.
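
spelling out the arithmetic for a single 1920x1080 projector at 30 fps (the sync marker and any differential doubling come on top of this):

```python
import math

width, height, fps = 1920, 1080, 30

bits = math.ceil(math.log2(width)) + math.ceil(math.log2(height))  # 11 + 11
frames = bits * 2                       # manchester: two frames per bit
print(bits, "graycode images ->", frames, "frames ->",
      round(frames / fps, 1), "seconds per scan")
# 22 graycode images -> 44 frames -> 1.5 seconds, which lands in the rough
# 2-3 second range above once the sync marker and any differential doubling
# are added
```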

alternatively, for sync we could use the network connection. latency and jitter should be around one frame of video.

edit: thinking about this more. there is a weird asymmetry in the frequency-based technique in that the lower frequency signals will be harder to estimate accurately. a continuous technique is appealing because it takes advantage of the high bit depth of the sensor, but a binary/boolean representation is appealing because it can operate at many brightness levels.

also, bundle adjustment is a secondary step: there is a multiview reconstruction that has to happen first.
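
concretely, that multiview step could be as simple as triangulating each decoded projector pixel from two phone poses to get the initial 3d estimate that bundle adjustment then refines (the poses and observations here are placeholders):

```python
import cv2
import numpy as np

def triangulate_pair(K, R1, t1, R2, t2, pts1, pts2):
    """pts1/pts2: Nx2 pixel observations of the same projector pixels in two
    views. returns Nx3 initial 3d estimates for bundle adjustment to refine."""
    P1 = K @ np.hstack([R1, t1.reshape(3, 1)])
    P2 = K @ np.hstack([R2, t2.reshape(3, 1)])
    X = cv2.triangulatePoints(P1, P2,
                              pts1.T.astype(float), pts2.T.astype(float))
    return (X[:3] / X[3]).T   # homogeneous -> euclidean
```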

kylemcdonald commented 6 years ago

some more notes/ideas... there's also this library that allows camera extrinsics to be estimated using markers: https://sourceforge.net/projects/markermapper/ it's based on aruco: http://www.uco.es/investiga/grupos/ava/node/26

it would be possible in theory to cover the room in these markers, and then we would be able to estimate extrinsics for all the camera perspectives, which would remove the camamok step.
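
a sketch of what that per-frame step could look like with opencv's aruco module (assuming opencv-contrib-python >= 4.7 for the ArucoDetector api, and a marker map like the one markermapper produces; all the names here are made up):

```python
import cv2
import numpy as np

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_250)
detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())

def camera_pose_from_markers(gray, K, dist, marker_corners_3d):
    """marker_corners_3d: dict of marker id -> 4x3 corner coordinates in room
    space (what markermapper would estimate). returns the phone camera's
    rvec/tvec, i.e. the pose that camamok otherwise provides by hand."""
    corners, ids, _ = detector.detectMarkers(gray)
    if ids is None:
        return None
    obj, img = [], []
    for c, marker_id in zip(corners, ids.ravel()):
        if marker_id in marker_corners_3d:
            obj.append(marker_corners_3d[marker_id])
            img.append(c.reshape(4, 2))
    if not obj:
        return None
    ok, rvec, tvec = cv2.solvePnP(np.vstack(obj).astype(np.float32),
                                  np.vstack(img).astype(np.float32), K, dist)
    return (rvec, tvec) if ok else None
```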

also, since we started thinking about this, markerless AR has matured and many devices can estimate some kind of position and orientation over time even in low light.

here's some SLAM code that's recent https://github.com/raulmur/ORB_SLAM2 https://www.youtube.com/watch?v=GDgRBcZsBNI

a big question: what kind of camera can handle the low-light requirements of light leaks? we know that a DSLR can take still photos with low noise, but when we push the exposure time to 1/30th second or less, how many cameras can still cope?

another big question: generally we take photos from across the room, but if this approach were to work we'd probably have to walk along the walls, potentially casting shadows. our shadows would probably be our biggest enemy.