utiasSTARS / pykitti

Python tools for working with KITTI data.
MIT License

Random access to the image data #23

samarth-robo closed this issue 6 years ago

samarth-robo commented 6 years ago

Since the data streams in the Odometry class are all generators, does that make random access slow? For example, if I want to get the 50th point cloud and do next(iter(itertools.islice(odometry.velo, 49, None))) like in the example, it will read the first 49 point clouds and then yield the 50th one.

leeclemnet commented 6 years ago

I believe itertools.islice is smart enough to not actually read the data from the skipped parts of the iterator. A quick timeit comparison shows that accessing an islice at image 50 is about 40x faster than iterating through the generator 50 times:

In [14]: timeit.timeit('for _ in data.cam0: pass', setup='import pykitti; data=pykitti.raw("KITTI/raw", "2011_09_30", "0034", frames=range(50))', number=1)
Out[14]: 2.0967910299950745

In [15]: timeit.timeit('v = next(itertools.islice(data.cam0, 50))', setup='import pykitti; data=pykitti.raw("KITTI/raw", "2011_09_30", "0034")', number=1)
Out[15]: 0.05846199599909596

Another option is to pass a range as the frames argument of pykitti.raw or pykitti.odometry (as in frames=range(50) above), so the generator only covers the subset of the data you want.
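To illustrate restricting the data up front, here is a minimal plain-Python sketch. It assumes a sorted list of per-frame file paths and a hypothetical loader function; `select_frames` and `yield_frames` are illustrative names that only mimic the effect of a `frames` argument, not pykitti's actual implementation.

```python
from typing import Callable, Iterable, Iterator, List, Sequence

def select_frames(files: Sequence[str], frames: Iterable[int]) -> List[str]:
    """Keep only the file paths for the requested frame indices (hypothetical helper)."""
    return [files[i] for i in frames]

def yield_frames(files: Sequence[str], load_frame: Callable[[str], object]) -> Iterator[object]:
    """Lazily load each selected file, one at a time."""
    for path in files:
        yield load_frame(path)

# Hypothetical file list standing in for a KITTI sequence.
all_files = [f"{i:010d}.png" for i in range(100)]
subset = select_frames(all_files, range(50, 53))
frames = yield_frames(subset, load_frame=lambda path: f"loaded {path}")
print(next(frames))  # the first item yielded corresponds to frame 50
```

Because the subset is chosen before the generator is built, the generator never has to step past frames 0 to 49.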

samarth-robo commented 6 years ago

You are right. I don't know why islice was slowing down for me before. Thanks!

samarth-robo commented 6 years ago

Sorry to be a little pedantic, but your timeit command above actually generates the first image, because when you give islice only one argument, 50 is treated as the stop index rather than the start. Simple example:

In[44]: g = (i for i in range(100))
In[45]: next(islice(g, 5))
Out[45]: 0
In[46]: next(islice(g, 5))
Out[46]: 1
In[47]: next(islice(g, 5))
Out[47]: 2
In[48]: next(islice(g, 5))
Out[48]: 3
In[49]: next(islice(g, 5))
Out[49]: 4
In[50]: next(islice(g, 5))
Out[50]: 5

To get the image at index 50 in a single call, you need next(islice(gen, 50, 51)):

In[54]: g = (i for i in range(100))
In[55]: next(islice(g, 5, 6))
Out[55]: 5
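For reference, `itertools.islice` follows the same argument convention as `range` and slicing: a single argument is the stop index, two arguments are start and stop, and `None` as the stop means "continue to the end". The snippet below needs no KITTI data and just shows the three forms side by side.

```python
from itertools import islice

# One argument: it is the *stop* index, so iteration starts at element 0.
assert list(islice(range(10), 5)) == [0, 1, 2, 3, 4]

# Two arguments: start and stop, so this yields only element 5.
assert list(islice(range(10), 5, 6)) == [5]

# start with stop=None: skip the first 5 elements, then continue to the end.
gen = (i for i in range(10))
print(next(islice(gen, 5, None)))  # prints 5
```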

The good news is that your timeit code gives similar results if I give it the correct start and stop indices, so it is 'skipping' the first 50 images. Ultimately, though, this approach does not work for me: even though it 'skips' to image 50, it seems to consume the earlier images, so they are no longer accessible:

In[56]: g = (i for i in range(100))
In[57]: next(islice(g, 50, 51))
Out[57]: 50
In[58]: sum(1 for _ in g)
Out[58]: 49

The last line is just a trick to count the elements remaining in the generator. So this approach does not work for me when I use this dataset in a deep learning framework, where I need random access to the data.
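The consumption seen above follows from how `islice` works: a generator has no way to jump ahead, so `islice` calls `next()` on it for every skipped element and discards the result. The sketch below counts how many items a generator actually produces; `counting_loader` is a hypothetical name, and the `log.append` call stands in for an expensive image or point-cloud load. If a generator does its loading at yield time, the skipped frames are still loaded.

```python
from itertools import islice
from typing import Iterator, List

def counting_loader(n: int, log: List[int]) -> Iterator[int]:
    """Yield 0..n-1, recording each index as it is 'loaded'."""
    for i in range(n):
        log.append(i)  # stands in for an expensive load of frame i
        yield i

log: List[int] = []
gen = counting_loader(100, log)
item = next(islice(gen, 50, 51))

print(item)                 # 50
print(len(log))             # 51: indices 0..50 were all produced
print(sum(1 for _ in gen))  # 49 items remain, matching the session above
```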

leeclemnet commented 6 years ago

Ah, I should have expected that you were trying to do something with deep learning. You're right that the generator doesn't rewind after you skip ahead to the 50th element, although a new generator is created every time you access data.rgb etc.
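A minimal sketch of the pattern described here, in which a property builds a new generator on every access, so reading the attribute again starts over from the first frame. `SequenceData` and its `frames` property are hypothetical names for illustration, not pykitti's code.

```python
from typing import Iterator, List

class SequenceData:
    """Toy dataset holding a list of file paths (hypothetical)."""

    def __init__(self, files: List[str]) -> None:
        self.files = files

    @property
    def frames(self) -> Iterator[str]:
        # A new generator object is created each time the property is read.
        return (f"loaded {path}" for path in self.files)

data = SequenceData(["a.png", "b.png", "c.png"])
first = data.frames
next(first)               # advances only this generator
print(next(data.frames))  # a new generator starts again: loaded a.png
```

This makes it cheap to restart a sequential pass, but each new generator still has to step through frames in order to reach a late index.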

For proper random access, the easiest thing is just to load the list of files and then access them by index (which is what PyTorch DataLoaders expect anyway). I implemented a pykitti-esque solution for a couple of other datasets here that uses PIL to load images and supports PyTorch. I haven't gotten around to updating pykitti to support this yet, but feel free to adapt my other code to your needs.
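A sketch of the list-of-files approach, assuming each frame is stored as its own file. PyTorch map-style datasets implement `__getitem__` and `__len__`, so a plain class with those two methods is enough to show the idea; `FrameDataset` and the `load_image` parameter are hypothetical names, and in practice you would subclass `torch.utils.data.Dataset` and load images with PIL.

```python
from typing import Callable, List

class FrameDataset:
    """Random access to frames by index, backed by a list of file paths."""

    def __init__(self, files: List[str], load_image: Callable[[str], object]) -> None:
        self.files = sorted(files)  # fixed order, so index i always means frame i
        self.load_image = load_image

    def __len__(self) -> int:
        return len(self.files)

    def __getitem__(self, idx: int) -> object:
        # Only the requested file is loaded; no other frames are touched.
        return self.load_image(self.files[idx])

# Stand-in loader for illustration; with real data this might be
# lambda path: Image.open(path).convert("RGB") after `from PIL import Image`.
dataset = FrameDataset([f"{i:010d}.png" for i in range(100)], load_image=str.upper)
print(dataset[50])  # 0000000050.PNG
print(dataset[3])   # earlier frames remain accessible: 0000000003.PNG
```

Unlike a generator, indexing does not consume anything, so a shuffling sampler can request frames in any order.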

samarth-robo commented 6 years ago

Yes I ended up using the list of files in my data loader. Thanks for this repository, and I'll check the other one out!

leeclemnet commented 6 years ago

FYI, I just pushed an update that adds support for proper random access to all the sensor data. The generators are still available for sequential access as well. https://github.com/utiasSTARS/pykitti/commit/19d29b665ac4787a10306bbbbf8831181b38eb38

I've also updated pykitti to version 0.3.0 on PyPI, so you can update with pip if you like.
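One way to offer both access patterns from the same object is to build the sequential generator on top of an indexed getter. The sketch below is a hypothetical design (`KittiLikeSequence`, `get_frame` and `frames` are illustrative names) and is not the code from the linked commit.

```python
from typing import Callable, Iterator, List

class KittiLikeSequence:
    """Hypothetical container offering both random and sequential access."""

    def __init__(self, files: List[str], load: Callable[[str], object]) -> None:
        self.files = files
        self.load = load

    def get_frame(self, idx: int) -> object:
        """Random access: load exactly one frame by index."""
        return self.load(self.files[idx])

    @property
    def frames(self) -> Iterator[object]:
        """Sequential access: a generator that delegates to get_frame."""
        return (self.get_frame(i) for i in range(len(self.files)))

seq = KittiLikeSequence([f"frame{i}" for i in range(5)], load=lambda p: p + " loaded")
print(seq.get_frame(3))  # frame3 loaded
print(next(seq.frames))  # frame0 loaded
```

Keeping the loading logic in one indexed method means the generator and the random-access path cannot drift apart.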