samarth-robo closed this issue 6 years ago
I believe itertools.islice is smart enough to not actually read the data from the skipped parts of the iterator. A quick timeit comparison shows that accessing an islice at image 50 is about 40x faster than iterating through the generator 50 times:
In [14]: timeit.timeit('for _ in data.cam0: pass', setup='import pykitti; data = pykitti.raw("KITTI/raw", "2011_09_30", "0034", frames=range(50))', number=1)
Out[14]: 2.0967910299950745
In [15]: timeit.timeit('v = next(itertools.islice(data.cam0, 50))', setup='import pykitti; data = pykitti.raw("KITTI/raw", "2011_09_30", "0034")', number=1)
Out[15]: 0.05846199599909596
Another option is to pass a range argument to pykitti.raw or pykitti.odometry (as with frames=range(50) in the first timing above) so the generator only points to the subset of the data you want.
You are right. I don't know why islice was slowing down for me before. Thanks!
Sorry to be a little pedantic, but your timeit command above will actually return the first image, because a single argument of 50 is treated as the stop index, not the start index. Simple example:
In[44]: g = (i for i in range(100))
In[45]: next(islice(g, 5))
Out[45]: 0
In[46]: next(islice(g, 5))
Out[46]: 1
In[47]: next(islice(g, 5))
Out[47]: 2
In[48]: next(islice(g, 5))
Out[48]: 3
In[49]: next(islice(g, 5))
Out[49]: 4
In[50]: next(islice(g, 5))
Out[50]: 5
To get the 50th image in the first shot, you need next(islice(gen, 50, 51)):
In[54]: g = (i for i in range(100))
In[55]: next(islice(g, 5, 6))
Out[55]: 5
The good news is that your timeit code gives similar results if I give it the correct start and stop indices. So it is 'skipping' the first 50 images. But ultimately, this approach does not work for me because even though it 'skips' to the 50th image, it seems to somehow consume the earlier images, so that they are not accessible anymore:
In[56]: g = (i for i in range(100))
In[57]: next(islice(g, 50, 51))
Out[57]: 50
In[58]: sum(1 for _ in g)
Out[58]: 49
The last line is just a trick to calculate the remaining length of the generator. So this does not work for me if I am using this dataset in a deep learning framework where I need random access to the data in the dataset.
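A minimal sketch of why this happens, using a stand-in generator rather than pykitti itself (the load_frames function and the loaded list are hypothetical names for illustration): islice cannot jump ahead in a generator, so it calls next() on every skipped element and throws the result away. How much time that costs depends on how expensive each load is.

```python
from itertools import islice

loaded = []  # records every index the generator actually produced

def load_frames(n):
    """Hypothetical stand-in for a sequential sensor stream."""
    for i in range(n):
        loaded.append(i)  # roughly where a real loader would read a file
        yield i

g = load_frames(100)
frame = next(islice(g, 50, 51))  # skip indices 0..49, take index 50
n_loaded = len(loaded)           # how many frames were produced to get there
following = next(g)              # the generator carries on after index 50

print(frame, n_loaded, following)
```

Here frame is 50, but n_loaded is 51: the skipped frames were still generated, and the next value from g is 51, not 0.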
Ah, I should have expected that you were trying to do something with deep learning. You're right that the generator doesn't rewind after you skip ahead to the 50th element, although a new generator is created every time you access data.rgb etc.
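As a rough illustration of that behaviour, here is a sketch with a property that builds a new generator on each access. The Sequence class and its frames property are hypothetical stand-ins, not pykitti's actual implementation.

```python
from itertools import islice

class Sequence:
    """Hypothetical stand-in for a dataset object; not pykitti's real class."""

    def __init__(self, n):
        self._n = n

    @property
    def frames(self):
        # A new generator object is built on every attribute access,
        # so each access starts again from index 0.
        return (i for i in range(self._n))

seq = Sequence(100)

first = next(islice(seq.frames, 50, 51))  # fresh generator, skips to index 50
second = next(islice(seq.frames, 5, 6))   # another fresh generator, index 5

g = seq.frames                            # holding on to one generator...
a = next(islice(g, 50, 51))
remaining = sum(1 for _ in g)             # ...shows that it does not rewind

print(first, second, a, remaining)
```

Earlier frames stay reachable by accessing the attribute again, but each access iterates from the start, so this is still sequential access rather than random access.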
For proper random access, the easiest thing is just to load the list of files and then access them by index (which is what pytorch DataLoaders expect anyway). I implemented a pykitti-esque solution for a couple other datasets here that uses PIL to load images and supports pytorch. I haven't gotten around to updating pykitti to support this yet, but feel free to adapt my other code to your needs.
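A minimal sketch of the file-list approach, assuming one file per frame in a single directory. FileListDataset and read_text are hypothetical names, and this is not the code from the repository mentioned above. It implements __len__ and __getitem__, which is the interface PyTorch map-style datasets use; the text loader is only a placeholder for something like PIL's Image.open.

```python
import os
import tempfile

class FileListDataset:
    """Index-based access over a sorted list of files (hypothetical sketch)."""

    def __init__(self, directory, loader):
        # Only the file names are collected up front; nothing is read yet.
        self.paths = sorted(
            os.path.join(directory, name) for name in os.listdir(directory)
        )
        self.loader = loader

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        # Reads exactly one file, whatever the index is.
        return self.loader(self.paths[index])

def read_text(path):
    # Placeholder loader; an image dataset would decode the image here.
    with open(path) as f:
        return f.read()

# Demo with throwaway files standing in for image frames.
tmpdir = tempfile.mkdtemp()
for i in range(5):
    with open(os.path.join(tmpdir, f"{i:06d}.txt"), "w") as f:
        f.write(f"frame {i}")

dataset = FileListDataset(tmpdir, read_text)
sample = dataset[3]   # direct access, no earlier frames are read
earlier = dataset[1]  # going backwards works too
print(len(dataset), sample, earlier)
```

Because each lookup touches only one path, access order does not matter, which is what shuffled training batches need.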
Yes I ended up using the list of files in my data loader. Thanks for this repository, and I'll check the other one out!
FYI, I just pushed an update that adds support for proper random access to all the sensor data. The generators are still available for sequential access as well. https://github.com/utiasSTARS/pykitti/commit/19d29b665ac4787a10306bbbbf8831181b38eb38
I've also updated pykitti to version 0.3.0 on pypi, so you can update with pip if you like.
Since the data streams in the Odometry class are all generators, is it true that that makes random access slow? For example, if I want to get the 50th point cloud, and do next(iter(itertools.islice(odometry.velo, 49, None))) like in the example, it will read the first 49 point clouds and then yield the 50th point cloud.