ouster-lidar / ouster_example

Ouster, Inc. sample code
Other
451 stars 434 forks source link

Iterating over a Scans object for a PCAP suddenly becomes extremely slow, and processing subsequent PCAPs remains slow #496

Closed airalcorn2 closed 1 year ago

airalcorn2 commented 1 year ago

I have a set of PCAP files that I'm trying to process. While iterating over the Scans objects for the first few files is very fast, shortly after that, iterating over a Scans object suddenly becomes extremely slow. It often happens in the middle of iterating over the Scans object for a file, and then processing all subsequent PCAPs is similarly slow. All of the PCAPs are of a similar size. Further, if I process a PCAP that was slow in the script separately on its own, it's fast. The timings here were produced by the script below.

import os
import sys
import time

from ouster import client, pcap

EVAL_EVERY = 50

def main():
    scans_dir = sys.argv[1]
    scan_names = [f.split(".")[0] for f in os.listdir(scans_dir) if f.endswith(".pcap")]
    for scan_name in scan_names:
        print(f"\n{scan_name}\n")
        with open(f"{scans_dir}/{scan_name}.json") as f:
            metadata = client.SensorInfo(f.read())

        source = pcap.Pcap(f"{scans_dir}/{scan_name}.pcap", metadata)
        # To avoid ClientTimeout error.
        scans = client.Scans(source, timeout=None)
        start_time = time.time()
        for idx, scan in enumerate(scans):
            if (idx + 1) % EVAL_EVERY == 0:
                print(f"{idx}: {time.time() - start_time:.2f} s", flush=True)
                start_time = time.time()

if __name__ == "__main__":
    main()
airalcorn2 commented 1 year ago

The issue seems to be that I was reading the data from an external hard drive. The first time a file is read from the external hard drive is extremely slow. However, the next time time the file is read (as long as the external hard drive hasn't been unplugged), the file is processed quickly as expected, so the file seems to be cached somehow. I didn't realize this was the case because I had been playing with the data for a while without ever unplugging the hard drive. I'm going to leave this open for the time being and verify that moving all the data locally results in all of the files being processed quickly.

twslankard commented 1 year ago

@airalcorn2 Thank you for the report. I attempted to reproduce this issue using two large (~5gb) pcap files, enumerating all scans from each in an infinite loop. However, I observe a more or less constant timing between runs.

OS-1-128_v3.0.1_2048x10_20230216_143245

49: 0.56 s
99: 0.56 s
...
1049: 0.56 s
1099: 0.56 s

OS-0-128_v3.0.1_2048x10_20230216_173241

49: 0.57 s
99: 0.56 s
...
1249: 0.56 s
1299: 0.56 s

(Similar output occurs over successive runs for these two files.)

Can you tell me a little more about your setup? In particular, are you using an HDD or an SSD for storage? In either case, what is the external drive's interface (e.g. USB 2.0, 3.0, or other?)

airalcorn2 commented 1 year ago

I confirmed that first moving the files locally does not result in any slowdowns when processing the files. However, the slowdown is 100% repeatable when reading the data from the external hard drive, which is an SSD connected with USB 2.0. If I unplug the hard drive, plug it back in, and run the script: I get this output:

OS-1-128-122213000577-2048x10_WAT0-D-N-FR

49: 4.67 s
99: 3.02 s
149: 4.48 s
199: 3.03 s
249: 3.06 s
299: 4.53 s

I hit CTRL-C after 299. When I immediately re-run the script, this is what I get:

OS-1-128-122213000577-2048x10_WAT0-D-N-FR

49: 0.27 s
99: 0.26 s
149: 0.25 s
199: 0.25 s
249: 0.25 s
299: 0.25 s
349: 1.16 s
399: 4.48 s
449: 3.04 s

You can see that all of the frames before I hit CTRL-C are iterated over very quickly, but then it immediately slows down, so the data must be getting cached between runs.