ros / ros_comm

ROS communications-related packages, including core client libraries (roscpp, rospy, roslisp) and graph introspection tools (rostopic, rosnode, rosservice, rosparam).
http://wiki.ros.org/ros_comm

rosbag: Better support for large bag files in rosbag #117

Open ablasdel opened 11 years ago

ablasdel commented 11 years ago

I would like to suggest improving the support for large bag files. The cause of the slow start-up time is that when Bag::open is called it scans for all the chunks in the bag file. The default chunk size is 768 KB, so a 30 GB file (30 min of stereo video) contains around 40K chunks. I believe what is slowing things down is that each chunk needs a disk seek (~10 ms), so it takes several minutes before playback of the file can start.

I don't know the details of the Bag library, but here are some suggestions that might serve as a solution to the problem.

  1. Allow for a configurable chunk size. E.g. if the chunk size is 100 MB in the example above, then only 300 seeks are needed and only ~3 s is spent on seeks instead of ~6 min.

  2. Run a re-index that makes a copy of all chunk indexes and puts them at the end of the bag file so that they can be read in one pass.

  3. Modify rosbag play so that it starts playing back the file before finishing reading in all indexes. That is, read the index and data in parallel, or read everything sequentially and process each chunk header as soon as it is read from the file.
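The arithmetic behind suggestion 1 can be sketched directly (assumed values: 768 KB default chunk size and ~10 ms per disk seek, as stated above):

```python
# Back-of-the-envelope estimate of index-scan time for a bag file,
# assuming one disk seek (~10 ms) per chunk header.

def scan_time_s(bag_size_bytes, chunk_size_bytes, seek_s=0.010):
    """Return (chunk count, rough seconds spent on seeks)."""
    chunks = bag_size_bytes // chunk_size_bytes
    return chunks, chunks * seek_s

bag = 30 * 1024**3                            # 30 GB bag
chunks, t = scan_time_s(bag, 768 * 1024)      # default 768 KB chunks
print(chunks, round(t, 1))                    # 40960 409.6  (~7 min of seeks)

chunks, t = scan_time_s(bag, 100 * 1024**2)   # 100 MB chunks
print(chunks, round(t, 1))                    # 307 3.1
```

This matches the ~40K chunks and minutes-long start-up reported in the ticket.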

See a related question on answers.ros.org: http://answers.ros.org/question/1813/playing-back-large-files-with-rosbag

migration of trac ticket 3829: https://code.ros.org/trac/ros/ticket/3829

chunk_size.patch:

Index: include/rosbag/bag.h
===================================================================
--- include/rosbag/bag.h    (revision 16227)
+++ include/rosbag/bag.h    (working copy)
@@ -91,6 +91,8 @@
      */
     Bag(std::string const& filename, uint32_t mode = bagmode::Read);

+    Bag(std::string const& filename, uint32_t mode, uint32_t chunk_size);
+
     ~Bag();

     //! Open a bag file.
Index: include/rosbag/recorder.h
===================================================================
--- include/rosbag/recorder.h   (revision 16227)
+++ include/rosbag/recorder.h   (working copy)
@@ -102,6 +102,7 @@
     uint32_t        max_size;
     ros::Duration   max_duration;
     std::string     node;
+    uint32_t        chunk_size;

     std::vector<std::string> topics;
 };
Index: src/recorder.cpp
===================================================================
--- src/recorder.cpp    (revision 16227)
+++ src/recorder.cpp    (working copy)
@@ -102,7 +102,8 @@
     split(false),
     max_size(0),
     max_duration(-1.0),
-    node("")
+    node(""),
+    chunk_size(1048576 * 1)
 {
 }

@@ -350,6 +351,7 @@

     updateFilenames();
     try {
+        bag_.setChunkThreshold(options_.chunk_size);
         bag_.open(write_filename_, bagmode::Write);
     }
     catch (rosbag::BagException e) {
@@ -487,6 +489,7 @@
         string write_filename  = target_filename + string(".active");

         try {
+            bag_.setChunkThreshold(options_.chunk_size);
             bag_.open(write_filename, bagmode::Write);
         }
         catch (rosbag::BagException ex) {
Index: src/record.cpp
===================================================================
--- src/record.cpp  (revision 16227)
+++ src/record.cpp  (working copy)
@@ -50,6 +50,7 @@
     desc.add_options()
       ("help,h", "produce help message")
       ("all,a", "record all topics")
+      ("chunk-size,c", po::value<int>()->default_value(1), "Chunk size in MB (Default: 1)")
       ("regex,e", "match topics using regular expressions")
       ("exclude,x", po::value<std::string>(), "exclude topics matching regular expressions")
       ("quiet,q", "suppress console output")
@@ -120,6 +121,13 @@
         opts.max_size = 1048576 * S;
       }
     }
+    if (vm.count("chunk-size"))
+    {
+      int chunk_size = vm["chunk-size"].as<int>();
+      if (chunk_size <= 0) 
+        throw ros::Exception("Chunk size must be positive");
+      opts.chunk_size = 1048576 * chunk_size;
+    }
     if (vm.count("buffsize"))
     {
       int m = vm["buffsize"].as<int>();
Index: src/bag.cpp
===================================================================
--- src/bag.cpp (revision 16227)
+++ src/bag.cpp (working copy)
@@ -82,6 +82,22 @@
     open(filename, mode);
 }

+Bag::Bag(string const& filename, uint32_t mode, uint32_t chunk_size) :
+    compression_(compression::Uncompressed),
+    chunk_threshold_(chunk_size),
+    bag_revision_(0),
+    file_size_(0),
+    file_header_pos_(0),
+    index_data_pos_(0),
+    connection_count_(0),
+    chunk_count_(0),
+    chunk_open_(false),
+    curr_chunk_data_pos_(0),
+    decompressed_chunk_(0)
+{
+    open(filename, mode);
+}
+
 Bag::~Bag() {
     close();
 }

change_chunk_size.py:

#!/usr/bin/env python
# Rewrite input.bag as output.bag with a 100 MB chunk threshold.

PKG = 'sensor_msgs'
import roslib; roslib.load_manifest(PKG)
import rosbag

with rosbag.Bag('output.bag', 'w', chunk_threshold=100 * 1024 * 1024) as outbag:
    for topic, msg, t in rosbag.Bag('input.bag').read_messages():
        # Preserve the original header timestamp when the message has one.
        outbag.write(topic, msg, msg.header.stamp if msg._has_header else t)

change history: Changed 11 months ago by hordurj

I included a patch to rosbag record that adds a chunk-size option. The chunk size is specified in MB; I was trying to follow the convention of the other parameters. I also included a simple script that can be used to convert the chunk size of a bag file.

I did a quick test on a ~30 GB file that had ~40K chunks. It took around 7 min to load on my machine. When I changed the chunk size to 100 MB it took 9 sec.

Is there a disadvantage to having a large chunk size?

Changed 11 months ago by hordurj

One thing that came up when using a very large chunk size is that it can cause delays in the playback when a new chunk is loaded in. I fixed that by splitting the file reading and the playback into separate threads. I also added a simple cache manager, because I noticed that sometimes a sequence of messages could come from different chunks, causing the chunks to be reloaded multiple times. I'm currently just maintaining my own copy of rosbag. I would be happy to share these changes if there is any interest in incorporating them. But this might be just a niche use-case and the added complexity not worth it.
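The chunk cache idea described here can be sketched as a small LRU keyed by chunk offset (a sketch only; the names and structure are illustrative, not taken from the unpublished patch):

```python
from collections import OrderedDict

class ChunkCache:
    """Tiny LRU cache for loaded chunks, keyed by file offset.

    Avoids re-reading a chunk from disk when consecutive messages
    alternate between a small set of chunks.
    """
    def __init__(self, max_chunks=4):
        self.max_chunks = max_chunks
        self._cache = OrderedDict()           # offset -> chunk data

    def get(self, offset, load_fn):
        if offset in self._cache:
            self._cache.move_to_end(offset)   # cache hit: mark recently used
            return self._cache[offset]
        data = load_fn(offset)                # cache miss: read from disk
        self._cache[offset] = data
        if len(self._cache) > self.max_chunks:
            self._cache.popitem(last=False)   # evict least recently used
        return data
```

With even a two-chunk cache, a message sequence that ping-pongs between two adjacent chunks triggers only one disk read per chunk instead of one per message.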

Changed 11 months ago by kwc

@hordurj: thanks for the patch. We can't integrate this right now as things are frozen for Fuerte. Apologies we didn't get to it sooner, but we were already 4 weeks behind on schedule and had to focus on high priority items.

stwirth commented 11 years ago

:+1: As I run many experiments with long bag files including stereo images at 10 Hz, I am very interested in this enhancement. Waiting minutes for rosbag play to start is really annoying.

jbohren commented 11 years ago

Bump. Can we integrate this?

tfoote commented 11 years ago

Turning this into a pull request would make it much easier to merge and release.

jbohren commented 11 years ago

@tfoote roger.

jbohren commented 11 years ago

It looks like in ROS Groovy, rqt_bag is pretty capable of dealing with large bag files. Even a 75 GB file opened in a little under a minute, while the command-line tool seemed to hang for a long time when opening.

dirk-thomas commented 10 years ago

The --chunk-size argument was implemented quite some time ago. Are there still use cases where the performance is that bad?

protobits commented 9 years ago

Hi, I've been working with large bag files and I found the same issue. I experimented with changing the chunk size and they open much faster. However, I've also encountered the problems mentioned by @ablasdel, so I'm interested in the other changes suggested (such as threading, etc.). I would be interested in trying that out. Is it available as a fork?

dirk-thomas commented 9 years ago

The diff from the initial comment only implements the chunk size argument which is already available in the released binary packages.

For the other proposed features there are no patches available yet and therefore also no forks.

protobits commented 9 years ago

@hordurj is your code still around? I wonder how anyone is able to work with large bag files (such as those containing images), since the load time is unbearable. Rewriting the chunk size seems to improve things; however, it hinders real-time playback.

dirk-thomas commented 9 years ago

The patch you refer to added the chunk size option to rosbag and had been integrated into rosbag long ago.

protobits commented 9 years ago

I'm not referring to the patch itself, but to the code @hordurj mentioned in his comment, which I imagine he didn't share. The changes I'm referring to are the threaded implementation he mentioned.

dirk-thomas commented 9 years ago

Yes, the modifications for a threaded implementation have never been shared.

The first comment of this ticket is an import of the history of the ticket from code.ros.org, so the username may not match the original person who reported it. I don't think there is a chance to get hold of that modification. Sad pre-GitHub times...

hordurj commented 5 years ago

I just came across this old topic and found the threaded code mentioned above on my old laptop. Is this something anyone is still interested in? Or does everyone have an SSD now :)

miralys1 commented 4 years ago

> I just came across this old topic and found the threaded code mentioned above on my old laptop. Is this something anyone is still interested in? Or does everyone have an SSD now :)

Yes, I would be interested, since I am currently running into a CPU problem where either messages are skipped or the bag slows down to a fraction of real-time.

hordurj commented 4 years ago

rosbag-chunk-thread.tar.gz: here is a tarball from my rosbag folder. Please feel free to use it as you see fit. I haven't touched this in a long time, but if I remember correctly the approach was to start one thread to read the indexes and another to play back. That way the playback could start right away while the file was still being scanned for indexes.

hordurj commented 4 years ago

I looked at the diff to the latest version on GitHub. My patch is from a much older branch, so there are things you would not want to pull over. The main change is that in publish, doPublish is called for each message. If playing back in real time, doPublish will wait until it is time to publish the message before reading the next one. In the version I sent, I had added methods addPublish and runPublish, where addPublish just adds the message to a publish queue and continues reading the file. Then there is another thread (runPublish) that pulls off messages and calls doPublish.

So it's probably best for you to just pull those pieces into the latest branch. It should be fairly straightforward.
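The addPublish/runPublish split described above can be approximated with a bounded queue between a reader thread and a publisher thread. This is a sketch of the pattern, not the actual patch; `do_publish` stands in for a callable that waits until the message's timestamp and publishes it:

```python
import threading
import queue

def play(read_messages, do_publish, max_queued=256):
    """Read bag messages in one thread while publishing in another.

    read_messages: iterable of (topic, msg, t) tuples (e.g. from a Bag).
    do_publish:    callable(topic, msg, t) that waits for t, then publishes.
    """
    q = queue.Queue(maxsize=max_queued)       # bounds memory use
    DONE = object()                           # sentinel marking end of file

    def reader():
        for item in read_messages:
            q.put(item)                       # blocks while the queue is full
        q.put(DONE)

    t = threading.Thread(target=reader, daemon=True)
    t.start()
    while True:                               # publisher loop (current thread)
        item = q.get()
        if item is DONE:
            break
        do_publish(*item)
    t.join()
```

Because the reader runs ahead of real-time playback, the queue stays full and chunk-load latency is hidden from the publisher, which is the effect described in the comments above.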

radhaanu26 commented 4 years ago

Hi @hordurj, I am currently using rosbag.Bag to open a file and read the names of the topics. I have to handle 30 GB files, for which my system freezes. I have developed this tool in Python and would like to know how I should fix this, as I understand you have been active on these queries.

AnandShastry commented 4 years ago

Any update on this?

szx0112 commented 4 years ago

Google brings me here, do we have any update now?

mbed92 commented 3 years ago

I have 150GB bag file and starting it is torture. Any updates or suggestions on how could I manage such large files?