ros2 / rosbag2

Apache License 2.0
285 stars 251 forks source link

Mark bags that are still being recorded #1597

Open tonynajjar opened 7 months ago

tonynajjar commented 7 months ago

Description

In ROS1, a bag that was still being recorded was suffixed with .active, it would be nice if in rosbag2 there was something similar. An example use case is a scripts that uploads bags to a server automatically - that script needs to know not to upload a bag if it's still recording.

defunctzombie commented 7 months ago

@MichaelOrlov @emersonknapp any recollection on why this was omitting from rosbag2 or if there's some gotcha with adding support for this?

MichaelOrlov commented 7 months ago

@defunctzombie @tonynajjar I don't know about any prior discussions about having a different suffix for currently writing file.
I guess it might be some corner cases with consistency in filenames in case if recording is abnormally interrupted and we have put current filename to the metadata and haven't renamed it. We are saving metadata at the beginning as well. Also not sure if file renaming could be easily done from C++ std::filesystem.

Also, I think that one of the reasons we didn't follow ROS 1 behavior is because we handling the matter of concern in the question

that script needs to know not to upload a bag if it's still recording

a better way than in ROS 1.

The one can subscribe to the topic "events/write_split" to get a notification when the current file is finished and ready to be processed by third-party script or program. Rosbag2 will publish a special message on this topic to notify remote nodes that the newly written file is ready and a new one has been created. The old and new filenames are part of this message in those notification. Yes, we have such a feature in the rosbag2, and sorry that we don't have documentation for it. Please refer to the following part of the code to better understand how to use it. https://github.com/ros2/rosbag2/blob/30328006f872de4e1c82fef4ecca8ca815b2d4dc/rosbag2_transport/src/rosbag2_transport/recorder.cpp#L321-L337 and https://github.com/ros2/rosbag2/blob/30328006f872de4e1c82fef4ecca8ca815b2d4dc/rosbag2_transport/src/rosbag2_transport/recorder.cpp#L356-L376 The key info there is

       auto message = rosbag2_interfaces::msg::WriteSplitEvent(); 
       message.closed_file = bag_split_info_.closed_file; 
       message.opened_file = bag_split_info_.opened_file; 
tonynajjar commented 7 months ago

The script in question is the foxglove-agent which is closed-source so unfortunately I don't have the option to adapt it. Either the Foxglove guys would need to do so (@defunctzombie ), or rosbag2 has to introduce a differentiator in the file name as this is the only thing foxglove-agent can ignore.

defunctzombie commented 7 months ago

The script in question is the foxglove-agent which is closed-source so unfortunately I don't have the option to adapt it. Either the Foxglove guys would need to do so (@defunctzombie ), or rosbag2 has to introduce a differentiator in the file name as this is the only thing foxglove-agent can ignore.

One option for ROS users would be a node which listens for these events and moves the "done" recordings to a different folder that is picked up by systems looking for files to upload.

Any system that could previously "watch" directories for changes now needs to be more "ROS-native" and be able to hook into the pubsub mechanism to listen for such events. Also raises some questions about what happens if such a script is started after recording nodes or re-started. Filesystem level indicators (like file extensions done in ROS1) are in some ways a more robust approach.

MichaelOrlov commented 7 months ago

I need to clarify. In general, I don't mind having a feature when a file that is currently in recording to be named with .active suffix. However, it shall be done correctly and aligned at least with the current implementation of the bag split notification and metadata writing. I personally don't have a capacity for this feature, but contributions are welcome.

Timple commented 7 months ago

Perhaps a simple active file inside the rosbag folder? That get's cleaned up when writing is finished?

MichaelOrlov commented 7 months ago

@Timple

Perhaps a simple active file inside the rosbag folder? That get's cleaned up when writing is finished?

That is not clear to me. We certainly shall not try to copy the finished .active file with a different name and then try to delete the original. We need to use rename in place.

Timple commented 7 months ago

Rosbags in ROS 2 are actually folders with a meta file and a database right?

We could place an empty file called active alongside those two. This file would be cleaned up when finishing the bagfile.

Might be easier on bash/python like scripts which don't have ROS capability.

MichaelOrlov commented 7 months ago

@Timple Thanks for the clarification. I think this is a good idea and perhaps we can do it with relatively low effort without affecting other functionality. As a quick design proposal, we can create .active file right after updating metadata at the end of the writer->open() method https://github.com/ros2/rosbag2/blob/34ccb39463f43a82b6d636ea75661cc6edd6c5f1/rosbag2_cpp/src/rosbag2_cpp/writers/sequential_writer.cpp#L162-L164 To get the current filename we can use storage_->get_relative_file_path() Then will need to try to delete the .active file in close() method https://github.com/ros2/rosbag2/blob/34ccb39463f43a82b6d636ea75661cc6edd6c5f1/rosbag2_cpp/src/rosbag2_cpp/writers/sequential_writer.cpp#L182-L188 and do delete and create a new .active file in the split_bagfile() method https://github.com/ros2/rosbag2/blob/34ccb39463f43a82b6d636ea75661cc6edd6c5f1/rosbag2_cpp/src/rosbag2_cpp/writers/sequential_writer.cpp#L331-L337

defunctzombie commented 7 months ago

IMO it is better to use .active for the file being recorded and do a rename on the file rather than introduce additional files and issues with synchronization. The .active extension was a fantastic property in the ros1 recorder; you immediately knew by looking at the file whether it was closed cleanly or not.

tonynajjar commented 7 months ago

One option for ROS users would be a node which listens for these events and moves the "done" recordings to a different folder that is picked up by systems looking for files to upload.

@defunctzombie I tried that and while that generally can work, it fails (or requires more implementation effort) in corner cases like when shutting down the system: although rosbag2 will conclude and close the bag being recorded, the script won't receive the WriteSplit event and therefore not move the last bag to the "done" folder.

I'm not saying this can't be solved/workedaround but it has enough complexity to say that that's not something any ROS user can simply do -> better solve this at rosbag2 or foxglove-agent's level

tonynajjar commented 7 months ago

Perhaps a simple active file inside the rosbag folder? That get's cleaned up when writing is finished?

@Timple @MichaelOrlov The problem I see with this is that sometimes the bag folder contains more than one bag file, e.g. when splitting recording by duration. An active file in the folder would generalize that all the rosbag files are actively being recorded when in reality it's only the latest one. So in this regard I also think that the suffix would be better

Timple commented 7 months ago

I think you're missing the structure. Rosbags are folders. They are also within folders. I'm proposing this, where the last file is still being written:

my-folder-with-rosbags/
  - rosbag_2024-01-01/
    - metadatafile.yaml
    - datafile.mcap
  - rosbag_2024-01-02/
    - metadatafile.yaml
    - datafile.mcap
  - rosbag_2024-01-01/
    - metadatafile.yaml
    - datafile.mcap
    - active
tonynajjar commented 7 months ago

Thanks for the example, I'll give you a counter example:

tony@tony-xmg-22:~$ docker run -it --rm ros:humble-perception
root@f56ce5889eb2:~# ros2 bag record -a -s mcap --max-bag-duration 1
[INFO] [1712828512.674402166] [rosbag2_recorder]: Press SPACE for pausing/resuming
[INFO] [1712828512.675851511] [rosbag2_recorder]: Listening for topics...
[INFO] [1712828512.675879755] [rosbag2_recorder]: Event publisher thread: Starting
[INFO] [1712828512.676785630] [rosbag2_recorder]: Subscribed to topic '/test'
[INFO] [1712828512.677450439] [rosbag2_recorder]: Subscribed to topic '/rosout'
[INFO] [1712828512.677995401] [rosbag2_recorder]: Subscribed to topic '/parameter_events'
[INFO] [1712828512.678360833] [rosbag2_recorder]: Subscribed to topic '/events/write_split'
[INFO] [1712828512.678432689] [rosbag2_recorder]: Recording...
[INFO] [1712828514.341021388] [rosbag2_cpp]: Writing remaining messages from cache to the bag. It may take a while
[INFO] [1712828516.341312147] [rosbag2_cpp]: Writing remaining messages from cache to the bag. It may take a while
[INFO] [1712828518.340502302] [rosbag2_cpp]: Writing remaining messages from cache to the bag. It may take a while
[INFO] [1712828519.123073384] [rosbag2_cpp]: Writing remaining messages from cache to the bag. It may take a while
[INFO] [1712828519.132002960] [rosbag2_recorder]: Event publisher thread: Exiting
[INFO] [1712828519.132170728] [rosbag2_recorder]: Recording stopped
root@f56ce5889eb2:~# ls rosbag2_2024_04_11-09_41_52/
metadata.yaml  rosbag2_2024_04_11-09_41_52_0.mcap  rosbag2_2024_04_11-09_41_52_1.mcap  rosbag2_2024_04_11-09_41_52_2.mcap  rosbag2_2024_04_11-09_41_52_3.mcap

In this case, the directory would look like that:

root@f56ce5889eb2:~# ls rosbag2_2024_04_11-09_41_52/
active metadata.yaml  rosbag2_2024_04_11-09_41_52_0.mcap  rosbag2_2024_04_11-09_41_52_1.mcap  rosbag2_2024_04_11-09_41_52_2.mcap  rosbag2_2024_04_11-09_41_52_3.mcap

At that time, only rosbag2_2024_04_11-09_41_52_3.mcap is still being recorded

Timple commented 7 months ago

Ahaaa, I did not know that split bagfiles actually ended up in the same directory. (Sorry, should/could have verified).

But this is kind of strange right? The 'total' bagfile is still open. I know we can treat mcap files isolated from their folders, but that's a whole different issue perhaps?

But I don't actually mind the direction we end up with :slightly_smiling_face: . So alternatives are fine for me!

defunctzombie commented 7 months ago

Rosbags are folders

FWIW I take a different view here. I do not consider the folder a "bag" - to me it is simply that - a file system folder that holds some files. I consider the individual file (the mcap) the "bag" file. I've always found it strange that this "folder" layer was introduced; for me it adds a layer of indirection and nomenclature I find confusing and unnecessary.

facontidavide commented 6 months ago

Adding .active sonds as the best idea idea. As an alternative, what about file locks? They are easy to implement using boost:

https://www.boost.org/doc/libs/1_85_0/doc/html/interprocess/synchronization_mechanisms.html#interprocess.synchronization_mechanisms.file_lock

My idea is to mark as "locked" all the files in the directory, until recording is done.

clalancette commented 6 months ago

As an alternative, what about file locks? They are easy to implement using boost:

File locks are not a bad idea, but no boost, please. It is a very heavyweight dependency.

facontidavide commented 6 months ago

funny, I would have sweared that boost is a common depoendency in the core of ROS. No problem :smile:

defunctzombie commented 6 months ago

My idea is to mark as "locked" all the files in the directory, until recording is done.

If this was only for the application itself to manage this could be ok but if the intent is to have outside applications or folks listing the directory easily understand the situation I think .active is nicer. It also has the property that if the recording process dies you visually see that this specific file was not closed.

fujitatomoya commented 6 months ago

to have outside applications or folks listing the directory easily understand the situation I think .active is nicer.

IMO, i second this with renaming the file without creating extra file nodes. sending the data / logging files to the server or cloud once it is ready is not application specific behavior, this could be also nicer for rosbag2 to go with. we can lock the file either segment or file descriptor, but sounds overkill unless we actually have racy condition to read/write, i guess.

btw, i have a question about metadata.yaml. metadata.yaml is also tagged with .active during recording process? i am not sure if metadata.yaml needs to be updated when the recording is completed.

MichaelOrlov commented 5 months ago

@fujitatomoya @facontidavide @defunctzombie The proposed solution with renaming the currently writing file doesn't fit very well with the current design of the rosbag2, specifically when we are using per-file compression.

Think about a situation when we do split with file compression.

  1. If we will rename file to the bag_file_name.active and then back to bag_file_name.mcap on the file closer, the external script may think that we are done with it and start uploading it to the cloud. However, we are not done with it yet, and the compressor works in the background compressing those files. As soon as the compressor is done, he will try to delete bag_file_name.mcap and update metadata.
  2. If we will rename file to the bag_file_name.active and then compress it and try to delete it after bag_file_name.zip is finished. Then we will have the bag_file_name.active inside the archived bag_file_name.zip file. It will be with *.active extension instead of the *.mcap when we will decompress it on playback and playback will fail.

Please note that currently, we have a buggy behavior with file compression when we are sending notifications about the bag_split event via messages. i.e. we are sending notifications when bag file is closed after the split not when it has been compressed. I have tried to fix it in the https://github.com/ros2/rosbag2/pull/1643 However, don't have time to write a unit tests. IMO the new work related to this issue shall be done above https://github.com/ros2/rosbag2/pull/1643 anyway.

fujitatomoya commented 5 months ago

IMO the new work related to this issue shall be done above https://github.com/ros2/rosbag2/pull/1643 anyway.

👍 agree, and thanks for sharing insights!

all that said, i think the completion event (renaming system calls on this event) should be including compression operation if compression is required to process. this is just because file is not ready to use yet.