SYSTEM_TIME missing from .tlog files

m-h-williams commented 3 days ago

@clydemcqueen I'm hoping you might have some insight on this issue.

It appears that the SYSTEM_TIME message, which I’ve been using to collect date and time for surveys, is missing from the .tlog files for two survey days (09/10/2024 and 09/11/2024). Without these timestamps, I wasn’t able to convert the .tlog files to CSV using the (tlog_transects_to_csv.py).

To work around this, I attempted to gather timestamps from the GPS data, as the UAV log viewer shows the GPS coordinates were populated correctly. However, the system time appears to be missing from the GPS data (GPS_RAW_INT) as well.

I used the following script to inspect the timestamp information:

 from pymavlink import mavutil
  from datetime import datetime, timezone

  # Path to your .tlog file
  logfile = r"C:\Users\williamsm\Seattle Aquarium Dropbox\Coastal_Climate_Resilience\dives\Port_of_Seattle\2024_09_10_S1\logs\2024-09-10_S1_T1.tlog"

  # Connect to the .tlog file
  mav = mavutil.mavlink_connection(logfile)

  GPS_RAW_INT_timestamps = []

  while True:
      msg = mav.recv_match(blocking=False)
      if msg is None:
          break

      # Check for GPS_RAW_INT messages and extract the timestamp
      if msg.get_type() == "GPS_RAW_INT":
          gps_time_usec = msg.time_usec
          if gps_time_usec > 0:
              # Convert microseconds to seconds and to a readable datetime
              gps_time = datetime.fromtimestamp(gps_time_usec / 1e6, tz=timezone.utc)
              GPS_RAW_INT_timestamps.append(gps_time)

  # Output the results
  if GPS_RAW_INT_timestamps:
      print(f"Total GPS_RAW_INT messages with valid timestamps: {len(GPS_RAW_INT_timestamps)}")
      print(f"First GPS_RAW_INT timestamp: {GPS_RAW_INT_timestamps[0]}")
      print(f"Last GPS_RAW_INT timestamp: {GPS_RAW_INT_timestamps[-1]}")
  else:
      print("No valid GPS_RAW_INT timestamps found.")

Output:

Total GPS_RAW_INT messages with valid timestamps: 2178
First GPS_RAW_INT timestamp: 1970-01-01 00:05:43.585000+00:00
Last GPS_RAW_INT timestamp: 1970-01-01 00:14:47.850000+00:00

It seems like the GPS_RAW_INT timestamps are relative to the system boot time rather than a real-world time. Unfortunately we don't have the start time for these surveys as they were completed prior to us standardizing the method of recording the start and end time in our field notebook.

Do you have any idea as to why we would be missing time data from these logs so that we can avoid this in the future?

clydemcqueen commented 2 days ago

I looked at '2024-09-10 13-24-05.tlog' from the Dropbox share, and I can see the problem, but I don't know how it happened.

Typical operation looks like this:

QGC sends a bunch of REQUEST_DATA_STREAM messages to ArduSub, requesting various messages at specific rates. The SYSTEM_TIME message is part of "EXTRA3" (a "stream group") so it asks for the EXTRA3 stream group at 3Hz.
ArduSub saves the message rate for EXTRA3 in the SR0_EXTRA3 parameter to use later. (I think you can just set that parameter? Not sure.)
ArduSub sends the package of messages at the requested rates.

The stream groups are described here for reference.

You can use QGC to set the rates for each stream group. I think the default is 3Hz for EXTRA3.

Using MAVExplorer.py we can look at the what happened. QGC clearly sent the request, several times (EXTRA3 is req_stream_id 12):

MAV> dump REQUEST_DATA_STREAM
...
2024-09-10 11:31:51.745 REQUEST_DATA_STREAM {target_system : 1, target_component : 1, req_stream_id : 12, req_message_rate : 3, start_stop : 1}
...

Then I looked at the parameter to see if ArduSub stored the request:

MAV> param show *EXTRA3*
MAV> SR0_EXTRA3       0.000000
...

Nope, it was not saved. Why?

(I looked at other tlog files just to make sure that SR0_EXTRA3 is the correct parameter, and I can clearly see that it is 3.0 for a normal tlog file.)

This log shows very strange behavior at 2024-09-10 11:52:57: ArduSub gets into a weird jag where it spits out a zillion "intro" messages, which are typically caused by 2 things: ArduSub boots up (or reboots), or when QGC starts and says "hello, who are you?" and ArduSub replies. Perhaps ArduSub did not boot correctly? I also see a 3 minute gap in the messages, so it looks like communications went down for a bit. A puzzle.

To avoid this in the future, you could look for SYSTEM_TIME messages in QGC to make sure they are coming and look good. If not, restart the entire system.

I do think you can recover from this: QGC records the laptop time when each message was written to the tlog file in the tlog file. This is read by mavutil, so you can access this in Python: timestamp = getattr(msg, '_timestamp', 0.0). Units are seconds. This isn't quite the same as the Pi time when the message was sent, but it should be pretty close.

Here is a graph comparing SYSTEM_TIME.time_unix_usec / 1e6 vs _timestamp. (From a good log.) They are pretty close, but they do drift apart to 0.4s over 25 minutes:

I hope this helps.

m-h-williams commented 1 day ago

Thank you so much for digging into this and providing such a detailed response, Clyde! I agree that ArduSub's lag in this case is a bit of a puzzle, especially considering the ROV appeared to be functioning normally during these surveys.

Checking the SYSTEM_TIME messages in QGC while surveying is a straightforward and practical way to prevent this from happening again—I’ll make sure to incorporate that into our workflow.

I’m also relieved to hear you think the logs can still be salvaged! I’ll test the recovery suggestion using _timestamp in Python to align the events in the tlog with the laptop time. Given that the drift (0.4s over 25 minutes) is within acceptable limits for our purposes, this should work well as a fallback for analyzing problematic logs.

Thanks again for your help—I really appreciate it!

m-h-williams commented 21 hours ago

Using _timestamp indeed worked for these files. Thanks again for the suggestion, Clyde! I uploaded a separate script (tlog_csv_timestamp.py) for these instances.

zhrandell / Seattle_Aquarium_CCR_development

SYSTEM_TIME missing from .tlog files #23