ihedvall / mdflib

Implementation of the ASAM MDF data file.
https://ihedvall.github.io/mdflib/
MIT License

Request for Function to Retrieve Channel Value at Specific Time #83

Closed Amineqr closed 3 months ago

Amineqr commented 5 months ago

Hello,

First of all, thank you for maintaining this great library!

I am currently using mdflib and I have a specific requirement: I need to retrieve the value of a channel at a specific time. Ideally, I would like a function that allows me to specify a time and get the corresponding value from a given channel.

For example, something like this:
double GetValueAtTime(size_t channelIndex, double targetTime);

This function would:

  1. Take a channel index and a target time as input.
  2. Return the value of the specified channel at the exact target time if it exists, or the closest previous value if it does not.

I have checked the current documentation and source code, but I could not find such a function. Before I proceed to implement it myself, I wanted to ask if this functionality is already available in some form that I might have missed. If not, would it be possible to consider adding it to the library?

ihedvall commented 5 months ago

Hello, I assume you want to read the MDF file. The reader object reads the file in 2 steps. First it reads the configuration, such as the number of channels and their names (ReadEverythingButData()). This read is fast.

The next read fetches the sample data. This is a lengthy operation and requires a lot of memory. Before doing the read, the user should define channel subscribers, i.e. which channels to read. After the read (ReadData()), the subscribers hold all samples in memory, so fetching values from the subscribers is very fast.

One of the subscribed channels should be the master channel, which typically is the time channel. The other channels are selected by the user.

Now to define your requirement, I need to know when you want to fetch the value.

  1. Fetch a single value from an MDF file without reading anything from the file ahead. The disadvantage is that the next call might take a long time as you need to read the data again. How do you get the channel index?
  2. As above but read in the configuration once. This solves the problem with the channel index. The disadvantage is still that each call requires a read of data.
  3. As above but read in the time master channel data only. Getting the closest sample index relative to the target time is an easy for loop. In the next file read we can get a single sample and its value. This read will be fast as we can step through the file (not reading in every sample record).
  4. Use existing functionality by subscribing to the time channel and the data channel and reading in all samples. Then, by looping over the time channel, we find the closest sample index and then fetch the value from the subscriber.

There exist flavors of the 4 examples above so we need to narrow down your requirements.

Best Regards Ingemar Hedvall

Amineqr commented 5 months ago

Hello again,

Thank you so much for the quick and detailed response to my question. I really appreciate it!

After going through the options, I think option 4 is the best fit for my needs because it offers the best performance by loading all the samples into memory at once.

Here’s the approach I’m thinking of:

This should allow for fast lookups after the initial data load, which is crucial for my use case.

Does this sound like a good approach to you? Any tips or recommendations to make it even more efficient would be greatly appreciated!

Thanks again for your help!

Best regards,

Amine

ihedvall commented 5 months ago

I think that the only change is on the IChannelObserver object. It already has an IsMaster() function. The good thing with the master channel is that all values must be ascending. The idea is to first find the sample index closest to a specific time value. This is done by fetching the time value for each sample index and, when a sample index is found, fetching the other value for that sample index.
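
Because the master channel values are ascending, the search over all samples could also be done as a binary search. The following is only a minimal sketch, assuming the time values have already been copied from the observer into a std::vector<double>. The function name ClosestPreviousSample is hypothetical and not part of mdflib.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper, not part of mdflib.
// Finds the index of the last sample whose time is <= time_ref.
// Returns false if the list is empty or time_ref lies before the first sample.
// Precondition: 'times' is sorted ascending, as master channel values are.
bool ClosestPreviousSample(const std::vector<double>& times, double time_ref,
                           std::size_t& sample) {
  // upper_bound returns the first element strictly greater than time_ref
  const auto itr = std::upper_bound(times.cbegin(), times.cend(), time_ref);
  if (itr == times.cbegin()) {
    return false;  // No sample at or before time_ref
  }
  sample = static_cast<std::size_t>(std::distance(times.cbegin(), itr) - 1);
  return true;
}
```

An exact match returns its own index, and a target time after the last sample returns the last index.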

There is a small disadvantage in the implementation: the channel observer (subscriber) stores the channel value, i.e. the non-scaled value, while the end-user wants the engineering value (GetChannelValue() vs GetEngValue()). The time channel often has some tricks to calculate the time but the output is always a number of seconds.
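
To illustrate the difference between a channel value and an engineering value, here is a small sketch of a linear conversion, which is one common conversion type in MDF. The struct and function names are hypothetical stand-ins. In mdflib, the conversion is applied by GetEngValue() and real channels may use other conversion types.

```cpp
#include <cstdint>

// Hypothetical stand-in for a linear channel conversion (not an mdflib type).
struct LinearConversion {
  double offset;  // Added after scaling
  double factor;  // Multiplied with the raw value
};

// Converts a raw (non-scaled) channel value to an engineering value.
double ToEngValue(const LinearConversion& conversion, std::int64_t raw) {
  return conversion.offset + conversion.factor * static_cast<double>(raw);
}
```

For example, a raw value of 100 with offset -40 and factor 0.5 gives an engineering value of 10.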

  1. It is possible to make a specialized function on the IChannelObserver as GetClosestSample(double time) -> uint64_t returning a sample index. The drawback is that it only works on time master channels.
  2. Another more general approach is to implement a function that returns a vector of scaled values as GetEngValueArray(vector& list).
  3. Don't change the interface and do some of the above in your application, so to say.

I recommend number 2, plus adding get max, min and average functions. These are typical functions needed when using an ASAM ODS database as an index when dealing with a large number of MDF files.
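
Once the scaled values are available as a vector, the max, min and average functions are straightforward. This is only a sketch on a plain std::vector<double>. The names ChannelStatistics and CalculateStatistics are hypothetical and not taken from mdflib.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical result type, not part of mdflib.
struct ChannelStatistics {
  double min;
  double max;
  double average;
};

// Calculates min, max and average of the values.
// Returns false if the list is empty, in which case 'stats' is left untouched.
bool CalculateStatistics(const std::vector<double>& values,
                         ChannelStatistics& stats) {
  if (values.empty()) {
    return false;
  }
  const auto range = std::minmax_element(values.cbegin(), values.cend());
  stats.min = *range.first;
  stats.max = *range.second;
  stats.average = std::accumulate(values.cbegin(), values.cend(), 0.0) /
                  static_cast<double>(values.size());
  return true;
}
```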

Amineqr commented 5 months ago

Thank you for your suggestion. While implementing GetEngValueArray would indeed provide a general mechanism to retrieve scaled values for a channel, my primary goal is to efficiently retrieve values at specific time points from the data. The approach you mentioned, involving finding the closest sample out of the master channel and then fetching the corresponding values for the target channels, aligns more closely with my immediate requirement. I can easily retrieve the Time Channel using GetXChannel() and then call a function such as GetClosestSample(double time) to obtain the closest sample index. Once I have the sample index, I can use it to directly retrieve the engineering value of the target sample.

As such, I believe focusing on refining the method to obtain the closest sample and efficiently retrieving values at specific time points will better suit my current needs. However, I appreciate your suggestion and will keep it in mind for future enhancements to the functionality.

ihedvall commented 5 months ago

I did a small function that should solve your problem. Note that I have not tested whether it compiles.

If no sample is found, I suppose returning an invalid value would be OK. Alternatively, take the min sample (0) or the max sample?

// Returns true if value is valid
template<typename V>
bool GetClosestSample(const IChannelObserver& time, const IChannelObserver& channel, double time_ref, V& value) {
  int64_t sample_ref = -1; // Indicates no sample found
  // Find closest sample index first
  for (uint64_t sample = 0; sample < time.NofSamples(); ++sample) {
    double time_value = 0.0;
    time.GetEngValue(sample, time_value);
    if (time_value > time_ref && sample > 0) {
      sample_ref = static_cast<int64_t>(sample-1);
      break;
    }
  } // End for loop
  if (sample_ref < 0) {
    return false; // No sample found, so the value is invalid
  }
  return channel.GetEngValue(static_cast<uint64_t>(sample_ref), value);
}
Amineqr commented 5 months ago

Thank you very much for taking the time to implement the function for me, I really appreciate it!

I have one more question: Is there a function in the library that allows me to retrieve a channel object from a channel index? This would be helpful for my workflow.

Thank you again for your assistance!

ihedvall commented 5 months ago

Well, there is no channel index in MDF, only channel names. But within one channel group, the channels are linked in a specific order. The function IChannelGroup::Channels() returns a list of channels. Unfortunately this list also includes any composite channels, i.e. a byte array channel that expands to several sub-channels. This is typical when recording bus traffic such as CAN messages. There can also be several channel groups within one data group (measurement).

I'm not sure if we mean the same thing with channel index.

I modified the sample_ref usage below so it handles the min and max index.

// Returns true if value is valid
template<typename V>
bool GetClosestSample(const IChannelObserver& time, const IChannelObserver& channel, double time_ref, V& value) {
  uint64_t sample_ref = 0; // Defaults to the first sample (min index)
  // Find closest sample index first
  for (uint64_t sample = 0; sample < time.NofSamples(); ++sample) {
    double time_value = 0.0;
    time.GetEngValue(sample, time_value);
    if (time_value > time_ref) {
      break;
    }
    sample_ref = sample; // Last sample at or before time_ref (ends at max index)
  } // End for loop
  return channel.GetEngValue(sample_ref, value);
}
Amineqr commented 5 months ago

Indeed, I was referring to the index associated with each channel object (Channel->Index()). If there's a way to retrieve an IChannel object from the list of channels within a channel group using this index, it would be very helpful for my workflow

ihedvall commented 5 months ago

The Index() function exists on all MDF blocks and it was intended as an internal function. When you read the file, the index is the block's file position. There is a function in the MdfBlock class, MdfBlock::Find(index), that returns any other block but it is not publicly available.

The simplest and possibly the fastest way is to iterate through all channels on the channel group block and return the channel with a matching index. It is possible to make an IChannelGroup::GetChannel(index) function.
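
The iteration described above could look roughly like the sketch below. It uses a hypothetical FakeChannel struct instead of the real IChannel interface, so it only shows the lookup pattern, not the mdflib API.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a channel block (not an mdflib type).
// In mdflib the index of a block read from file is its file position.
struct FakeChannel {
  std::int64_t index;
  std::string name;
};

// Returns the channel with the matching index, or nullptr if none matches.
const FakeChannel* FindChannelByIndex(const std::vector<FakeChannel>& channels,
                                      std::int64_t index) {
  const auto itr = std::find_if(
      channels.cbegin(), channels.cend(),
      [index](const FakeChannel& channel) { return channel.index == index; });
  return itr != channels.cend() ? &(*itr) : nullptr;
}
```

A channel group normally holds a limited number of channels, so a linear search like this is usually cheap compared to reading sample data.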

Amineqr commented 5 months ago

That is perfect. Thank you so much for your help and your time.

ihedvall commented 5 months ago

I do appreciate that you give me input on the usage of the library. So if you have some requirements or improvements, please let me know. I am open to a Teams meeting (or similar). There exists a project attached to the library where I normally put requirements and plans.

Best Regards Ingemar Hedvall

Amineqr commented 5 months ago

Probably one more question: I am intending to read all the channels with all the samples, and I have already done this. However, due to performance concerns, I would like to exclude CanDataFrame channels and just get everything else. Currently, I am creating ChannelObservers for all channels. Is there any way to achieve this filtering?

ihedvall commented 5 months ago

There are 3 different functions to choose from when creating observers. Instead of using the function that generates all observers in the group, you have to add them one by one or simply delete the unwanted subscribers afterwards.
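
Adding observers one by one means deciding per channel whether to subscribe. The sketch below shows such a filter on plain channel names. It assumes the unwanted channels can be recognised by a name starting with "CAN_DataFrame", which may not hold for every file. Both function names are hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical predicate: assumes unwanted channels are named
// "CAN_DataFrame" or "CAN_DataFrame.<something>".
bool IsCanDataFrameChannel(const std::string& name) {
  static const std::string prefix = "CAN_DataFrame";
  return name.compare(0, prefix.size(), prefix) == 0;
}

// Returns the names of the channels that should get an observer.
std::vector<std::string> SelectChannelsToObserve(
    const std::vector<std::string>& names) {
  std::vector<std::string> selected;
  for (const auto& name : names) {
    if (!IsCanDataFrameChannel(name)) {
      selected.push_back(name);
    }
  }
  return selected;
}
```

In the real loop over channels, the same predicate would simply guard the call that creates the channel observer.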

My intuition does however tell me that your problem is more complex. MDF files with CAN data frames normally don't have any separate signal channels. Bus logger files just record frame bytes and, by knowing the DBC/A2L file, the signals can be parsed out.

Amineqr commented 4 months ago

Hmm, is there a faster method to read the data? I am currently using the code from your example, which works well but seems a bit slow (for a 32-bit application). I need to call ReadData() instead of just ReadEverythingButData() because I require the first value of all existing channels to be stored. Do you have any suggestions to enhance performance? Your insights would be greatly appreciated. Your example code:

    MdfReader reader(filePath);
    reader.ReadEverythingButData();
    const auto* mdf_file = reader.GetFile(); // Get the file interface.
    DataGroupList dg_list;                   // Get all measurements.
    mdf_file->DataGroups(dg_list);
    for (auto* dg4 : dg_list) {
        ChannelObserverList subscriber_list;
        const auto cg_list = dg4->ChannelGroups();
        for (const auto* cg4 : cg_list) {
            const auto cn_list = cg4->Channels();
            for (const auto* cn4 : cn_list) {
                // Create a subscriber and add it to the temporary list
                auto sub = CreateChannelObserver(*dg4, *cg4, *cn4);
                subscriber_list.push_back(std::move(sub));
            }
        }
        reader.ReadData(*dg4); // Read all samples of this data group from the file

        // Store the name and the first value (sample 0) of each channel.
        // NamedValue and list belong to the application (not shown here).
        for (auto& obs : subscriber_list) {
            NamedValue newEntry;
            newEntry.name = _strdup(obs->Name().c_str());
            obs->GetEngValue(0, newEntry.value);
            list.push_back(newEntry);
        }
    }
    reader.Close(); // Close the file
ihedvall commented 4 months ago

Both yes and no. You pretty much always have to call the ReadEverythingButData() function. All raw data are stored as sample records in a big blob. A channel group (CG = record) either has a fixed byte size or has variable-length data (VLSD). There exists a specialized function, ReadVlsdData(), which does almost what you want. This function was developed to read a range instead of reading all samples. VLSD data is typically a video stream/sample and file sizes are 100 GB, so there is not enough room in primary memory for all samples. It's a little bit funky, but you first read in a sample index -> offset list and next read in the raw data, typically sample by sample.

The trick here is that you only read the CG record ID and then step forward in the file until you find your sample(s) and then break. If you only want to read one sample, this method is very fast.
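
The stepping idea can be sketched on an in-memory blob. The layout below is a simplified assumption for illustration only: each record is a 1-byte record ID followed by a fixed number of data bytes that depends on the record ID. The function name FindSampleOffset is hypothetical and this is not how mdflib itself is implemented.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified record layout: [1-byte record ID][data bytes].
// 'record_sizes' maps a record ID to its number of data bytes.
// Sets 'offset' to the first data byte of the requested sample of 'record_id'.
// Returns false if the sample is not found or its record is truncated.
bool FindSampleOffset(const std::vector<std::uint8_t>& blob,
                      const std::map<std::uint8_t, std::size_t>& record_sizes,
                      std::uint8_t record_id, std::uint64_t sample,
                      std::size_t& offset) {
  std::uint64_t count = 0;  // Samples seen so far for the wanted record ID
  std::size_t pos = 0;
  while (pos < blob.size()) {
    const std::uint8_t id = blob[pos];
    const auto size = record_sizes.find(id);
    if (size == record_sizes.cend()) {
      return false;  // Unknown record ID, cannot step any further
    }
    if (id == record_id) {
      if (count == sample) {
        offset = pos + 1;
        return offset + size->second <= blob.size();  // Found, so break here
      }
      ++count;
    }
    pos += 1 + size->second;  // Step over the record without decoding it
  }
  return false;
}
```

Only the record ID byte is inspected for each record. The data bytes are skipped until the wanted sample is reached.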

My proposal is to make a "ReadPartialData()" function with an input argument of min and max sample number. This function will be much faster, unless you specify to read all samples of course.

There exists another type of subscriber called SampleObserver. Instead of adding channel observers, you add one sample observer plus a callback function that is called for each sample. You need to handle the "saving" of sample values yourself as the subscriber doesn't do it. Currently, there is no end-user break functionality. It does save some memory however. Using it with VLSD data may fail.

ihedvall commented 4 months ago

The previous version of the MDF library was running on 32-bit Windows 10. I had to do several fixes and tricks to solve the constant not-enough-memory problem. This version of the MDF library assumes that a 64-bit operating system is targeted.

I propose that we replace the ReadData() call with a ReadRangeData(min_sample, max_sample) function instead. This function will only read in samples between min and max. This will make the read much faster. For example, if you read 1 sample and the number of samples is N, this function will be N times faster (well, close to that at least).

To solve the memory problem that may occur, I recommend using the simple Sample Observer instead of several Channel Observers, so you can read in large MDF files. You need to handle the sample values in your own callback function though.

I will add a boolean return value in the callback so it is possible to stop any further reading of data.
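
How a boolean return value can stop the read is sketched below with a stand-in reader loop. The callback signature is modelled on the description in this thread, and the sketch assumes that returning false means "stop reading". SampleCallback and FeedRecords are hypothetical names, not the mdflib API.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical callback type: sample number, record ID and record bytes.
// Assumption: returning false asks the reader to stop.
using SampleCallback = std::function<bool(
    std::uint64_t sample, std::uint64_t record_id,
    const std::vector<std::uint8_t>& record)>;

// Hypothetical stand-in for the reader loop. Delivers the records one by one
// until the callback returns false. Returns the number of records delivered.
std::uint64_t FeedRecords(const std::vector<std::vector<std::uint8_t>>& records,
                          std::uint64_t record_id,
                          const SampleCallback& callback) {
  std::uint64_t delivered = 0;
  for (std::uint64_t sample = 0; sample < records.size(); ++sample) {
    ++delivered;
    if (!callback(sample, record_id, records[sample])) {
      break;  // The callback requested a stop
    }
  }
  return delivered;
}
```

A callback that only needs sample 0 can return false right away, so the remaining records are never delivered.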

Are these changes acceptable for your application?

Amineqr commented 4 months ago

Thank you for your detailed response and for considering my application's requirements. The proposed changes, including the ReadRangeData(min_sample, max_sample) function and the use of a simple SampleObserver with a boolean return value in the callback, are definitely acceptable and much appreciated.

Thank you once again for your support and assistance.

Best regards

Amineqr commented 4 months ago

Could you please provide an example of how to use the SampleObserver to read values for a specific channel? An example showing the setup of SampleObserver, the callback function, and the process of reading and storing channel values would be incredibly helpful.

Your guidance in this matter would be much appreciated.

ihedvall commented 4 months ago

I fetched the below code snippets from the unit test. It just prints the values onto the console. I bind the callback function using a lambda function but you should bind it to your member function.

TEST_F(TestWrite, Mdf4SampleObserver ) {
 // Removed the code that writes the file as well as unit test code. It's a bus logging file.
  MdfReader reader(mdf_file.string());
  reader.ReadEverythingButData();
  const auto* header1 = reader.GetHeader();
  auto* last_dg1 = header1->LastDataGroup();

  const auto* channel_group1 = last_dg1->GetChannelGroup("CAN_DataFrame"); // Needed for Record ID

  const auto* channel1 = channel_group1->GetChannel("CAN_DataFrame.DataBytes");  // Needed for getting values

  ISampleObserver sample_observer(*last_dg1); // This attaches the observer to the last data group.

  // Set up the callback function. I'm using a lambda function here.
  sample_observer.DoOnSample = [&] (uint64_t sample1, uint64_t record_id,
      const std::vector<uint8_t>& record) {
    bool valid = true;
    std::string values;
    if (channel1->RecordId() == record_id) {
      valid = sample_observer.GetEngValue(*channel1,sample1,
                                              record, values );  // This is a template function similar to the channel observer
      std::cout << "Sample: " << sample1
                << ", Record: " << record_id
                << ", Values: " << values << std::endl; // Just printing the values.
    }
  };
  reader.ReadData(*last_dg1); // Do read all samples.
  sample_observer.DetachObserver(); // Disconnect the observer from the reader.
  reader.Close();

}
Amineqr commented 4 months ago

Thank you for your assistance with the sample observer code. I have a couple of questions and encountered an issue that I would like your help with:

  1. Retrieving Name and Value with Sample Observer: I'm trying to understand how to map the channel names to their corresponding values using the sample observer. Specifically, I want to retrieve and print the name and value at sample 0. Could you explain how the sample observer associates the channel names with the values in the callback function?
  2. Issue with CAN_DataFrame: I initially tried to use CAN_DataFrame as a channel group, but it appears that in most of my files, CAN_DataFrame is a channel rather than a channel group. This caused channel_group1 to be nullptr, and the program crashed. Should I be accessing CAN_DataFrame directly as a channel instead? If so, how should I adjust the code to handle this correctly?

Here are the steps I took and the adjustments that led to the information I have now:

auto* last_dg1 = header1->LastDataGroup();

const auto* channel_group1 = last_dg1->GetChannelGroup("CAN_DataFrame");
if (!channel_group1) {
    std::cerr << "Channel group 'CAN_DataFrame' not found" << std::endl;
    // Optional: list available channel groups
    std::vector<IChannelGroup*> channel_groups;
    channel_groups = last_dg1->ChannelGroups();
    std::cerr << "Available channel groups:" << std::endl;
    for (const auto* group : channel_groups) {
        std::cerr << " - " << group->Name() << std::endl;
        std::vector<IChannel*> channels = group->Channels();
        std::cerr << "Available channels:" << std::endl;
        for (const auto* ch : channels) {
            std::cerr << " - " << ch->Name() << std::endl;
        }
    }
}
// Only look up the channel if the channel group exists, to avoid the nullptr crash
const auto* channel1 = channel_group1 != nullptr
    ? channel_group1->GetChannel("CAN_DataFrame.DataBytes")  // Needed for getting values
    : nullptr;
Amineqr commented 4 months ago

Just to clarify my plan: my application just needs to get the channel information and the first value at sample 0 during initialization. This information (such as name, unit, and initial value) is crucial for setting things up. After retrieving this data, I don’t need to read all the data upfront.

However, using ReadData(*dg4) in the loop is causing slow performance on a 32-bit system when retrieving this initial data. The process is taking much longer than I’d like. Once I have the initial data, I plan to use a function to get values at specific times when needed. This way, I only need to read all sample data for a specific group, and then I can use a function to get the value at a particular time. This approach should help me process the data more efficiently.

ihedvall commented 4 months ago

Sorry, I made a simple solution that works for me. You cannot use that code directly for your MDF files. You have to change the channel and channel group names. The code snippet was fetched from my unit test, which first creates a standard bus logger file and then reads it.

The input arguments in the callback function are the sample number, the record ID and a record byte array. Each channel group has a unique record ID. The record byte array holds the group's channel data. The channel configuration specifies where in the record byte array the value is stored. The GetEngValue() and GetChannelValue() functions do this translation.

The ReadEverythingButData() function is needed to get the CG -> record ID relation and where in the record byte array the channel values are stored. As the callback function doesn't have any reference to any channel objects, you need to have some member value (list) that maps the record ID to channel references before calling ReadData() or ReadPartialData().
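
The record ID to channel list could be kept in a map that the callback consults. The sketch below uses hypothetical types (ChannelLayout, RecordMap) and decodes every value as an unsigned 16-bit little-endian integer purely for illustration. With mdflib, the map would hold channel references and the decoding would be left to GetEngValue() or GetChannelValue().

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the channel configuration (not an mdflib type).
struct ChannelLayout {
  std::string name;
  std::size_t byte_offset;  // Where in the record byte array the value starts
};

// Record ID -> channels stored in records with that ID. Filled before the read.
using RecordMap = std::map<std::uint64_t, std::vector<ChannelLayout>>;

// Decodes an unsigned 16-bit little-endian value from the record bytes.
// Returns false if the value does not fit inside the record.
bool DecodeUint16(const std::vector<std::uint8_t>& record,
                  std::size_t byte_offset, std::uint16_t& value) {
  if (record.size() < 2 || byte_offset > record.size() - 2) {
    return false;
  }
  value = static_cast<std::uint16_t>(record[byte_offset] |
                                     (record[byte_offset + 1] << 8));
  return true;
}

// What a callback could do: look up the channels that belong to the
// record ID and decode each value from the record byte array.
std::map<std::string, std::uint16_t> DecodeRecord(
    const RecordMap& record_map, std::uint64_t record_id,
    const std::vector<std::uint8_t>& record) {
  std::map<std::string, std::uint16_t> values;
  const auto itr = record_map.find(record_id);
  if (itr == record_map.cend()) {
    return values;  // Record belongs to a group we did not register
  }
  for (const auto& channel : itr->second) {
    std::uint16_t value = 0;
    if (DecodeUint16(record, channel.byte_offset, value)) {
      values[channel.name] = value;
    }
  }
  return values;
}
```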

ihedvall commented 4 months ago

Out-of-sync comment on your plan. I recommend this basic design if the input file doesn't change.

Create a class that holds an MDF reader object. Call ReadEverythingButData() only once and reuse this configuration in later calls of Read(Partial)Data(). You can close the file (reader.Close()) at any time. It will reopen and close as needed. You can also keep it open, as that makes the read somewhat faster, but no one else can access the file during this time.

After getting the file configuration and registering a callback function, you can call Read(Partial)Data() as many times as you want. Your callback function needs to handle your requirement.

You do have one issue with the time to sample index relation. You want to call the ReadPartialData() function as it is much faster, but it uses the sample index as input, not the time. So at some point, you need to invest some read time to read in the time channel samples. I assume that you are not so lucky that the sample time is periodic.

The above assumes the input file doesn't change between reads, at least not so often.
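
The time to sample index relation can be handled by caching the time channel samples once and converting a time range into a sample index range before each partial read. A minimal sketch follows, assuming the cached times are ascending. The class name TimeIndex and its interface are hypothetical. The resulting indexes would then be passed on as the min and max sample numbers.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical helper class, not part of mdflib.
// Caches the master time samples (read once) and maps time to sample index.
class TimeIndex {
 public:
  explicit TimeIndex(std::vector<double> times) : times_(std::move(times)) {}

  // Converts the closed time range [t_min, t_max] into the sample index
  // range [min_sample, max_sample]. Returns false if no sample is in range.
  bool SampleRange(double t_min, double t_max, std::uint64_t& min_sample,
                   std::uint64_t& max_sample) const {
    // First sample with time >= t_min
    const auto first = std::lower_bound(times_.cbegin(), times_.cend(), t_min);
    // First sample with time > t_max, searched from 'first' onwards
    const auto last = std::upper_bound(first, times_.cend(), t_max);
    if (first == last) {
      return false;
    }
    min_sample =
        static_cast<std::uint64_t>(std::distance(times_.cbegin(), first));
    max_sample =
        static_cast<std::uint64_t>(std::distance(times_.cbegin(), last) - 1);
    return true;
  }

 private:
  std::vector<double> times_;  // Ascending master channel times in seconds
};
```

If the input file changes, the cached times have to be read again before the next lookup.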

Amineqr commented 4 months ago

Thank you for outlining the recommended design approach. It seems like a solid plan. Regarding the issue with the time to sample index relation, I believe I can address this by developing an algorithm to efficiently find the sample index for a specific time within the given time range and number of samples. This will ensure that I can effectively use the ReadPartialData function.

However, since I’m still relatively new to the MDF library and its functionalities, I would appreciate your guidance on how to develop the ReadPartialData function to read a specific time range. Could you provide some insights or examples on how to implement this effectively?

ihedvall commented 4 months ago

I'm working on the ReadPartialData() function. There are some difficulties if there is more than one channel group.

ihedvall commented 4 months ago

Sorry, I've been distracted by other issues for some days. I will start finalizing the ReadPartialData() function now.

ihedvall commented 4 months ago

I have checked in the ReadPartialData() functionality.