ihedvall / mdflib

Implementation of the ASAM MDF data file.
https://ihedvall.github.io/mdflib/
MIT License

How to create channels using a dbcfile/loop #92

Closed Ritzermaxl closed 1 month ago

Ritzermaxl commented 1 month ago

Hi,

I am trying to create MF4 files from raw log data using the static mdflib. To parse the data I am using the dbcppp lib. I was wondering if it's possible to create channels using a for loop. The only working examples I found are in the testwrite.cpp file; are there any more for reference? Also, is it possible to set up all the channels before creating the MDF file itself? I am converting multiple log files, and this would allow me to read the DBC files only once instead of every time a new file is created.

This is how I am currently going through my DBC file:

for (const dbcppp::IMessage& msg : net->Messages()) {
    messages.insert(std::make_pair(msg.Id(), &msg));

    std::cout << msg.Name() << std::endl;

    // Here I want to create an mdflib channel group for each message

    for (const dbcppp::ISignal& sig : msg.Signals()) {
        std::cout << sig.Name() << std::endl;
        std::cout << sig.Unit() << std::endl;

        // Here I want to create an mdflib channel for each signal
    }
}

I would really appreciate your help!

ihedvall commented 1 month ago

I'm away from my computer for some days, but the GitHub project ihedvall/dbclib might be a better choice. If you have an MDF bus logger file with a matching DBC file, the library has parsers that convert data bytes to DBC signal values. You can also set up a signal subscriber in the library, similar to the MDF library. The DbcViewer application has a menu option to read in an MDF bus logger file. I'm back on Friday.

ihedvall commented 1 month ago

Sorry, I noticed that you want to create an MDF file with signal values from CAN data frames. You need to create one channel group per frame ID and then add a time channel plus the signal channels. It's a little bit of work to set up the channel configuration before adding samples, but the functionality is there. The current writer is intended to work in real time, so to speak. I suspect that your application is more of a converter type, so it might be time for an open source converter application?

ihedvall commented 1 month ago

The MdfWriter object is not a perfect solution for a converter application. It assumes that the samples, i.e. the frames, are retrieved in time order. It uses an internal cache of samples. Depending on how your log files are designed, they may also store the frames in time order.

First you need to set a file name. Next you need to configure the MDF file: one channel group for each frame ID. Each frame consists of one or more signals. Define one channel for each unscaled signal value. Add a conversion (CC) block that converts the raw signal to scaled values. The DBC file has this information. Optionally, you may attach the DBC file.
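
Since the question was whether the channel setup can be prepared once and reused for several output files, one option is to copy what is needed from the DBC file into a plain, library-independent structure first. The following is only a sketch of that idea: `SignalPlan`, `MessagePlan`, `ConversionPlan`, `AddMessage` and `FindMessage` are hypothetical names and are not part of mdflib or dbcppp.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical description of one DBC signal (becomes one MDF channel).
struct SignalPlan {
  std::string name;
  std::string unit;
  double factor = 1.0;  // linear scaling taken from the DBC file
  double offset = 0.0;
};

// Hypothetical description of one CAN frame ID (becomes one channel group).
struct MessagePlan {
  std::string name;
  std::vector<SignalPlan> signals;
};

// Keyed by frame ID so that a frame read from the log finds its plan directly.
using ConversionPlan = std::map<std::uint32_t, MessagePlan>;

// Registers a message and returns it so that signals can be appended.
MessagePlan& AddMessage(ConversionPlan& plan, std::uint32_t frame_id,
                        const std::string& name) {
  MessagePlan& msg = plan[frame_id];
  msg.name = name;
  return msg;
}

// Returns the plan for a frame ID, or nullptr if the ID is not in the DBC file.
const MessagePlan* FindMessage(const ConversionPlan& plan,
                               std::uint32_t frame_id) {
  const auto it = plan.find(frame_id);
  return it == plan.end() ? nullptr : &it->second;
}
```

The plan would be filled once while iterating over the DBC messages and then walked again for every new output file to create the channel groups and channels, so the DBC file is parsed only once.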

Now it's time to add samples. Init and start the measurement.

For each frame you have a timestamp, a frame ID and some data bytes. You need to convert the data bytes to raw signal values and update the channel values with the latest values. Then you call the save sample function for the channel group.
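
To make the "data bytes to raw signal value" step concrete, here is a minimal sketch for the simplest case: an unsigned signal with little-endian (Intel) bit numbering and a linear factor/offset scaling. `ExtractRawLe` and `RawToPhysical` are hypothetical helper names; in practice a DBC library such as dbcppp or dbclib does this step and also covers big-endian and signed signals.

```cpp
#include <cstddef>
#include <cstdint>

// Extracts an unsigned little-endian bit field from a CAN payload.
// start_bit is the index of the least significant bit, length is 1..64.
// Bits that would lie beyond the payload are treated as zero.
std::uint64_t ExtractRawLe(const std::uint8_t* data, std::size_t size,
                           unsigned start_bit, unsigned length) {
  std::uint64_t raw = 0;
  for (unsigned i = 0; i < length && i < 64; ++i) {
    const unsigned bit = start_bit + i;
    const std::size_t byte = bit / 8;
    if (byte >= size) {
      break;
    }
    if ((data[byte] >> (bit % 8)) & 1u) {
      raw |= std::uint64_t{1} << i;
    }
  }
  return raw;
}

// Linear scaling as described by the factor and offset of a DBC signal.
double RawToPhysical(std::uint64_t raw, double factor, double offset) {
  return static_cast<double>(raw) * factor + offset;
}
```

If the raw value is stored in the channel and the scaling is left to a CC block, only the first function is needed on the write side.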

Repeat until all frames are read, then stop and finalize the measurement. The file is now done.

Note that the timestamps may cause issues depending on your input. The internal cache is just annoying in a converter application, so another type of writer object may be needed. Please let me know if this is a possible solution for you.

Ritzermaxl commented 1 month ago

Hi, thanks for your quick and detailed answers! I do receive the messages in the order they were recorded; however, it can happen that I save another sample with the same timestamp, due to multiple CAN channels.

This is a shortened version of the code I currently have. It goes through the DBC file and adds a channel group for each ID and a corresponding channel for each signal. This seems to work fine: I can view the channel groups while debugging, and the final generated file has all of them.

However, I think I still haven't quite figured out how to write to the right object. The generated file is empty. Also, the loop of setting channel values and saving samples seems to be quite slow (slower than writing to other files, at least). If this library is still the wrong choice for my application, please let me know. Either way, thank you for your help.

//Setup Writer

auto writer = MdfFactory::CreateMdfWriter(MdfWriterType::Mdf4Basic);
writer->Init(mdf_file.string());
auto* header = writer->Header();
auto* history = header->CreateFileHistory();

history->Description("Testing");
history->ToolName("Mdflib");
history->ToolVendor("TUGRacing");
history->ToolVersion("0.1");
history->UserName("Me");

// Add database file for signal names
int databasesAdded = 0;
auto* data_group = header->CreateDataGroup();

for (const dbcppp::IMessage& msg : net->Messages()) {
    messages.insert(std::make_pair(msg.Id(), &msg));

    //Create a channel group for each message, Seems to work;
    auto* group = data_group->CreateChannelGroup(msg.Name());
    group->Name(msg.Name());

    for (const dbcppp::ISignal& sig : msg.Signals()) {

        //Create channels for each signal in corresponding channel group, seems to work;
        auto* ch = data_group->GetChannelGroup(msg.Name())->CreateChannel();
        ch->Name(sig.Name());
        ch->Unit(sig.Unit());
        ch->Type(ChannelType::FixedLength);
        ch->Sync(ChannelSyncType::None);
        ch->DataType(ChannelDataType::FloatLe);
        ch->DataBytes(8);
    }
}

writer->PreTrigTime(0);
writer->InitMeasurement();
writer->StartMeasurement(curKvmEvent.eventUnion.msg.timeStamp);

while ((readStatus = kvmLogFileReadEvent(kmfHandle, &curKvmEvent)) == kvmOK) {
    // Reads messages in the order they were received on the logger

    auto iter = messages.find(curKvmEvent.eventUnion.msg.id);
    if (iter != messages.end()) {

        const dbcppp::IMessage* msg = iter->second;

        // Get name of current message (declared here so it is still in scope for SaveSample)
        const std::string msgName = msg->Name().c_str();

        for (const dbcppp::ISignal& sig : msg->Signals()) {

            if (sig.MultiplexerIndicator() != dbcppp::ISignal::EMultiplexer::MuxValue) {
                // Get name of current signal
                const std::string sigName = sig.Name().c_str();

                data_group->GetChannelGroup(msgName)->GetChannel(sigName)->SetChannelValue(static_cast<double>(sig.RawToPhys(sig.Decode(curKvmEvent.eventUnion.msg.data))));
                // I think this right here is quite wrong, I am not sure how to get to the right channel properly
            }
        }

        writer->SaveSample(*data_group->GetChannelGroup(msgName), static_cast<uint64_t>(curKvmEvent.eventUnion.msg.timeStamp));
    }
}

writer->StopMeasurement(curKvmEvent.eventUnion.msg.timeStamp);
writer->FinalizeMeasurement();

ihedvall commented 1 month ago

I think that your basic problem is the timestamps. They shall be absolute times in nanoseconds since 1970, as uint64_t. The call to StartMeasurement shall use the first message time, followed by calls to SaveSample, and StopMeasurement shall use the last sample time. The internal cache saves all samples between the first and last time.

You are missing the time master channel in each channel group. It holds the timestamp, now relative to the start time.
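
A small sketch of the two time representations mentioned here: absolute nanoseconds since 1970 for the writer calls, and seconds relative to the start time as held by the master channel. It assumes that the log provides a wall-clock start time and a per-frame offset from that start; the unit and origin of the logger's own timestamps are not stated in this thread and need to be checked. `ToEpochNs` and `RelativeSeconds` are hypothetical helper names.

```cpp
#include <chrono>
#include <cstdint>

// Absolute time in nanoseconds since 1970-01-01 (Unix epoch), computed from
// the assumed log start time and the offset of a frame within the log.
std::uint64_t ToEpochNs(std::chrono::system_clock::time_point log_start,
                        std::chrono::nanoseconds offset_in_log) {
  const auto absolute =
      std::chrono::time_point_cast<std::chrono::nanoseconds>(log_start) +
      offset_in_log;
  return static_cast<std::uint64_t>(absolute.time_since_epoch().count());
}

// Time of a sample relative to the measurement start, in seconds.
// Assumes sample_ns >= start_ns.
double RelativeSeconds(std::uint64_t sample_ns, std::uint64_t start_ns) {
  return static_cast<double>(sample_ns - start_ns) / 1e9;
}
```

With this, the value computed for the first frame would go to StartMeasurement, the value for each frame to SaveSample, and the value for the last frame to StopMeasurement.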

In your application the internal cache is just annoying, which is why another type of writer without the cache would simplify your application. In such a writer, the SaveSample function would simply write to disk without passing through the cache.

Your DBC signals are not all floats. Normally there are enumerated strings as well, but we can deal with that later.

Ritzermaxl commented 1 month ago

The missing master channels were the problem; I am now getting a full MDF file. Thanks for your help, I really appreciate it! Thanks for the heads-up about the internal cache, it's indeed quite a limitation for my application.

ihedvall commented 1 month ago

I'm back from vacation, so I no longer need to carry on the whole conversation from my phone.

Well, it should work without the time channel. I will look into it.

I don't know the inputs and outputs of your application. Unlike the MDF reader, which is pretty general, the writers are specialized. There are several types of writers. The basic logger (Mdf4Basic) shall be used when recording live samples that arrive periodically or randomly. The CAN loggers (MdfBusLogger) are similar. They assume that the CAN messages arrive randomly but in time order.

In your application, you get the messages from some other file. The above loggers use an internal queue of samples, and that queue solves the problem with pre-trig time, i.e. you want to store samples that arrived before the start of the measurement. It also solves the problem when compression is enabled. The samples are then compressed and flushed in 4 MB blocks.

Creating a specialized converter (MdfConverter) version of the above loggers is quite simple. The main change is to remove the thread that operates on the internal queue. This thread opens the file, flushes out the samples and then closes the file. Note that the open/close makes it possible for another application to read the file.

The converter will keep the file open, assuming that all samples arrive shortly. At the end the file should be closed. There is no need for pre-trig time handling and therefore no need for a thread. I need to keep the internal queue to solve the problem with compression in 4 MB blocks, but the queue doesn't need the samples in time order.

You can also speed up reading if the MDF file is sorted, at least for some types of reads. To get the file above sorted, you need to add all messages for a specific CAN ID, then add the next CAN ID and so on. Some ODS index databases need sorted MDF files.
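
One way to get the "all messages for one CAN ID, then the next ID" order is to buffer the frames and reorder them before writing. The sketch below assumes the frames fit in memory and arrive in time order; `Frame` and `GroupByCanId` are hypothetical names. A stable sort keeps the original time order within each ID.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical minimal frame record: timestamp and CAN ID only.
struct Frame {
  std::uint64_t time_ns;
  std::uint32_t can_id;
};

// Reorders the frames so that all frames of one CAN ID are adjacent.
// std::stable_sort preserves the relative (time) order inside each ID.
void GroupByCanId(std::vector<Frame>& frames) {
  std::stable_sort(frames.begin(), frames.end(),
                   [](const Frame& a, const Frame& b) {
                     return a.can_id < b.can_id;
                   });
}
```

The trade-off is memory: this needs the whole log buffered, whereas feeding frames to the writer one at a time does not.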

I mentioned earlier that your byte-to-signal-value conversion isn't fully general, as you assumed that all signals are floating-point values. There are several other data types. Enumerations, i.e. index-to-text translations, are common. There are 3 tricky issues with the CAN protocol: multiplexed signals, extended multiplexed signals and signals that are larger than 8 bytes. Especially the latter is complicated to translate, as it depends on the CAN protocol in use. The number of messages is greater than or equal to the number of samples.
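
As an illustration of why "everything is a double" is not enough, the sketch below handles two of the other cases: a signed integer signal that is narrower than 64 bits, and an enumeration that maps an integer to text. `SignExtend` and `EnumText` are hypothetical helpers, and the table stands in for the value descriptions a DBC file may contain. Multiplexed and multi-frame signals are not covered.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Interprets the lowest `length` bits of raw as a two's complement number.
std::int64_t SignExtend(std::uint64_t raw, unsigned length) {
  if (length == 0 || length >= 64) {
    return static_cast<std::int64_t>(raw);
  }
  const std::uint64_t sign_bit = std::uint64_t{1} << (length - 1);
  if (raw & sign_bit) {
    raw |= ~std::uint64_t{0} << length;  // fill the upper bits with ones
  }
  return static_cast<std::int64_t>(raw);
}

// Looks up the text for an enumeration value; falls back to the number.
std::string EnumText(const std::map<std::int64_t, std::string>& table,
                     std::int64_t value) {
  const auto it = table.find(value);
  return it != table.end() ? it->second : std::to_string(value);
}
```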

Note that the SaveSample function converts the signal values to a record (byte array) and in the end stores that on disk. This is in principle the same format as the message log you started with. If you only need the signal values temporarily, you might be better off skipping the conversion to MDF. Reading and parsing the MDF file will be as slow as reading and parsing the original log files. There are several ways to improve read speed, such as database indexing (OdsLib) or simply using DbcLib directly.

I can optionally do your "entire" application, but for that I need more information.

I have 2 actions to do (so far).

  1. Test the missing time channel failure (bug).
  2. Create a converter writer type (improvement).

If you are interested in other options, we might have a meeting (MS Teams?). I try to avoid e-mail addresses in public forums like this. I normally use LinkedIn for more private messaging.

ihedvall commented 1 month ago

I have checked in the MdfConverter. You can create an MdfConverter instead of a Mdf4Basic writer object in the MdfFactory::CreateMdfWriter function. It keeps the file open while you add samples otherwise the same code as before.

You get the best performance if you read in one message and add that message to the writer. This minimizes memory usage and is, in theory, the fastest conversion.

Please let me know if you need any help with creating the channel and channel group configuration from a DBC file. The above configuration doesn't work for all types of DBC files.

Ritzermaxl commented 1 month ago

Hi, thanks a lot for the implementation! Sadly it's still quite slow for my application: writing around 1.5 million samples takes around 6 minutes (compression adds 20 s). Writing the same data to a map and saving this map to an HDF5 file takes around 20 s.

Please don't get me wrong, thanks for the quick implementation, I am sure others will benefit greatly from this!

ihedvall commented 1 month ago

In the unit test TestWrite::MdfConverter, it saves 2 million samples within 2 seconds, with compression. That is far from the 6 minutes, but storage to HDF5 should work as well. If you have some time, you can comment out the SaveSample call in your code and rerun your test to check the time. Otherwise the HDF5 solution will work as well.


39: [ RUN      ] TestWrite.MdfConverter
39: Write Time (2MS) [s]: 2.07875
39: [       OK ] TestWrite.MdfConverter (2979 ms)
39: [2024-07-17 16:06:58.908] [Trace] Tear down the test suite. [testwrite:void __cdecl mdf::test::TestWrite::TearDownTestSuite(void):137]
39: [----------] 1 test from TestWrite (2980 ms total)
39: 
39: [----------] Global test environment tear-down
39: [==========] 1 test from 1 test suite ran. (4064 ms total)
39: [  PASSED  ] 1 test.
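
To compare runs with and without the SaveSample call, the conversion loop can be wrapped in a simple wall-clock measurement. This is only a sketch; `MeasureSeconds` is a hypothetical helper and not part of mdflib.

```cpp
#include <chrono>

// Runs the given callable once and returns the elapsed wall-clock time in
// seconds, measured with a monotonic clock.
template <typename Func>
double MeasureSeconds(Func&& work) {
  const auto start = std::chrono::steady_clock::now();
  work();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}
```

Timing the read-and-decode loop once with and once without the save call shows whether the time is spent in the writer or in reading and decoding the log.
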

ihedvall commented 1 month ago

Don't misunderstand me. The solution with HDF5 as output is much better if it is acceptable. The read time of signal values from HDF5 will be much faster than reading from an MDF file. The conversion time should be in the same range, though. The MDF4 is good for long term storage and for logging (appending) values.

HDF5 is made for reading and analyzing values. There is no standard way of storing metadata, i.e. information about the test object/environment. A database should be used for indexing if there are many HDF5 files.

ihedvall commented 1 month ago

@Ritzermaxl I also got slow write response when I tested the mdf2csv application. The problem was that the output file path was a network file path (NAS). My unit tests are done against a local SSD drive (temp path). It's a huge difference (100 times faster or so). The general rule is to create HDF5/MDF4 files locally (in a temp folder) and then copy the file to its destination. Just in case you have the time to check.