ihedvall / mdflib

Implementation of the ASAM MDF data file.
https://ihedvall.github.io/mdflib/
MIT License

workaround for std::tmpfile() returning nullptr on Windows #45

Open MichalLeonBorsuk opened 7 months ago

MichalLeonBorsuk commented 7 months ago

Plus "0ULL" changed to static_cast, as C++20 interpretations differ between Windows and Linux

PS Please excuse the line formatting, I applied clang-tidy on some files.

MichalLeonBorsuk commented 7 months ago

Also, find_if() seems to be C++20, not 17, so I changed the version in CMake.

Simplxss commented 7 months ago

Notice: size_t is different in some architectures.

```cpp
// In Windows
#ifdef _WIN64
    typedef unsigned __int64 size_t;
#else
    typedef unsigned int     size_t;
#endif

// In Linux
#define __SIZE_TYPE__ long unsigned int
typedef __SIZE_TYPE__ size_t;
```

I recommend using a fixed-width integer like uint32_t or uint64_t instead of size_t.
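To illustrate the point, here is a small sketch (not mdflib code; the helper name is made up). On 32-bit Windows size_t is unsigned int, so an expression like std::max(n, 0ULL) mixes two different unsigned types and fails to compile, while on 64-bit Linux both types happen to be 64-bit wide. Casting both operands to one fixed-width type behaves identically everywhere:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: compare against a value cast to the same fixed-width
// type instead of the 0ULL literal, whose type only happens to match size_t
// on some platforms.
uint64_t ClampedSize(std::size_t n) {
  return std::max(static_cast<uint64_t>(n), static_cast<uint64_t>(0));
}
```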

ihedvall commented 7 months ago

I will add a lot of changes when the bus logging writing is ready; I have the C# interface left. I wonder if anyone can figure out why I, as the repository owner, cannot create a fork/branch in GitHub!

Going back to C++20 is a problem for Murmele. Maybe it's OK for the time being, and we can deal with the problem when Murmele actually uses it.

Note that this implementation of MDF reading uses a lot of primary memory. The previous version of this library was designed for 32-bit Windows, which has a limit of 2 GB of primary memory. Sure enough, someone wanted to read in a >4 GB file. The solution was to cache the observers' data buffers into files. This was truly a real mess.

Just curious about the WebAssembly support. Is it possible to read a file from WebAssembly?

NoelBruno commented 7 months ago

> Plus "0ULL" changed to static_cast, as C++20 interpretations differ between Windows and Linux
>
> PS Please excuse the line formatting, I applied clang-tidy on some files.

I am facing the same trouble of a null pointer with MinGW/Cygwin. I have modified the code to create a string variable "data_file_name" that stores a file name in the current directory, obtained with the std::tmpnam() function, excluding a root character '/' or '\' as the first character.

Here is the modified function:

```cpp
void Dg4Block::ReadData(std::FILE* file) const {
  if (file == nullptr) {
    throw std::invalid_argument("File pointer is null");
  }
  const auto& block_list = DataBlockList();
  if (block_list.empty()) {
    return;
  }

  // First scan through all CN blocks and read in any SD/VLSD related data
  // bytes into memory.
  for (const auto& cg : cg_list_) {
    if (!cg) {
      continue;
    }
    const auto channel_list = cg->Channels();
    for (const auto* channel : channel_list) {
      if (channel == nullptr) {
        continue;
      }
      const auto* cn_block = dynamic_cast<const Cn4Block*>(channel);
      if (cn_block == nullptr) {
        continue;
      }
      const auto data_link = cn_block->DataLink();
      if (data_link == 0) {
        continue;  // No data to read into the system
      }
      const auto* block = Find(data_link);
      if (block == nullptr) {
        MDF_DEBUG() << "Missing data block in channel. Channel: "
                    << cn_block->Name() << ", Data Link: " << data_link;
        continue;  // Strange that the block doesn't exist
      }

      // Check if it relates to a VLSD CG block or an SD block
      if (cn_block->Type() == ChannelType::VariableLength &&
          block->BlockType() == "CG") {
        const auto* cg_block = dynamic_cast<const Cg4Block*>(block);
        if (cg_block != nullptr) {
          cn_block->CgRecordId(cg_block->RecordId());
        }
        // No meaning to read in the data as it is inside this DG/DT block
        continue;
      }
      cn_block->ReadData(file);
    }
  }

  // Convert everything to samples in a file. A single DT block can be read
  // directly, but the remaining block types are streamed to a temporary file.
  // The main reason is that linked data blocks are not aligned to a record
  // or, even worse, to a channel value's bytes. Converting everything to a
  // simple DT block solves that problem.
  bool close_data_file = false;
  std::FILE* data_file = nullptr;
  size_t data_size = 0;
  std::string data_file_name;
  if (block_list.size() == 1 && block_list[0] &&
      block_list[0]->BlockType() == "DT") {
    // If a single DT block, read from the input file directly
    const auto* dt = dynamic_cast<const Dt4Block*>(block_list[0].get());
    if (dt != nullptr) {
      SetFilePosition(file, dt->DataPosition());
      data_file = file;
      data_size = dt->DataSize();
    }
  } else {
    close_data_file = true;
    data_file_name = std::tmpnam(nullptr);
    if (!data_file_name.empty()) {
      // Strip a leading root character so the temporary file is created in
      // the current directory (workaround for MinGW/Cygwin).
      if (data_file_name.front() == '\\' || data_file_name.front() == '/') {
        data_file_name.erase(data_file_name.begin());
      }
    }
    data_file = std::fopen(data_file_name.c_str(), "wb+");
    if (data_file == nullptr) {
      MDF_ERROR() << "Dg4Block::ReadData: unable to open temporary file "
                  << data_file_name;
    }
    data_size = CopyDataToFile(block_list, file, data_file);
    std::rewind(data_file);  // SetFilePosition(data_file, 0);
  }

  auto pos = GetFilePosition(data_file);
  // Read through all records
  ParseDataRecords(data_file, data_size);
  if (data_file != nullptr && close_data_file) {
    std::fclose(data_file);
    std::remove(data_file_name.c_str());
  }

  for (const auto& cg : cg_list_) {
    if (!cg) {
      continue;
    }
    for (const auto& cn : cg->Cn4()) {
      if (!cn) {
        continue;
      }
      cn->ClearData();
    }
  }
}
```

ihedvall commented 7 months ago

Maybe it is better for everyone if the code is rewritten so it can read the file without opening a temporary file. Faster, but a little bit of spaghetti code; well, maybe not so little.

ihedvall commented 6 months ago

I have made version 2.1 of the MDF library. It includes the Channel Array (CA) support as well as a fix for non-aligned numbers such as 6-bit signed numbers. These changes affect the IChannel/Cn4Block files, which is why a merge to the fork is difficult to perform.

The temp file function returning null has never occurred on Windows 11 64-bit or on Ubuntu 64-bit. I have set things up so the unit tests run on Windows 11 and on Ubuntu in GitHub actions. I also have the possibility to test on macOS, but there are some issues with the XCode compiler. macOS/CMake uses an old Clang compiler from just before C++20, which causes problems for all my libraries.

It's a problem to test all OS and compiler combinations with the GitHub actions, because a run stops at the first error, so it takes a long time to find all errors. Most compiler errors are easy to fix, typical Clang/GCC/VC compiler issues. If anyone has a compiler/OS combination on which it is possible to test the library + tools before check-in to GitHub, please do so and write a build YML file (see the existing YML file). I recommend avoiding the matrix parallel build, as the jobs seem to cancel each other if something fails.

For the tmpfile problem, I suggest moving the code snippets to the MdfHelper or Platform class and having all the #ifdefs there. Then write a unit test that stress tests those code snippets in all environments.
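A minimal sketch of what such a wrapper could look like, with all the #ifdefs in one place. The function name is an assumption, not an existing mdflib API, and the fallback simply mirrors the workaround described earlier in this thread:

```cpp
#include <cstdio>

// Hypothetical MdfHelper/Platform-style wrapper, a sketch only.
std::FILE* CreateTempFile() {
  if (std::FILE* file = std::tmpfile(); file != nullptr) {
    return file;  // Normal case on Linux/macOS and most Windows setups
  }
#ifdef _WIN32
  // std::tmpfile() can return nullptr on Windows when the process may not
  // create files in the drive root. Fall back to a named file in the
  // current directory, as suggested above.
  char name[L_tmpnam] = {};
  if (std::tmpnam(name) == nullptr) {
    return nullptr;
  }
  const char* base = name;
  if (*base == '\\' || *base == '/') {
    ++base;  // Strip the root character so the file lands in the CWD
  }
  return std::fopen(base, "wb+");
#else
  return nullptr;
#endif
}
```

A unit test can then stress this single entry point on every OS/compiler combination instead of scattering platform checks through the readers.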

ihedvall commented 6 months ago

This pull request needs to be synchronized with main branch but this will be tricky as there is overlapping fixes.

I do have a proposal to somewhat solve the tmpfile problem. The extra file is needed to solve the problem when the "DT" block is split or compressed.

  1. If it is an uncompressed single DT block, use the current record-by-record read from the input file. This minimizes the memory consumption and reads the file quite fast.
  2. If it is not a single data block (HL/DL/DZ/DT), then try to allocate the whole of the required data bytes in memory and copy the bytes from file to memory.
  3. If step 2 fails, then use the current strategy with a temporary file.

Maybe step 1 can be skipped. This would speed up the reading somewhat.

Any ideas?

ihedvall commented 6 months ago

FYI, I will do a ReadCache class that solves the problem with tmpfile(), i.e. it reads directly from the input file. This is primarily due to long read times for large files (>10 GB). I will do this on the main branch. Someone else is working on the emscripten (WASM) solution.
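The core idea of reading directly from the input file through a cache can be sketched as follows. This is a generic illustration, not mdflib's actual ReadCache class; all names here are made up:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical buffered reader: pulls bytes from the input file through a
// fixed-size cache instead of copying everything to a temporary file first.
class ReadCacheSketch {
 public:
  ReadCacheSketch(std::FILE* file, size_t cache_size)
      : file_(file), cache_(cache_size) {}

  // Returns the next byte from the file via the cache; false at end-of-file.
  bool GetByte(uint8_t& byte) {
    if (index_ >= valid_) {
      valid_ = std::fread(cache_.data(), 1, cache_.size(), file_);
      index_ = 0;
      if (valid_ == 0) {
        return false;  // Nothing more to read
      }
    }
    byte = cache_[index_++];
    return true;
  }

 private:
  std::FILE* file_ = nullptr;
  std::vector<uint8_t> cache_;  // Reused block buffer
  size_t index_ = 0;            // Next byte to hand out
  size_t valid_ = 0;            // Bytes currently valid in the cache
};
```

For large files (>10 GB) this kind of design avoids both the tmpfile() call and the extra write/read pass through the file system.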

ihedvall commented 6 months ago

I have checked in the Read Cache functionality for MDF 4 files. It solves the tmpfile() problem as well as improving the read speed, quite a lot actually. Most of the above changes are obsolete. I don't know if a merge is possible/needed at this time.

Please let me know if you need my help with the merge.