lutraconsulting / MDAL

Mesh Data Abstraction Library
http://www.mdal.xyz/
MIT License

MDAL doesn't load binary datasets if mesh number of nodes differs from maximum node ID #417

Open wilhelmje opened 2 years ago

wilhelmje commented 2 years ago

For both binary and ASCII dataset files, the index (position/line) of the values corresponds to the node numbers in the mesh. In most cases nodes are numbered from 1 to the maximum ID, but this is not mandatory. There may be reasons to remove some nodes from a mesh without renumbering, while keeping the ability to use existing datasets. With the ASCII format this is not a problem in most cases, because the dataset is validated by comparing the value behind the ND card against the maximum node ID of the mesh (see mdal_ascii_dat.cpp, MDAL::DriverAsciiDat::loadNewFormat/loadOldFormat):

    if ( cardType == "ND" && items.size() >= 2 )
    {
      size_t fileNodeCount = toSizeT( items[1] );
      size_t meshIdCount = maximumId( mesh ) + 1;
      if ( meshIdCount != fileNodeCount )
      {
        MDAL::Log::error( MDAL_Status::Err_IncompatibleMesh, name(), "Loading old format failed" );
        return;
      }
    }

In the binary format (mdal_binary_dat.cpp, MDAL::DriverBinaryDat::load) this doesn't work, because the comparison is made between the number of values in the dataset (numdata) and the number of vertices in the mesh (vertexCount):

      case CT_NUMDATA:
        // Num data
        if ( read( in, reinterpret_cast< char * >( &numdata ), 4 ) ) return exit_with_error( MDAL_Status::Err_UnknownFormat, "unable to read num data" );
        if ( numdata != static_cast< int >( vertexCount ) ) return exit_with_error( MDAL_Status::Err_IncompatibleMesh, "invalid num data" );
        break;

I think both dataset formats should use the same criterion, and for more flexibility it should be the first one (the maximum node ID of the mesh). This would solve the problem of gaps in node numbering. But there may be cases where the node with the maximum ID has been removed from the mesh; loading datasets would then fail, because the criterion meshIdCount != fileNodeCount raises an error. So the only really mandatory check should be meshIdCount > fileNodeCount, because only in that case can no values be found in the dataset file for IDs greater than fileNodeCount. A sketch of this relaxed check for the binary driver follows below.
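Just to illustrate, a minimal sketch of what the relaxed CT_NUMDATA check could look like in mdal_binary_dat.cpp. This assumes maximumId( mesh ) (which today lives in the ASCII driver) would be available to the binary driver as well; it is an illustration of the proposal, not the current MDAL code:

      case CT_NUMDATA:
      {
        // Num data
        if ( read( in, reinterpret_cast< char * >( &numdata ), 4 ) ) return exit_with_error( MDAL_Status::Err_UnknownFormat, "unable to read num data" );
        // relaxed check (proposal): error only when the file provides fewer
        // values than the mesh's highest node ID requires; surplus values
        // beyond the existing node IDs are tolerated
        size_t meshIdCount = maximumId( mesh ) + 1;
        if ( meshIdCount > static_cast< size_t >( numdata ) ) return exit_with_error( MDAL_Status::Err_IncompatibleMesh, "invalid num data" );
        break;
      }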

A second barrier in the file formats can be the number of elements, if it is stored in the dataset file (NC card / CT_NUMCELLS). Flexibility would increase if this were ignored completely for vertex-based datasets, as in the sketch below.
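Again only a sketch, assuming the CT_NUMCELLS case mirrors the CT_NUMDATA case shown above (the numcells variable and the error message are my assumptions): the four bytes would still be consumed so the stream stays aligned, but the comparison would be skipped for vertex-based datasets:

      case CT_NUMCELLS:
        // still read the 4-byte value so the stream stays aligned...
        if ( read( in, reinterpret_cast< char * >( &numcells ), 4 ) ) return exit_with_error( MDAL_Status::Err_UnknownFormat, "unable to read num cells" );
        // ...but do not compare it against the mesh element count for
        // vertex-based datasets; the stored count is simply ignored
        break;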

I'm aware that the implemented behavior is intended to prevent users from combining mismatching data. But I think the user would recognize such a mismatch immediately when viewing the results, and the benefit of these checks would be smaller than the gain in flexibility and the ability to create clipped TIN meshes (e.g. for water surface elevation).

vcloarec commented 2 years ago

Hi, the problem is that MDAL does not relate dataset values to nodes by node ID, but by position in the nodes (vertices) list.

vcloarec commented 2 years ago

hmmm, my bad... after looking closer, when the mesh is a 2DM mesh, there is a link between ID and position... I have to look at it again more closely.

vcloarec commented 2 years ago

> I think the user would recognize such a mismatch immediately when viewing the results, and the benefit of these checks would be smaller than the gain in flexibility and the ability to create clipped TIN meshes (e.g. for water surface elevation).

Thinking about it, I am not sure the user can always see that there is a problem. For example, with QGIS in editing mode: if the user removes only one vertex and then stops editing, there will be a renumbering. If the user then reloads the associated dataset, every value after the removed vertex will be shifted by one... Not sure the user can see it on screen... So I think there is a non-negligible risk of confusion if we allow a different count of elements than count of values.
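To make the off-by-one concrete, a tiny standalone sketch (plain C++, not MDAL code) of what happens when a position-based dataset is reloaded after one vertex is deleted and the mesh renumbered:

      #include <cstdio>
      #include <vector>

      int main()
      {
        // five vertices, one scalar value per vertex position
        std::vector<double> values = { 10.0, 11.0, 12.0, 13.0, 14.0 };
        // the vertex at index 2 is deleted and the mesh is renumbered,
        // so the remaining vertices keep these original indices:
        std::vector<int> origIndex = { 0, 1, 3, 4 };
        // reloading the old dataset assigns values purely by new position,
        // so every vertex after the deleted one gets its neighbour's value
        for ( size_t i = 0; i < origIndex.size(); ++i )
          std::printf( "vertex (originally %d): gets %.1f, should get %.1f\n",
                       origIndex[i], values[i], values[origIndex[i]] );
        return 0;
      }

The shifted values (12.0 instead of 13.0, 13.0 instead of 14.0) would look perfectly plausible on screen, which is exactly the silent-confusion risk described above.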