nschloe / meshio

:spider_web: input/output for many mesh formats
MIT License
1.92k stars 396 forks source link

ENH: cells_data #685

Closed gdmcbain closed 4 years ago

gdmcbain commented 4 years ago

Since the .cells were split into a List[Cells] in #631, there has been the possibility of having more than one of the Cells having the same Cells.type. This was indeed the purpose.

It has to be assumed that having more than one Cells of the same Cells.type will be done for some reason in the application, e.g. the cells belong to a subdomain (material) or patch of boundary in a mixed boundary value problem (though see #684 and nschloe/pygmsh#301 where one of the types, 'line', has been split arbitrarily by a third-party code, Gmsh).

Assuming that the Cells are physically significant, the application will likely associate data with each, e.g.:

Should meshio.Mesh be enhanced to store data associated with Cells directly?

Thus instead of

mesh.cell_data["gmsh:physical"] == [array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2])]

something like

mesh.cells_data["gmsh:physical"] = [1, 1, 1, 2]

and for #676

mesh.cells_data["obj:group"] = [0, 1, 2, 3]

(These keys differing of course only pending the unification of #619.)

nschloe commented 4 years ago

Traditionally, cell_data is data for every single cell. If all cells in a group have the same datum, we can certainly admit simplification here. In the gmsh example, I wouldn't go so far though. The file adds an int to every cell, and meshio should reflect that. There maybe other formats which truly associate one value with a group.

Another option: Add a cell_group_data field for this.

I think this issue isn't terribly pressing though.

gdmcbain commented 4 years ago

The file adds an int to every cell, and meshio should reflect that.

No, MSH 4.1 assigns the physical group to the entity block, not to the individual cell; OBJ assigns a group name to the g, not to each f. The individual cells only get the physical group or group name via their membership of the entity block or group.

I think this issue isn't terribly pressing though.

Sure, it can be worked around, but what is the advantage of the new .cells : List[Cells] restructure is if the items can't be tagged in any way? To reconstruct a MSH 4.1 block or OBJ group for writing, would one have to check that each cell in a Cells has identical cell_data? (I don't think meshio.write currently attempts to group cells for either MSH 4.1 or OBJ.)

tianyikillua commented 4 years ago

But for other formats we don't necessarily have the same region id for each cells block. Another reason for a list based cell blocks structure is to preserve cell ordering in the original mesh file.

keurfonluu commented 4 years ago

My two cents. This is too specific as it seems to only benefit OBJ format and cannot be generalized to other use cases of cell data (e.g. physical properties like pressure, temperature...).

I think we might first need to find a way to unify these annoying cell groups (gmsh:physical, flac3d:zone, avsucd:material...) and an easy workaround might pop up to solve this issue for OBJ?

gdmcbain commented 4 years ago

I'm not actually interested in OBJ, it's not a very useful format for finite element work, it was just a simple example that came up recently #676.

MSH 4.1 is more important because that's what Gmsh writes and what pygmsh ends up using internally by default.

gdmcbain commented 4 years ago

This is too specific … and cannot be generalized to other use cases of cell data (e.g. physical properties like pressure, temperature...).

Yes, I think you're onto something here. Currently ‘region id‘ and temperature are handled in the same way but they are intrinsically quite different things. For one the latter is float while the former is int (or uint, str, Enum, …).

Does this suggest that cell_data isn't the ideal way to store region id?

gdmcbain commented 4 years ago

But for other formats we don't necessarily have the same region id for each cells block.

Oh. Could you point to an example?

Another reason for a list based cell blocks structure is to preserve cell ordering in the original mesh file.

Why would that matter? (Assuming everything were appropriately tagged.) Example? Finite element meshers and solvers reorder nodes and cells all the time.

gdmcbain commented 4 years ago

I think we might first need to find a way to unify these annoying cell groups (gmsh:physical, flac3d:zone, avsucd:material...)

Yes ! (Another: "medit:ref".) It really is too much to expect downstream code to handle all that.

I think this is part of #619 and might usefully be split off from it to increase its priority. Also the other part of #619 in cell_sets is perhaps more technical or delicate—by which I only mean that I don't claim to fully understand its implications for formats that I don't use. Or, on the other hand, is treating #619 as a single issue the way forward? Are cell_sets involved?

keurfonluu commented 4 years ago

But for other formats we don't necessarily have the same region id for each cells block.

Oh. Could you point to an example?

AVS-UCD, for instance, defines a cell material in the second column:

1  1  quad  1  2  3  4
2  1  quad  1  5  6  2 
3  1  quad  2  6  7  3
4  2  quad  3  7  8  4
5  2  quad  4  8  5  1
6  1  quad  5  8  7  6

Another reason for a list based cell blocks structure is to preserve cell ordering in the original mesh file.

Why would that matter? (Assuming everything were appropriately tagged.) Example? Finite element meshers and solvers reorder nodes and cells all the time.

I never thought that original order would matter, but after re-adapting the FLAC3D and AVS-UCD formats to the new style, I can definitely say that not to have to worry about the cell ordering when reading a file made my life easier. It's something.

gdmcbain commented 4 years ago

An alternative to the new Mesh attribute suggested in the title (assuming Cells do have some physical or semantic significance, though @tianyikillua is suggesting that that's not always the case): give each Cells another attribute besides type and data. This might be a str for a name or an int for a region_id. These could be left empty (empty string, 0) (or perhaps even omitted?) if unused or irrelevant, but otherwise would suffice to map the behaviour of MSH 4.1. (Or OBJ, but I won't mention that again as I'm unlikely to use the format; VTK was the better option in the OpenFOAM triSurface application mentioned in #676.)

gdmcbain commented 4 years ago

AVS-UCD, for instance, defines a cell material in the second column:

Right. There are lots of formats like that: MSH 2.2, MEDIT, …. But I think that it's by now generally recognized that that's a terrible way to manage regions. Much better to either:

Of course meshio has to handle such legacy formats but I'd like to think that it would also try to support better practices.

tianyikillua commented 4 years ago

For me a region can be composed of several different cell types, while our each cell block is always defined by a unique cell type.

Consider an unstructured mixed triangle-quad mesh of the following domain. Each region is thus composed of triangular and quad cells.

Screenshot 2020-02-12 at 20 30 41
gdmcbain commented 4 years ago

Good point, I hadn't thought of heterogeneous meshes, but in which case wouldn't one simply assign the same region_id to each cellblock constituting the region? In the same way that in the AVS UCD example of @keurfonluu (or in MSH 2.2 or MEDIT) one assigns that region_id to each cell constituting the region? It's just working at a higher level, cellblocks rather than cells, where that makes sense in.the semantics of the application.

tianyikillua commented 4 years ago

For me the transition from a cell dict to a cell list mainly reflects the data structure of each mesh format, and not for representing regions. In many mesh formats cells are defined by consecutive blocks composed of a same cell type, period...

gdmcbain commented 4 years ago

O. K., so data or metadata should never be associated with the Cells items of Mesh.cells, no mechanism for this provided.

I agree that it is a good design principle to only have a feature try to do one thing, so here cell blocks should not be used for anything semantic, only syntactic like homogeneity of cell type.

In which case, cellblocks aren't good for representing MSH 4.1 blocks, OBJ groups, (many other examples), which do have data or semantic metadata associated with them. Should other ways to represent these in meshio.Mesh be sought? Maybe the cell_sets attribute? What do you suggest? Consider the OBJ group names as a fairly unimportant but simple example. (These are currently discarded.)

gdmcbain commented 4 years ago

In the gmsh example, I wouldn't go so far though. The file adds an int to every cell, and meshio should reflect that. There maybe other formats which truly associate one value with a group.

Yes, so as mentioned above, MSH 4.1 does associate the int to the group; this was an advance on MSH 2.2 which associated the int to each cell. Other examples : OBJ group names #676. (I believe these are interpreted by OpenFOAM as patches of boundary for mixed boundary value problems—inlet, outlet, wall, that kind of thing.) Others too I'm sure, but I'd have to look that up later.

Another option: Add a cell_group_data field for this.

As an attribute of meshio.Mesh? Right, that's what I meant by cells_data in the title of this issue. I'm not fussed either way about the name. Is this what you meant? There does seem to be some strong opinion that the Cells items of .cells are not to be given any physical significance though…

gdmcbain commented 4 years ago

A recap on how regions are specified in MSH 4.1 (as opposed to the long-lived and still popular MSH 2.2): first there's an $Entities section

$Entities
  numPoints(size_t) numCurves(size_t)
    numSurfaces(size_t) numVolumes(size_t)
  pointTag(int) X(double) Y(double) Z(double)
    numPhysicalTags(size_t) physicalTag(int) ...
  ...
  curveTag(int) minX(double) minY(double) minZ(double)
    maxX(double) maxY(double) maxZ(double)
    numPhysicalTags(size_t) physicalTag(int) ...
    numBoundingPoints(size_t) pointTag(int) ...
  ...
  surfaceTag(int) minX(double) minY(double) minZ(double)
    maxX(double) maxY(double) maxZ(double)
    numPhysicalTags(size_t) physicalTag(int) ...
    numBoundingCurves(size_t) curveTag(int) ...
  ...
  volumeTag(int) minX(double) minY(double) minZ(double)
    maxX(double) maxY(double) maxZ(double)
    numPhysicalTags(size_t) physicalTag(int) ...
    numBoundngSurfaces(size_t) surfaceTag(int) ...
  ...
$EndEntities

which assigns numPhysicalTags physical tags to each ‘entity’ which is of a certain dimension (0, 1, 2, or 3), then the $Elements section

$Elements
  numEntityBlocks(size_t) numElements(size_t)
    minElementTag(size_t) maxElementTag(size_t)
  entityDim(int) entityTag(int) elementType(int; see below)
    numElementsInBlock(size_t)
    elementTag(size_t) nodeTag(size_t) ...
    ...
  ...
$EndElements

which defines blocks of elements, each of a given dimension and belonging to a given entity of that dimension. Elements within these blocks get no data, just an elementTag identifier and a list of nodeTags identifying their support.

Elements are assigned to regions (subdomains, boundary patches, &c.) by belonging to a block which is assigned to an entity which has a number of physical tags.

gdmcbain commented 4 years ago

It appears from #646 that Fluent .msh (meshio input format "ansys") divides its cells (‘faces’) into blocks or groups or zones and gives to each:

as well as an 'element-type' (I refer here to the meshio.ansys source code comments).

I believe the name and zone-id are currently discarded (the latter would be a[0] in _read_faces) and the cells turned into meshio cellblocks with a possible loss of semantics.

nschloe commented 4 years ago

Okay, my thoughts.

gdmcbain commented 4 years ago

Thank you for all the expert inputs. I'll withdraw this suggestion. Conclusion: the items of meshio.Mesh.cells are merely syntactic and should not have data associated with them; if the blocks in a mesh file format are represented as items of .cells but do have data (e.g. the group names in Wavefront .obj and Fluent .msh, the physical tags in Gmsh MSH 4.1), that data is to be represented elsewhere in the meshio.Mesh data structure: .cell_data, .cell_tags #691, or .cell_sets #698, with the latter being preferred. I'll push on with reading .cell_sets from MSH 4.1 #698.