Closed gdmcbain closed 3 years ago
I think this shouldn't be a huge task to implement. I'll keep it in mind for future coding sessions.
Not a huge task to implement, no, but maybe there is a preliminary question to think about: ‘tagged subsets of elements’ nschloe/meshio#290.
The current `skfem.Mesh.load` looks for a key `'gmsh:physical'` in the `meshio_type` and `bnd_type` entries of the `.cell_data` attribute of the `meshio.Mesh`. These are interpreted as one-dimensional integer arrays of length equal to the number of `meshio_type` and `bnd_type` cells specified in `meshio.Mesh.cells`.
However, the `.subdomains` and `.boundaries` attributes of a `skfem.Mesh` are different; they're `Dict[str, ndarray]`, with the terms of the `ndarray` values being indices of the `meshio_type` or `bnd_type` cells belonging to the subdomain or boundary.
These two kinds of representation are discussed in nschloe/meshio#175, nschloe/meshio#290, nschloe/meshio#347, …
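For concreteness, here is a minimal numpy sketch of the two representations and the conversion between them (the tag values and names are invented for the example; MSH 2.2 itself stores only the integers):

```python
import numpy as np

# Hypothetical data: one integer tag per cell, as read from something
# like cell_data["gmsh:physical"].
tags = np.array([10, 10, 20, 20, 20, 10])
names = {10: "fluid", 20: "solid"}

# {cell} -> {tag} converted to the {name} -> {subset of cells}
# representation used by skfem.Mesh.subdomains:
subdomains = {name: np.nonzero(tags == tag)[0] for tag, name in names.items()}

# The reverse direction is lossy as soon as subsets overlap: a single
# integer per cell cannot record membership in two sets at once.
def sets_to_tags(subdomains, n_cells, sentinel=-1):
    out = np.full(n_cells, sentinel)
    for tag, cells in enumerate(subdomains.values()):
        out[cells] = tag  # later sets silently overwrite earlier ones
    return out
```

The overwriting in the last loop is exactly the loss discussed below for overlapping subdomains and boundaries.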
The current behaviour of `meshio.read` is mostly the way it is because, at the time it was developed, the only format in which meshes were generated with subdomains and boundaries was Gmsh's MSH 2, which used the {cell} → {subdomain} representation. Gmsh has since released a new format, MSH 4.1 #41 #129, which I think is actually supposed to work the other way, {subdomain} → {subset of cells}, although I haven't fully understood its concept of 'entity'. When Gmsh suddenly started generating MSH 4.1 by default, which nothing but itself could read, I hastily added a facility to meshio to read it, but more or less coerced it into the same data structure as previously used for MSH 2. This involves converting from the second representation to the first, lossily, which leads to problems like nschloe/meshio#524 when subdomains or boundaries aren't mutually exclusive (something which makes sense in 'multiphysics' problems).
So I'm wondering whether it might be better to avoid that conversion and see whether (already or in the future) meshio can hold and write something closer to `.subdomains` and `.boundaries`. But then the problem might be that meshio either won't reread those files as having subdomains or boundaries, or won't put them into the (unfortunately named) `'gmsh:physical'` entries, which will mean that they aren't recognized by `skfem.importers.meshio.from_meshio`.
That `Mesh.external` isn't populated on loading #268 means that it can't be used to reread the subdomains and boundaries (assuming they're not unravelled back into the `'gmsh:physical'` representation).
In recent work reading and writing XDMF nschloe/meshio#599 nschloe/meshio#600, I noticed a couple of interesting features of the format that might be useful here:
- `<Grid GridType="SubSet">`: These might be able to store `skfem.Mesh.subdomains`.
- `<Set>`:
  - `Type="Cell"`: Another way to store subdomains?
  - `Type="Face"`: For `skfem.Mesh.boundaries`?

I don't think either is supported in meshio yet, but that can be fixed.
Other advanced mesh formats (MED, Exodus II, Silo, …?) might have something similar but I'm less familiar with them.
I think the XDMF Face corresponds to a scikit-fem facet and is indexed as the facet of a cell rather than in a global array of facets, but that can be worked with via the `.f2t` and `.t2f` attributes of a `skfem.Mesh`. This might mean (except perhaps on initial import from Gmsh MSH 2) dispensing with separate lower-dimensional elements for boundaries.
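A rough illustration of working between the two indexings via a `t2f`-style table; the array layout here is an assumption for the sketch, not skfem's exact convention:

```python
import numpy as np

# Toy element-to-facet connectivity in the spirit of skfem.Mesh.t2f:
# t2f[i, e] is the global index of local facet i of element e.
t2f = np.array([[0, 1],
                [1, 2],
                [3, 4]])      # 3 local facets per element, 2 elements

boundary = np.array([0, 3])   # global facet indices of a named boundary

# Recover (element, local facet) pairs for the boundary facets, which is
# the kind of per-cell facet indexing an XDMF Face set would need:
local, elem = np.nonzero(np.isin(t2f, boundary))
pairs = list(zip(elem.tolist(), local.tolist()))
```

Going the other way (global facet index from a (cell, local facet) pair) is just a lookup in the same table.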
(I've been pondering a comment in nschloe/meshio#571: ‘The devil invented mixed topologies.’)
Currently there is the possibility to use `skfem.io.json`, which saves boundaries and named subdomains. Does it cater to your needs, or do you consider it necessary to save these also to external formats?
I was thinking of an external format, yes:
I think this is a desirable long-term goal.
I made efforts on the subdomains part in XDMF but met intransigence in nschloe/meshio#622. I haven't completely abandoned that proposal, but, wearying of the repeated rejection, I did experiment with pyexodus; that was just preliminary but did seem promising.
I did have a different idea for this a while ago, if existing external formats proved too difficult. Leaning on another part of the Python standard library than JSON, I was looking at xdrlib, the idea being something as isomorphic as possible to `skfem.io.json` but faster and with smaller files.
I got this idea from (libMesh/External packages/Utilities).
The XDR: External Data Representation Standard is a standard for the description and encoding of data. It is useful for transferring data between different computer architectures, and as such provides a very simple, portable approach for writing platform independent binary files. libMesh uses the Xdr class to provide a uniform interface to input/output operations for files in either XDR binary or ASCII text formats.
A minor hindrance with XDR and JSON is that they don't support `numpy.ndarray`; it's not part of the standard library.
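Since `xdrlib` has been deprecated (and removed in Python 3.13), here is a sketch of the same idea using `struct` with XDR's big-endian convention; the on-disk layout is invented for illustration:

```python
import struct

import numpy as np

def pack_ndarray(a):
    """Pack a 2-D float64 array as XDR-style big-endian bytes.

    Layout (invented for this sketch): two unsigned 32-bit ints for the
    shape, then the entries as IEEE 754 doubles, all big-endian as in
    the XDR standard.
    """
    m, n = a.shape
    return struct.pack(">II", m, n) + a.astype(">f8").tobytes()

def unpack_ndarray(buf):
    m, n = struct.unpack_from(">II", buf)
    data = np.frombuffer(buf, dtype=">f8", offset=8, count=m * n)
    return data.reshape(m, n).astype(np.float64)

p = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
assert (unpack_ndarray(pack_ndarray(p)) == p).all()
```

Being binary and fixed-width, this already beats JSON on file size and parse time, at the cost of readability.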
The thinking with pyexodus was probably not to wrap it for scikit-fem but to use it to learn how to save subdomains and boundaries (if indeed the Exodus format permits that) and then use the resulting Exodus files as examples for meshio, to see how much its Exodus writer and reader need extending.
The comment in #556 yesterday was:
So as well as meshio, does that imply an existing file format? meshio has no native serialization format. It does have sort of an internal data structure, but it's evolving and not completely uniform with respect to the supported input and output formats.
The `.subdomains` attribute isn't too bad as it can be encoded as a dict of names with values either (i) bool-valued cell data or (ii) a list of the cells belonging. (Or, more rudimentarily, as int-valued cell data as in MEDIT, Gmsh's MSH 2.2, &c.)
The `.boundaries` attribute is trickier. Gmsh stores boundaries as separate lower-dimensional cells, and FEniCS actually wants two input mesh files (one for the domain cells, one for the boundary cells). `skfem.Mesh` records `.boundaries` as lists of facets of domain cells; this is more natural (in my opinion) but not all that well supported by external mesh file formats. Not MEDIT or Gmsh's MSH; maybe Exodus, MED, XDMF, ...? My impressions so far were that XDMF doesn't look all that well maintained (an incomplete, inconsistent specification; the main reference implementation is ParaView, which is a viewer and so has different priorities to a finite element library) and MED overly bureaucratic (despite comprehensive documentation and maintenance), leaving perhaps Exodus as the most promising.
The main hurdle with Exodus is that I don't think meshio currently supports the features of it that are needed here (see, e.g., nschloe/meshio#1031). I think it might be possible to get the required features into meshio, despite that having failed for XDMF; the Exodus specification is less ambiguous, and ambiguity is what went wrong in nschloe/meshio#622. There is also pyexodus, and the possibility of reading Exodus directly as HDF.
Of course, whether meshio is used or not, relying on Exodus would introduce a dependency on `h5py` or similar.
Are there other options that I've overlooked?
Oh yes, I had forgotten this discrepancy.
skfem.Mesh records .boundaries as lists of facets of domain cells; this is more natural (in my opinion) but not all that well supported by external mesh file formats.
The missing support is probably because there is not a unique way of mapping internal elements to facets. We have one implementation in scikit-fem but other codes might do it differently and some additional details must be fixed to have a well defined format.
So again it comes back to inventing our own format.
Now that the new BaseMesh class in #556 will be "immutable", there should be no issue in simply dumping the contents into HDF5 directly using `h5py` (thus inventing our own format). That doesn't help with interoperability but should solve the issue of the large size of JSON files.
The missing support is probably because there is not a unique way of mapping internal elements to facets.
True, but also true of the ordering of nodes in some elements, which is handled as in
The XDMF spec says:
Edge and Face values are defined for each Cell in turn, using VTK ordering.
That's for an `Attribute` with `Center="Face"`, but I would assume it also applies to a `Set` of `SetType="Face"`.
(I didn't know VTK had such an ordering. That might be worth looking into too.)
(I didn't know VTK had such an ordering. That might be worth looking into too.)
I had a bit of a look for whether VTK had a way of serializing data on facets but didn't find anything in:
Some thoughts provoked by the difficulties encountered in using meshio in #680:
I wonder whether we might do better by avoiding meshio's `cell_sets` entirely and encoding our subsets, `.boundaries` and `.subdomains`, with indicator functions, i.e. a `bool` or 0/1 for inclusion, passed as `cell_data` and named like `"boundary:left"`. This isn't optimally efficient in terms of storage but might be manageable. (Do we need to prefix this further, like `"skfem:boundary:left"`?)
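A sketch of this indicator-function encoding; the `"boundary:"` prefix is the naming convention proposed above, not an existing meshio convention:

```python
import numpy as np

# Named subsets as 0/1 cell data, one field per set.
n_cells = 5
boundaries = {"left": np.array([0, 4]), "vertical": np.array([0, 2, 4])}

cell_data = {}
for name, cells in boundaries.items():
    ind = np.zeros(n_cells, dtype=np.uint8)
    ind[cells] = 1
    cell_data["boundary:" + name] = ind

# Decoding is a np.nonzero away, and overlapping sets survive the
# round trip, unlike with a single integer tag per cell:
decoded = {k.split(":", 1)[1]: np.nonzero(v)[0] for k, v in cell_data.items()}
```

Storage is one byte (or bit) per cell per set, so it scales with the number of sets, but every set is independent and overlap is free.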
Extension of that idea: some mesh formats (notably XDMF; see xdmf#15, nschloe/meshio#568, nschloe/meshio#622) don't like storing more than one kind of element. Could we get away from Gmsh's idea of representing boundary facets as lower-order elements and instead, in the `cell_data` for each boundary, store an `int` array with an `int` for each domain cell, being the index of the facet that is on the boundary if there is one and otherwise a sentinel value (e.g. the number of facets, since that is not a valid index into the element–facet connectivity matrix, `.t2f`)?
Would such a scheme be readable back into `skfem.Mesh`, using `np.nonzero`, &c.? How does this compare with #680 and its objectives? I'm not exactly sure what those are. Shall I develop this idea?
Actually I'm not even sure that it needs support from the skfem library; I think it can be done at the user level (although that's true of much functionality in scikit-fem, which is one of its great advantages).
Could we get away from Gmsh's idea of representing boundary facets as lower-order elements and instead in the cell_data for each boundary store an int array with an int for each domain cell, being the index of the facet that is on the boundary if there is one and otherwise a sentinel value (e.g. the number of facets, since that is not a valid index into the element–facet connectivity matrix, .t2f)?
No good: a cell can have more than one facet on a boundary.
Let me add a remark that in #680 the call

```python
m = MeshTet().with_boundaries({'left': lambda x: x[0] == 0.0})
m.save('test.msh', write_boundaries=True, sets_to_int_data=True, binary=False, file_format='gmsh22')
```

seems to produce completely reasonable results; it's just that reading the resulting test.msh back causes the loss of the tags and adds spurious entries.
Here is the resulting .msh file:

```
$MeshFormat
2.2 0 8
$EndMeshFormat
$Nodes
8
1 0.0000000000000000e+00 0.0000000000000000e+00 0.0000000000000000e+00
2 0.0000000000000000e+00 0.0000000000000000e+00 1.0000000000000000e+00
3 0.0000000000000000e+00 1.0000000000000000e+00 0.0000000000000000e+00
4 1.0000000000000000e+00 0.0000000000000000e+00 0.0000000000000000e+00
5 0.0000000000000000e+00 1.0000000000000000e+00 1.0000000000000000e+00
6 1.0000000000000000e+00 0.0000000000000000e+00 1.0000000000000000e+00
7 1.0000000000000000e+00 1.0000000000000000e+00 0.0000000000000000e+00
8 1.0000000000000000e+00 1.0000000000000000e+00 1.0000000000000000e+00
$EndNodes
$Elements
17
1 4 2 0 0 1 2 3 4
2 4 2 0 0 4 6 2 8
3 4 2 0 0 3 4 7 8
4 4 2 0 0 3 4 2 8
5 4 2 0 0 2 3 5 8
6 2 2 0 0 1 2 3
7 2 2 0 0 1 2 4
8 2 2 0 0 1 3 4
9 2 2 0 0 2 3 5
10 2 2 0 0 2 4 6
11 2 2 0 0 2 5 8
12 2 2 0 0 2 6 8
13 2 2 0 0 3 4 7
14 2 2 0 0 3 5 8
15 2 2 0 0 3 7 8
16 2 2 0 0 4 6 8
17 2 2 0 0 4 7 8
$EndElements
$ElementData
1
"left"
1
0.0
3
0
1
17
1 -1
2 -1
3 -1
4 -1
5 -1
6 0
7 -1
8 -1
9 0
10 -1
11 -1
12 -1
13 -1
14 -1
15 -1
16 -1
17 -1
$EndElementData
```
MSH 4.1 is really confusing because the nodes must be assigned to geometric entities with some dimension (`gmsh:dim_tags`). I suppose the idea would be to assign them to edges, points, and such parts of the original geometry. Obviously the original geometry is missing when you are dealing with meshes only.
Yes... One plan that I did have, but never tried, was to take a simple MSH 2.2 like that and ask Gmsh to update it to MSH 4.1. I hoped that would shed light on what on Earth one is supposed to posit for entities.
Another shortcoming with MSH 2.2 (and the simple integer tagging of cells in general, e.g. also in MEDIT) is the inability to handle overlapping subdomains or boundaries; e.g. consider a slight extension of your example:
```python
m = MeshTet().with_boundaries(
    {
        "left": lambda x: x[0] == 0.0,
        "vertical": lambda x: np.logical_or(x[0] == 0.0, x[0] == 1.0),
    }
)
```
(e.g. for a free convection problem, this might be a plane vertical channel with zero-velocity on both walls and a Dirichlet temperature or applied heat flux on the left).
When I look at `m.boundaries`, it contains (correctly)

```python
{'left': array([0, 4]), 'vertical': array([ 0,  4, 12, 13])}
```

but in the resulting MSH 2.2, the two subsets are combined:
```
$ElementData
1
"left-vertical"
1
0.0
3
0
1
17
1 -1
2 -1
3 -1
4 -1
5 -1
6 1
7 -1
8 -1
9 1
10 -1
11 -1
12 -1
13 -1
14 -1
15 -1
16 1
17 1
$EndElementData
```
This can be avoided by the user never specifying overlapping subdomains or boundaries, e.g. here splitting `"right"` from `"vertical"` and applying the no-slip condition to both `"left"` and `"right"`, but ideally the user wouldn't have to do that; it is a shortcoming of MSH 2.2, recognized by the Gmsh developers. If overlapping physical entities are specified in Gmsh (which is its native way of handling subdomains and boundaries, rather than this `ElementData` trick), it does something really unexpected when writing MSH 2.2: it duplicates the elements, giving each coincident element one of the physical tags!
Extending the example in nschloe/meshio#1165 with a third element of `cell_sets`, a subset of the second,

```python
cell_sets={
    'test': [array([], dtype=int64), array([1])],
    'sets': [array([0, 1]), array([0, 2, 3])],
    'subs': [array([1]), array([2, 3])],
}
```

leaves only the set-difference in `'sets'` after the round-trip:

```python
'sets': [array([0]), array([0])],
```

while extending that third subset beyond the second,

```python
'subs': [array([1]), array([1, 2, 3])],
```

yields something entirely different:

```python
{'set1': [array([0]), array([0])], 'set2': [array([1]), array([1, 2, 3])]}
```
This is expected I think. That's how you'd also specify overlapping boundary conditions in ElmerFEM. Boundary is split into three: 1 for BC1, 2 for BC2 and 3 for BC1&BC2. Then in the simulation spec you set BC1 to tags 1&3 and BC2 to tags 2&3.
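The ElmerFEM-style splitting described above can be sketched as computing, for each cell, the set of names it belongs to, and grouping cells by that combination (cell indices and names taken from the earlier example in the thread):

```python
# Each distinct combination of memberships becomes its own disjoint
# piece; each original set is then the union of the pieces whose key
# mentions its name.
boundaries = {"left": {0, 4}, "vertical": {0, 4, 12, 13}}

membership = {}
for name, cells in boundaries.items():
    for c in cells:
        membership.setdefault(c, set()).add(name)

pieces = {}
for c, names in membership.items():
    pieces.setdefault(frozenset(names), set()).add(c)
```

Here "left" ends up entirely inside the overlap, so only two pieces result: {left, vertical} and {vertical}.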
However, in this PR I plan to add yet another approach of using multiple cell_data keys. I don't know if MSH 2.1 supports that but it's worth a try.
Another thing I want to try is writing this `gmsh:dim_tags` so that MSH 4.1 export doesn't crash.
Maybe an improvement to `int_data_to_sets` / `sets_to_int_data` would be to detect or deal properly with the overlap. Right now the user must be aware that there cannot be such overlapping `cell_sets`. But I won't push that because it's not needed for what I have in mind.
There is also the question of where these overlapping `cell_sets` originate. In particular, if there are no importers in meshio that write overlapping `cell_sets`, then that's probably not very high priority. If you can construct, e.g., an inp-file which leads to overlapping `cell_sets`, then it should be fixed.
I now see that, e.g., the MEDIT format is similar to the Elmer format because it carries no tag names. It is possible to tag each edge with a number, but there is no name associated with the number, so `int_data_to_sets` must invent some names.
Now I'm looking into creating our own variant of `sets_to_int_data` which will support overlapping domains. Everything is encoded into a single `cell_data` entry for maximum support across different formats.
There is also the question of where these overlapping `cell_sets` originate. In particular, if there are no importers in meshio that write overlapping `cell_sets`, then that's probably not very high priority.
No, meshio reads MSH 4.1 fine, and MSH 4.1 does describe subdomains and boundaries as subsets of cells. The meshio reader does populate `.cell_sets` directly in this case.
I use overlapping subdomains and boundaries a lot. I usually generate the mesh in Gmsh, export as MSH 4.1, read with meshio, take it into skfem, and export the results as XDMF. That's quite a good one-way pipeline, but it fails if I have to go backwards: the XDMF has forgotten the subdomains and boundaries.
However, in this PR I plan to add yet another approach of using multiple cell_data keys. I don't know if MSH 2.1 supports that but it's worth a try.
2.2 or 4.1? But anyway, yes: both support an arbitrary number of `ElementData` fields, corresponding to `cell_data` in meshio.
Everything is encoded into a single cell_data entry for maximum support over different formats.
Intrigued!
So that was kind of achieved in #681 with the binary-encoding, but I didn't really think it through or test it very thoroughly. Intrigued to see what you've come up with. The state of mesh file formats is really disappointing!
I'm planning to use products of prime numbers to distinguish the overlaps.
Ha. I did think of that. I have done it before. Is it better than the binary? I thought the binary was more compact.
Oh, well I was using it for the multiple facets of a cell on a boundary rather than a cell belonging to multiple subdomains, but yes.
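For reference, a sketch of the prime-product encoding under discussion (and why the bitmask is more compact); all names and values here are invented, not what #680 actually implements:

```python
import numpy as np

# Each set is assigned a prime, each cell stores the product of the
# primes of the sets containing it, and membership is recovered by
# divisibility.
primes = {"left": 2, "vertical": 3}
n_cells = 6
boundaries = {"left": [0, 4], "vertical": [0, 4, 5]}

code = np.ones(n_cells, dtype=np.int64)
for name, cells in boundaries.items():
    code[cells] *= primes[name]

decoded = {name: np.nonzero(code % p == 0)[0] for name, p in primes.items()}

# As noted above, a bitmask (set i -> bit i) is more compact: the
# product of the first k primes grows much faster than 2**k.
```

Both encodings fit in a single integer-valued `cell_data` entry, which is the point: almost every format can carry that.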
Now it's kind of working locally, and it seems I can encode subdomains/boundaries to almost any format; even VTK is working, even though ParaView is not visualizing the boundary mesh! How crazy is that?
I'll clean it up a bit and push to #680
it seems I can encode subdomains/boundaries to almost any format, even VTK is working
Yes this is a really neat idea. I suppose it's only possible because of meshio, approximating as it does a neutral format.
Being able to visualise subdomains and boundaries is a bonus. I wonder whether we can get that to work. Is it not working because of that new cellblock list for cells in meshio 4? Does that force each cell-data field to be defined on both domain and boundary cells?
I think right now one should be able to open the `.msh` file in Gmsh to visualize it. I need to figure out how to do it in ParaView, maybe by writing `point_data`?
Yes, that should work.
In #681, which used only domain cells, no boundary cells, the boundaries were sort of depicted by colouring those cells having a facet on the boundary. If now in #680 you have both boundary cells and domain cells, do you have to provide cell data for the domain cells even for boundaries? At the moment is that just getting zeros or similar? If so, and you have to put something there anyway, could you put something like what went into the domain-cell data for boundaries in #681? That worked particularly well in three dimensions because only the boundary facets are visible anyway.
Domain cells are used to encode `Mesh.subdomains` and boundary cells are used to encode `Mesh.boundaries`.
Of course, but isn't it a quirk of meshio 4 that all cell-data fields have to be defined over all cells? So that even if you only want to encode a boundary over the boundary cells, you also have to provide values for that field over the domain cells too? (And vice versa for subdomains.)
Yeah. Would you want to write `cell_data` for domain cells only if `Mesh.subdomains` is empty?
Or maybe you are suggesting that we write multiple `cell_data` entries, one for each `subdomains` and each `boundaries` entry?
Is this just for visualizing the boundaries? I mean, using `point_data` should do the trick, right? So that we don't mix it up with tagging when using formats that support only one `cell_data` entry (like Elmer, MEDIT, AVS, ...)?
Right, yes, I'm probably still thinking too much in terms of #681, sorry. I'll leave you to it and check back on Monday morning. It sounds like you're very close to fixing this, which is great.
Should `Mesh.save` save the `.subdomains` and `.boundaries` attributes? In a way that they might be reread by `Mesh.load`.

If I/O is to continue to rely on `meshio.write` and `meshio.read` (as it should), I think that saving boundaries requires saving cells of type `bnd_type` (e.g. `'line'` for `MeshQuad`); currently `skfem.Mesh.save` only saves cells of type `Mesh.meshio_type`.

(I'm currently parsing a `polyMesh` from OpenFOAM and constructing an `skfem.Mesh`. My parser is very slow so I want to be able to save the mesh while experimenting with the solver.)

Perhaps instead for the moment I should just construct a `meshio.Mesh` and `meshio.write` that, leaving scikit-fem out of it, but this might be something to think about for other cases, e.g. when the mesh has been refined or adapted inside scikit-fem.