henry2004y / Batsrus.jl

BATSRUS/SWMF Data Processor
https://henry2004y.github.io/Batsrus.jl/dev/
MIT License
5 stars 2 forks source link

Description of how connectivity is handled in 3-D BATSRUS file #3

Closed rweigel closed 3 years ago

rweigel commented 3 years ago

We (@GaryQ-physics) are looking for a way to convert *.out files to VTK files with the appropriate connectivity.

Does this library do this?

henry2004y commented 3 years ago

Hi,

Do you mean the .out files generated from SWMF/BATSRUS? If so then this package has the capability. Please check the manual. If you find anything not working as you expect, please let me know and I will try to fix/improve on it!

rweigel commented 3 years ago

Yes, we are interested in the .out files. These are example files: http://mag.gmu.edu/git-data/sblake/SCARR5/GM/IO2/

Do you have an example of the use of https://github.com/henry2004y/Batsrus.jl/blob/master/src/convert2vtm.jl ? It appears that you use the information in the .tree file to create a multiblock VTK file.

rweigel commented 3 years ago

Regarding documentation, should

Convert 3D structured Tecplot data to VTK.

at https://github.com/henry2004y/Batsrus.jl/blob/master/src/io.jl#L874 be something like

Convert 3D unstructured IDL data to unstructured points VTK file

henry2004y commented 3 years ago

Regarding documentation, should

Convert 3D structured Tecplot data to VTK.

at https://github.com/henry2004y/Batsrus.jl/blob/master/src/io.jl#L874 be something like

Convert 3D unstructured IDL data to unstructured points VTK file

Yes you are right the comment is wrong. And as you can see I haven't implemented the one for unstructured grid, because that is the tricky one that requires more work.

I briefly scanned through the first .out file in the link. Unfortunately, this one is exactly an output from a 3D generalized coordinates. Currently the script you found convert2vtm.jl is able to generate vtm file like testvtm.tar.gz. (Just by changing nI, nJ, nK to 8 and use the correct file name)

However, this may not be ideal because tools like ParaView cannot understand what happens between blocks, since the connectivity between blocks is missing. This is indeed a big missing part for the current approach.

Honestly I don't know what is a nice approach to get the connectivity information between neighboring blocks. Do you have any idea?

henry2004y commented 3 years ago

Ok now I have some ideas on how to properly do this, but it may take several days. I need to dig into the tree structure and figure out the information.

rweigel commented 3 years ago

I've asked the developer of the BATSRUS module in SpacePy (Welling) about this and I'll let you know what he says.

henry2004y commented 3 years ago

I have now successfully converted 2D .out file to the unstructured vtk format. The next step is to generalize it to 3D. This should be doable.

However, there is one unexpected issue coming out when I was debugging in the final step. The point location stored in *.out are sorted in a certain way such that it is quite difficult to construct a mapping between connectivity and points. This is bothering me right now. I will try to think about how to deal with this issue.

GaryQ-physics commented 3 years ago

I have now successfully converted 2D .out file to the unstructured vtk format.

Is the code to you used for that in this repository? Also, I don't have any 2D files, only 3D, so would it be possible to provide a link to where I could download an example?

henry2004y commented 3 years ago

I haven't committed yet, because there are bugs in the code at AMR faces. Once it's done, I will commit it.

henry2004y commented 3 years ago

Finished! I will later commit the code and write down how to run it. For each 3D .out file of ~400 MB, it takes 120s on my laptop to do the conversion. grid_structure

henry2004y commented 3 years ago

I have committed the changes. It turns out to be more work than I originally expected: a whole week of continuous work. Check out the updated documents for usage, and feel free to reopen the issue if something is not working!

Cheers!

GaryQ-physics commented 3 years ago

I was able to successfully run

convertIDLtoVTK("3d__var_3_e20031120-070000-000", dir=".", gridType=-1, verbose=true)

where the files were taken from http://mag.gmu.edu/git-data/sblake/SCARR5/GM/IO2/

the output file 3d__var_3_e20031120-070000-000.vtu had the following header:

<?xml version="1.0" encoding="utf-8"?>
<VTKFile type="UnstructuredGrid" version="1.0" byte_order="LittleEndian" header_type="UInt64" compressor="vtkZLibDataCompressor">
  <UnstructuredGrid>
    <Piece NumberOfPoints="5896192" NumberOfCells="6057089">
      <Points>
        <DataArray type="Float32" Name="Points" NumberOfComponents="3" format="appended" offset="0"/>
      </Points>
      <Cells>
        <DataArray type="Int32" Name="connectivity" NumberOfComponents="1" format="appended" offset="11418166"/>
        <DataArray type="Int32" Name="offsets" NumberOfComponents="1" format="appended" offset="58524892"/>
        <DataArray type="UInt8" Name="types" NumberOfComponents="1" format="appended" offset="64923584"/>
      </Cells>
      <PointData>
        <DataArray type="Float32" Name="rho" NumberOfComponents="1" format="appended" offset="64929522"/>
        <DataArray type="Float32" Name="u" NumberOfComponents="3" format="appended" offset="82899200"/>
        <DataArray type="Float32" Name="e" NumberOfComponents="1" format="appended" offset="141583683"/>
        <DataArray type="Float32" Name="b" NumberOfComponents="3" format="appended" offset="160510648"/>
        <DataArray type="Float32" Name="b1" NumberOfComponents="3" format="appended" offset="222268904"/>
        <DataArray type="Float32" Name="p" NumberOfComponents="1" format="appended" offset="284449691"/>
        <DataArray type="Float32" Name="j" NumberOfComponents="3" format="appended" offset="303140883"/>
        <DataArray type="Float32" Name="status" NumberOfComponents="1" format="appended" offset="368641960"/>
      </PointData>
    </Piece>
  </UnstructuredGrid>
  <AppendedData encoding="raw">

I also checked the Points Array of the .vtu file. It has the same points as in the .out file, however they are in a different order.

I was under the impression that the points in the .out file represented the cell center points. If that is correct, then the "points" in the .vtu file should be different. They would be the vertices of cubes whose centers are at the original points. Also, that would mean the variables ("rho", "p", ect) would be CellData rather than PointData. Is this incorrect?

I also have strange effects when loading 3d__var_3_e20031120-070000-000.vtu into ParaView. For starters, when visualizing as a wireframe, I get large diagonal lines cutting through the blocks as shown:

grid_diag_lines

grid_diag_lines_zoomed

Finally, when using the CellSize filter to compute the cell volumes, I get many with nonsensical negative values:

negative_volumes

If you have any idea what is going on here, let me know.

And thank you for all the Assistance!

rweigel commented 3 years ago

@henry2004y - this is an impressive piece of work. We'll be sure to reference your repository in our project.

henry2004y commented 3 years ago

Bug fixed in v0.2.2, and then optimized in v0.2.3. I missed some points at block edges before writing the connectivity list, and that's why there were diagonal lines before. The negative cell volume was also a consequence of this bug. It may take a day or so for registering the new version into the Julia system: check the tag in the repo main page.

Regarding to the question of cell-centered or node-centered data, indeed this is often a confusing point. In the IDL format output, all the point coordinates and physical quantities are cell-centered. When I convert to unstructured VTK format, I keep the cell-centered points while constructing the connectivity list from the tree structure. This means that in the VTK world it is NOT unstructured cell data, but unstructured point data. In terms of visualization and postprocessing, we lose nothing. We can in principle save cell-centered physical quantities and node coordinates, which they call cell data, but we hardly gain anything by doing that. So be careful about the interpretation of "volume" given by Paraview: it is no longer equivalent to the simulation cell volume we use in BATSRUS.

Meanwhile, I made some micro optimization to the functions. Hopefully that will make it run faster: on my machine, the time it takes to generate one connectivity list reduces from 120s to 90s. Currently writing one .vtu from .out does not take advantage of parallel processing. If you want to speed it up given enough memory, you can try to process multiple files simultaneously in the outer loop over filenames, similar to the last example here. Note that it uses multiple cores; you can also try multiple threads.

Thanks for the feedback and kindness! I really hope open source project can benefit more for the science community.

GaryQ-physics commented 3 years ago

Ah ok. I upgraded versions and now it works, thanks!

rweigel commented 3 years ago

Thanks again for doing this!

So be careful about the interpretation of "volume" given by Paraview: it is no longer equivalent to the simulation cell volume we use in BATSRUS.

It turns out that we actually need precise cell volumes - we are using Biot-Savart in sub-regions to determine their contribution to magnetic perturbations on Earth's surface. The simulation only provides the contributions in aggregate, e.g., "magnetosphere", "fac region", and "ionosphere".

I apologize in advance if the following are naive questions and comments. I'm clearly not as familiar with the SWMF grid and VTK data models. (If you know of any resources about the SWMF grid, please let us know. We've dug around in the code and documentation and have not found much.)

To make sure that I understand your points made above, suppose the simulation had data points marked with * and associated area boundary nodes marked with o.

o         o          o                  o
     *          *
o         o          o        *
     *          *
o         o          o                  o
     *          *
o         o          o        *
     *          *
o         o          o                  o

One way of creating connectivity is with VTK_PIXEL cells

o---------o----------o------------------o
|    *    |     *    |                  |
o---------o----------o         *        |
|    *    |     *    |                  |
o---------o----------o------------------o
|    *    |     *    |                  |
o---------o----------o         *        |
|    *    |     *    |                  |
o---------o----------o------------------o

And another way, which you use, is

     *----------*
     |          |              
     *----------*--------------*
     |          |              |
     *----------*              |
     |          |              |
     *----------*--------------*
     |          |              
     *----------*

If I wanted an interpolated value, both methods would give the same answer. I think this is the justification for your statement that in terms of visualization and postprocessing, we lose nothing, and I agree.

I think that we could solve our problem about needing to know the volume associated with each point by simply associating an attribute with each point that corresponds to the resolution. It could be an integer where 1 => R_E, 4 => R_E/4, etc. It may also be useful to preserve the logical block number that each data point is in (this is something that @spacecataz mentioned to me earlier today). With these attributes, it seems that the .vtu file will contain all of the information that is in the .out + .tree files, and anyone that needed VTK_VOXEL connectivity (for a reason that I can't think of) could create it based on the information in the .vtu file.

One thing that I have not been able to make sense of is the NumberOfPoints="5896192" NumberOfCells="6057089" in the header of the .vtu file that Gary posted above. Based on the 2-D representations above, I would expect the number of cells to be less than the number of points.

rweigel commented 3 years ago

I think my last diagram is wrong. The positions of the two right-most points have been shifted.

GaryQ-physics commented 3 years ago

yes the last diagram should be more like

          *-------------------*----------________
          |                   |                  ------_____
          |                   |                  ______-----*                    
          |                   |      _______-----           |
          *-------------------*------                       |
          |                   |                             |
          |                   |                             |                    
          |                   |                             |
          *-------------------*----------________           |
          |                   |                  ------_____|
          |                   |                  ______-----*                    
          |                   |      _______-----
          *-------------------*------

Here is a screenshot of paraview I took, using a y=0 plane clip: clip_y=0_SurfaceWithEdges

henry2004y commented 3 years ago

The last scratch that Gary drew is conceptually correct. As for the output, there are two questions here:

  1. Why do we have more cells than points?
  2. Why are there duplicated cells?

Clearly there are some hidden facts that I am not aware of at the boundaries of different AMR levels, and the conversion produces erroneous result. I tested a little bit more and found out why there were duplicate rows (about 2000) in the connectivity list, but that shouldn't cause any issue presumably. In the Fortran source code of BATSRUS, even if you choose to directly output Tecplot data format, it will generate duplicate point under certain circumstances, so this is a "bug" inherited from BATSRUS. For my simpler test cases, the number of cells is always smaller than the number of points. I have no idea why the number of cells is larger than the number of points by ~20 thousand and ParaView does not complain... Maybe because of the inner ionosphere boundary in Cartesian grid? I don't know.

As for the Biot-Savart law integration that requires the cell volumes, I don't think that will be a big issue. Even though everything is cell-centered, for these kinds of operation we are only shifted by half a cell width at the boundary, the uncertainty of which can be neglected consider the effects from numerical schemes, averages and the large number of cells potentially involved.