henry2004y / Vlasiator.jl

Data processor for Vlasiator
https://henry2004y.github.io/Vlasiator.jl/stable/
MIT License

VTK support #10

Closed henry2004y closed 3 years ago

henry2004y commented 3 years ago

It is currently possible to convert VLSV format to VTK format which can be used directly in ParaView and VisIt. This is mostly useful for visualizing 3D AMR data.

I know very little about the AMR support in ParaView. This blog post briefly described what they did for Enzo and FLASH, which are both actually pretty similar to what DCCRG does under the hood. This Discourse post described a user's experience implementing his own AMR reader in Python, but his original data is already saved in HDF5 format. What he did was take advantage of the vtk package bundled with ParaView:

from paraview import vtk

data = vtk.vtkOverlappingAMR()
......

where vtkOverlappingAMR is a class in VTK.

There is another relatively old link describing how to render AMR data, but I found it hard to follow. However, this Python interface for generating an AMR Gaussian pulse might be very useful. In pvpython, it is actually very easy to create a sample AMR dataset:

from paraview.simple import *
pulse = AMRGaussianPulseSource(Dimension=3, RootSpacing=0.1, RefinementRatio=2)

The sample scripts provided by the user in the above link should in principle work, but I feel it would be better to do it purely in Python.

The structure of DCCRG makes it possible to know the cell neighbors given the level of refinement and the dimensions on the coarsest level.

It might be possible to directly write AMR data in VTK format. See this issue.

When doing real 3D analysis, it would be really useful to have a ParaView reader then.

henry2004y commented 3 years ago

As a first step, I can test if I can use WriteVTK.jl to reproduce the AMR GaussianPulse Source.
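
As a minimal building block, here is a sketch using WriteVTK.jl to write a single-level, cell-centered Gaussian pulse as image data; the grid size and pulse width are made up, and whether the overlapping AMR container itself can be produced on top of such files is exactly the part that needs testing.

using WriteVTK

# uniform ranges make vtk_grid emit image data (.vti); 33 points per axis
# give 32^3 cells on this (coarsest) level
x = range(-1.0, 1.0, length=33)
xc = [0.5(x[i] + x[i+1]) for i in 1:length(x)-1]   # cell centers
vtk_grid("pulse_level0", x, x, x) do vtk
    vtk["gaussian", VTKCellData()] =
        [exp(-(xi^2 + yi^2 + zi^2) / 0.1) for xi in xc, yi in xc, zi in xc]
end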

henry2004y commented 3 years ago

As a second step, we need a function that can extract cell data as "blocks" on each refinement level.

A first set of functions was implemented in 448b3bb6c548f8e0a.
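
For reference, below is a sketch of the cell ID bookkeeping such a function can rely on; get1stcell here is a hypothetical reimplementation consistent with how it is used in the snippets further down, not necessarily the package source.

# In DCCRG, level ℓ holds ncells * 8^ℓ cell IDs, stored after all coarser
# levels; the cumulative count of cells on levels 0 .. level-1 is a geometric
# series, so the ID range of every level follows directly.
get1stcell(level, ncells) = ncells * (8^level - 1) ÷ 7

# cell IDs belonging to refinement level `level`
levelrange(level, ncells) = get1stcell(level, ncells)+1:get1stcell(level+1, ncells)

# e.g. positions of all stored cells on level 2, with `cellid` being the array
# of cell IDs in file order as in the later snippets:
# idx_lvl2 = findall(in(levelrange(2, ncells)), cellid)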

henry2004y commented 3 years ago

Based on Mathieu's answer, a better file structure would be one image file per refinement level, using vtkGhostType to indicate whether each cell has been refined or not. This indeed sounds better than 4/8 cells per file!

However, I am having some difficulties using vtkGhostType for the refined mesh; see the later replies in the linked post. As a temporary workaround, I can provide data for all AMR levels on uniform meshes by averaging and filling, so that I at least have a useful image file on each separate AMR level.
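
For the record, the intended layout would look roughly like the sketch below, assuming VTK's cell ghost array convention in which the REFINEDCELL bit (value 8) marks cells that are refined on a finer level; the grid size and data here are dummies.

using WriteVTK

# one image file for the coarsest level; vtkGhostType hides the cells that
# are refined elsewhere (REFINEDCELL = 8 in VTK's ghost convention)
x = range(0.0, 1.0, length=5)            # 4^3 cells on this level
ghost = zeros(UInt8, 4, 4, 4)
ghost[1:2, 1:2, 1:2] .= 8                # these cells exist refined on level 1
vtk_grid("level0", x, x, x) do vtk
    vtk["density", VTKCellData()] = rand(Float32, 4, 4, 4)
    vtk["vtkGhostType", VTKCellData()] = ghost
end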

I believe the vtkGhostType issue is somehow a bug in VTK: the visibility behavior changes from v5.9.0 to the v5.9.1 nightly. Still waiting for Mathieu's answer.

I found a person posting a Python script for generating NonOverlappingAMR data on Discourse.

henry2004y commented 3 years ago

VTK OverlappingAMR with image files can support true 2D, but it requires several parameters to be consistent.

Since DCCRG works in 3D and only produces fake 2D, it makes sense to always stick to the 3D VTK formats.

henry2004y commented 3 years ago

The first implementation is in f30bdc881a8ce56, but the performance is not great. For converting a 36 MB 3D file with 1 level of refinement:

julia> @time Vlasiator.write_vtk(meta)
 14.460983 seconds (134.01 M allocations: 44.671 GiB, 7.07% gc time)

This can certainly be improved. Currently most of the time is spent in reading individual parent cells:

# loop over all cells on the finest level and read each parent cell one by one
for (i, cid) in enumerate(get1stcell(maxamr, ncells)+1:get1stcell(maxamr+1, ncells))
   cidparent = getparent(meta, cid)
   if cidparent in cellid
      celldata[iv][end][:,index_[i]] = readvariable(meta, var, cidparent)
   end
end

Changing this to

# first collect the indices and parent IDs of all fine cells that need filling
cids = [get1stcell(maxamr, ncells)+1, get1stcell(maxamr+1, ncells)]
cidparents = Vector{Int64}(undef, cids[2]-cids[1]+1)
cidmask = similar(cidparents)
nc = 0
for (i, cid) in enumerate(cids[1]:cids[2])
   if cid ∉ cellid
      nc += 1
      cidmask[nc] = i
      cidparents[nc] = getparent(meta, cid)
   end
end

# then read all the collected parent cells in one bundled call
celldata[iv][end][:,index_[cidmask[1:nc]]] = readvariable(meta, var, cidparents[1:nc])

makes it 5x faster and 7x more memory efficient. With this change, the 36 MB file conversion is now

julia> @time Vlasiator.write_vtk(meta)
  5.231831 seconds (18.53 M allocations: 5.272 GiB, 2.51% gc time)

Another issue is that averaging integer-valued variables (e.g. boundary_type) suffers from round-off errors on lower resolution levels. I only save the correct values on the finest refinement level; integer values in coarser-level cells that are actually refined are simply set to 0 as fillers.

henry2004y commented 3 years ago

Later I realized that reading cells together in bundles is also more memory efficient, and that should be done on the other, coarser levels as well. For the same file conversion as before, the version mentioned above gives

julia> @btime write_vtk(meta)
  5.551 s (18532171 allocations: 5.27 GiB)

while the new one, with bundled reading on the coarse level as well, gives

julia> @btime write_vtk(meta)
  4.085 s (3562007 allocations: 168.68 MiB)

The current version is ~4x faster and 267x more efficient in memory usage compared with the initial implementation.

henry2004y commented 3 years ago

Until now, the saved velocity distributions are not ported to the generated VTK files. I don't think it is really needed now, but we can think about how to do that in the future.

henry2004y commented 3 years ago

I tested the conversion on a large 3D 1-level AMR VLSV file, and it is very expensive:

julia> @time write_vtk(meta)
422.187386 seconds (59.59 M allocations: 2.583 GiB, 0.09% gc time, 0.22% compilation time)

By moving the parent cell finding out of the variable loop, this can be improved:

julia> @time write_vtk(meta)
196.123699 seconds (58.79 M allocations: 2.393 GiB, 0.17% gc time, 0.28% compilation time)

Looking into the details, I find that filling the fine cells from their parents' values here

celldata[iv][end][:, seq_[cidmask]] = readvariable(meta, var, cidparents)

by itself takes 120s, which is about 60% of the total time.

There is really not much I can do to further speed this up. I tried to add multithreading to the variable loop, but it turned out that, since there is only one fid (file stream) per file, the seek calls cause race conditions between reads of different parts of the file.

One thing to note is that cidparents contains duplicate cell IDs, since 8 cells on the finest AMR level share the same parent cell. This means it may be beneficial to first read all distinct parent cells' values, and then fill in the fine cells' values.
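
A sketch of that idea, reusing the variable names from the snippet above (so not self-contained): read each distinct parent cell once, then fan its values out to all of its children.

cidunique = unique(cidparents)                        # distinct parent IDs
iparent = something.(indexin(cidparents, cidunique))  # child -> parent column
dataparents = readvariable(meta, var, cidunique)      # one read per parent
celldata[iv][end][:, seq_[cidmask]] = dataparents[:, iparent]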

henry2004y commented 3 years ago

I later realized that the previous implementation, going from the finest to the coarsest level, cannot handle the case where the finest cells have no direct parent on the 2nd finest level! So this time I changed the order of the loops: the level loop now goes from coarse to fine. The downside is that cells that are refined won't have meaningful averaged values; from the AMR perspective, this may be just fine. The new version works for more than 1 level of refinement.

julia> @time write_vtk(meta)
  3.885190 seconds (18.45 M allocations: 562.523 MiB, 4.09% gc time)

For the large 3D 1-level AMR VLSV file:

julia> @time write_vtk(meta)
 49.502307 seconds (148.39 M allocations: 4.404 GiB, 0.99% gc time)

which is 4 times faster than before.

henry2004y commented 3 years ago

Now what about converting time series files? ParaView understands that files with the pattern name_00xx.vti belong to the same time series, with the time stamps indicated by the numbers. I can add another method for this purpose.
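
A sketch of the batch conversion (the directory name is hypothetical, and it assumes the generated VTK names keep the running number from the inputs so that ParaView can group them):

# convert every VLSV file of a run in time order
files = sort(filter(f -> endswith(f, ".vlsv"), readdir("run_dir", join=true)))
for file in files
   write_vtk(file)
end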

henry2004y commented 3 years ago

Another micro-optimization I found is to avoid using counters in the innermost loops. This gives a surprising 33% improvement when reading 2 variables from the 3D 1-level AMR data.
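
As a generic illustration of the pattern (not the actual change in the package): precompute the destination indices once instead of bumping a counter inside the hot loop.

src = rand(Float32, 100)
idx = findall(>(0.5f0), src)             # selection done outside the hot loop
out = Vector{Float32}(undef, length(idx))
for (i, j) in enumerate(idx)             # counter-free inner loop
   out[i] = src[j]
end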

henry2004y commented 3 years ago

Adding @floop:

julia> @time write_vtk(meta) # test file as before
  3.650974 seconds (16.24 M allocations: 528.821 MiB, 1.41% gc time)

With 2 threads:

julia> @time write_vtk(meta)
  2.248806 seconds (16.24 M allocations: 528.837 MiB, 1.74% gc time)

With 4 threads:

julia> @time write_vtk(meta)
  1.629885 seconds (16.24 M allocations: 528.871 MiB, 2.59% gc time)

50MB data:

julia> @time write_vtk(meta) # 1 lvl AMR 50 MB
 49.470840 seconds (130.16 M allocations: 4.132 GiB, 0.62% gc time)

With 2 threads:

julia> @time write_vtk(meta)
 40.754638 seconds (130.16 M allocations: 4.132 GiB, 1.12% gc time)

However, for some unknown reason I cannot reproduce these results! What I can tell from the benchmark reports and the large-file timings is that FLoops.jl certainly helps with VDF and fsgrid reading, but not with AMR file filling.
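
For context, the @floop pattern looks roughly like the following sketch with dummy arrays; the actual loops in the package differ.

using FLoops

# independent iterations, executed on multiple threads when Julia is started
# with --threads; the arrays stand in for per-level cell data
celldata = zeros(Float32, 3, 8)
parentdata = rand(Float32, 3, 2)
cidmask = collect(1:8)                 # fine-cell columns to fill
iparent = [1, 1, 1, 1, 2, 2, 2, 2]     # parent column of each fine cell
@floop ThreadedEx() for i in eachindex(cidmask)
   celldata[:, cidmask[i]] .= @view parentdata[:, iparent[i]]
end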

henry2004y commented 3 years ago

By reducing the duplicate calls to findfirst in readvariable, the performance can be further improved by 3.5x! For the 50 MB 1-level AMR data, v0.6.2:

julia> @time write_vtk(filename)
 12.544818 seconds (36.07 M allocations: 1.440 GiB, 0.91% gc time)

In fact, this can be improved further by reusing arrays of the same type and size for all the variables. However, as of now I am unable to come up with elegant code for this idea, and it is no longer the bottleneck. The bottleneck is still the findfirst list comprehension. The question is: can I construct a hash table efficiently such that, given any cell ID, I can immediately find its index/order in the VLSV file?
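
A sketch of the lookup-table idea, with cellid being the array of cell IDs in file order as in the earlier snippets: build the map once, after which every lookup is O(1) instead of a linear findfirst scan.

# cell ID -> position in the file, built once per file
cellindex = Dict(cid => i for (i, cid) in enumerate(cellid))

# the stored column of any requested cell ID is then simply
# icell = cellindex[cid]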

henry2004y commented 3 years ago

Some of the profilings above are not trustworthy! I cannot reproduce the results! Or am I operating on a different file? v0.7.0:

git checkout v0.7.0
julia> @benchmark write_vtk(filename)
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 31.622 s (1.46% GC) to evaluate,
 with a memory estimate of 4.35 GiB, over 145913677 allocations.

v0.7.1:

git checkout v0.7.1
julia> @time write_vtk(filename)
 27.981823 seconds (145.91 M allocations: 4.285 GiB, 1.96% gc time)

v0.6.2:

git checkout v0.6.2
julia> @time write_vtk(filename)
 30.849516 seconds (145.91 M allocations: 4.350 GiB, 1.03% gc time)

henry2004y commented 3 years ago

Further improvements! v0.7.5:

julia> @time write_vtk("bulk.0000003.vlsv") # 1st time execution
 28.804633 seconds (155.89 M allocations: 4.797 GiB, 2.23% gc time, 0.14% compilation time)

v0.7.7

julia> @time write_vtk("bulk.0000003.vlsv") # 1st time execution
 19.531925 seconds (77.73 M allocations: 3.039 GiB, 2.62% gc time, 0.20% compilation time)

v0.8.1 (compile time up due to the switch to EzXML)

julia> @time write_vtk(filename) # 1st time execution
 18.405493 seconds (78.39 M allocations: 3.075 GiB, 3.00% gc time, 2.81% compilation time)

v0.8.4 (bug fix for the Float64 --> Float32 type conversion present since v0.7.x; mmap instead of multiple read! calls)

julia> @benchmark write_vtk(filename)
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 14.580 s (2.62% GC) to evaluate,
 with a memory estimate of 2.33 GiB, over 67060114 allocations.
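
For reference, the mmap pattern is roughly the following; the file offset and block length are made up for this sketch. The block is mapped lazily instead of being copied in with seek and read!.

using Mmap

io = open("bulk.0000003.vlsv")
n = 1_000                  # number of Float32 values in the data block (dummy)
offset = 64                # byte offset of the block inside the file (dummy)
v = Mmap.mmap(io, Vector{Float32}, (n,), offset)   # no copy until accessed
close(io)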

v0.8.5 (function barriers on celldata filling)

julia> @time write_vtk(filename) # 1st time execution
 15.531484 seconds (55.72 M allocations: 2.692 GiB, 3.02% gc time, 2.84% compilation time)

julia> @benchmark write_vtk(filename)
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 10.266 s (3.25% GC) to evaluate,
 with a memory estimate of 1.97 GiB, over 42733579 allocations.

v0.8.6 (replace findfirst with indexin, more memory usage, but faster for larger arrays)

julia> @time write_vtk(filename) # 1st time execution
 12.583741 seconds (73.09 M allocations: 3.304 GiB, 4.06% gc time, 3.92% compilation time)

julia> @benchmark write_vtk(filename)
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 7.074 s (4.95% GC) to evaluate,
 with a memory estimate of 2.59 GiB, over 60255050 allocations.
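
The difference in a tiny example: indexin does all the lookups in one call (building an internal dictionary), whereas findfirst is a linear scan per query. The cell ID arrays below are dummies.

cellid = [4, 1, 3, 2]                      # storage order in the file
cidreq = [1, 2, 3, 4]                      # requested cells, sorted
rindex = indexin(cidreq, cellid)           # [2, 4, 3, 1]

# equivalent, but one full scan per requested element:
# rindex = [findfirst(==(cid), cellid) for cid in cidreq]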

v0.8.25 (doing function barriers properly!)

julia> @benchmark write_vtk(file)
BenchmarkTools.Trial: 6 samples with 1 evaluation.
 Range (min … max):  838.351 ms … 958.141 ms  ┊ GC (min … max): 0.27% … 13.05%
 Time  (median):     872.960 ms               ┊ GC (median):    2.04%
 Time  (mean ± σ):   881.696 ms ±  43.051 ms  ┊ GC (mean ± σ):  4.42% ±  4.74%

  █    █       █       █        █                             █  
  █▁▁▁▁█▁▁▁▁▁▁▁█▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  838 ms           Histogram: frequency by time          958 ms <

 Memory estimate: 148.01 MiB, allocs estimate: 3579.
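
A generic illustration of the function barrier pattern (not the actual Vlasiator.jl code): the outer function handles data whose type is only known at run time and passes it to a kernel that gets compiled for the concrete array types.

# outer function: `data` may come out of the file reader abstractly typed
function fillvar!(dest, data)
   _fillvar_kernel!(dest, data)   # barrier: the kernel specializes on concrete types
end

# inner kernel: concretely typed arguments, so the loop compiles to tight code
function _fillvar_kernel!(dest::AbstractMatrix{T}, data::AbstractMatrix{T}) where T
   @inbounds for j in axes(data, 2), i in axes(data, 1)
      dest[i, j] = data[i, j]
   end
   return dest
end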

v0.9.20

julia> @benchmark write_vtk(file)
BenchmarkTools.Trial: 5 samples with 1 evaluation.
 Range (min … max):  1.096 s …   1.222 s  ┊ GC (min … max): 1.24% … 10.51%
 Time  (median):     1.131 s              ┊ GC (median):    1.20%
 Time  (mean ± σ):   1.143 s ± 46.907 ms  ┊ GC (mean ± σ):  3.96% ±  4.09%

  █             ██   █                                    █  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁██▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  1.1 s          Histogram: frequency by time        1.22 s <

 Memory estimate: 151.54 MiB, allocs estimate: 3575.

v0.9.21 (after refactoring fillmesh: ~2x speedup for large files, about the same for small files)

julia> @benchmark write_vtk(file)
BenchmarkTools.Trial: 5 samples with 1 evaluation.
 Range (min … max):  1.043 s …   1.124 s  ┊ GC (min … max): 0.08% … 7.16%
 Time  (median):     1.097 s              ┊ GC (median):    5.12%
 Time  (mean ± σ):   1.083 s ± 36.244 ms  ┊ GC (mean ± σ):  3.82% ± 2.97%

  █  █                                  █    █            █  
  █▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  1.04 s         Histogram: frequency by time        1.12 s <

 Memory estimate: 133.15 MiB, allocs estimate: 3534.

v0.9.22 (improving the rOffsetsRaw calculation)

julia> @benchmark write_vtk(file)
BenchmarkTools.Trial: 5 samples with 1 evaluation.
 Range (min … max):  1.046 s …  1.064 s  ┊ GC (min … max): 0.12% … 0.11%
 Time  (median):     1.047 s             ┊ GC (median):    0.12%
 Time  (mean ± σ):   1.052 s ± 7.656 ms  ┊ GC (mean ± σ):  0.15% ± 0.11%

  ██ █                    █                              █  
  ██▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  1.05 s        Histogram: frequency by time        1.06 s <

 Memory estimate: 125.95 MiB, allocs estimate: 3469.