vpic reader is slow - Githubissues

liangwang0734 commented 6 years ago

There are a few issues:

The vpic grid does not change with time. However, when parsing a vpic file, Viscid iterates over all time steps (and all spatial blocks), which can be slow when the data contain many time steps. This looks like a one-time cost, but when working on a running simulation we may need to reload the file constantly, so it would be very nice to skip the iteration over time steps during parsing.
- In vpic.py:VPIC_File:_parse, can we avoid the loop over time and reuse the first frame's grid for all times?
This is really not a problem with Viscid (Viscid handles the amr field the way it should), but a problem with vpic's data structure.
For an amr field, vpyplot.py:plot makes a subplot for each block. The overhead of creating the subplots can be substantial when the number of blocks is large, even if the actual data size is not. It would be ideal if we could assemble the data into one array and make one plot.
- I think viscid has all the pieces needed (particularly the slicing part), and I'm happy to try it if Kris can give some suggestions.
- Another approach is to interpolate onto a uniform grid and plot that, for example through seed.RectilinearMeshPoints.
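The "assemble the data into one array" idea can be sketched with plain NumPy. This is only an illustration, not Viscid's API: the function name `assemble_blocks` and the `(offset, data)` block layout are assumptions made for the example, where `offset` is the index of the block's first cell in the global array.

```python
import numpy as np

def assemble_blocks(blocks, global_shape):
    """Copy per-block arrays into one global array.

    blocks: iterable of (offset, data) pairs (hypothetical layout), where
    offset is the global index of the block's first cell.
    """
    # Cells not covered by any block stay NaN so gaps are easy to spot.
    out = np.full(global_shape, np.nan)
    for offset, data in blocks:
        slc = tuple(slice(o, o + n) for o, n in zip(offset, data.shape))
        out[slc] = data
    return out

# Four 2x3 patches tiling a 4x6 domain
patch = np.arange(6.0).reshape(2, 3)
blocks = [((0, 0), patch), ((0, 3), patch + 10),
          ((2, 0), patch + 20), ((2, 3), patch + 30)]
global_arr = assemble_blocks(blocks, (4, 6))
```

The resulting array could then go to a single plotting call (for example one matplotlib `pcolormesh`) instead of one subplot per block.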

KristoforMaynard commented 6 years ago

Profiling results for loading 300 time steps with 4 patches per step:

    >>> %prun viscid.load_file('./global.vpc')

         5046121 function calls (4792979 primitive calls) in 5.138 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    64238    0.503    0.000    0.830    0.000 bucket.py:71(set_item)
    64844    0.470    0.000    1.194    0.000 tree.py:19(__init__)
315156/63028    0.399    0.000    0.448    0.000 vutil.py:101(subclass_spider)
    63024    0.395    0.000    1.916    0.000 field.py:472(__init__)
        1    0.349    0.349    5.136    5.136 vpic.py:428(_parse)
    63024    0.285    0.000    3.069    0.000 field.py:342(wrap_field)
   128849    0.182    0.000    0.401    0.000 npdatetime.py:729(_check_like)
    63024    0.179    0.000    0.837    0.000 field.py:320(field_type)
    63027    0.178    0.000    0.178    0.000 {method 'items' of 'dict' objects}
   128844    0.160    0.000    0.160    0.000 {built-in method builtins.hasattr}
    63024    0.144    0.000    3.269    0.000 vfile.py:195(_make_field)
   315120    0.136    0.000    0.169    0.000 field.py:1433(istype)
    65147    0.129    0.000    0.704    0.000 tree.py:232(time)
    64238    0.126    0.000    0.275    0.000 bucket.py:53(_make_hashable)
   777426    0.120    0.000    0.120    0.000 {built-in method builtins.isinstance}
   131224    0.119    0.000    0.119    0.000 {method 'format' of 'str' objects}
    63024    0.108    0.000    1.046    0.000 grid.py:262(__setitem__)
    63024    0.101    0.000    0.137    0.000 vpic.py:637(__init__)
    64237    0.080    0.000    0.920    0.000 bucket.py:198(__setitem__)
    65183    0.074    0.000    0.224    0.000 npdatetime.py:782(is_timedelta_like)
    63024    0.072    0.000    1.118    0.000 grid.py:104(add_field)
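
For anyone who wants to reproduce this kind of profile outside IPython, `%prun` can be approximated with the standard-library `cProfile` and `pstats` modules. The helper name `profile_by_tottime` and the stand-in workload are made up for this sketch; with Viscid installed one would pass `viscid.load_file` and the path to the .vpc file instead.

```python
import cProfile
import io
import pstats

def profile_by_tottime(func, *args, limit=20, **kwargs):
    """Run func under cProfile; return (result, report sorted by internal time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # "tottime" is what the report header calls "internal time"
    stats.sort_stats("tottime").print_stats(limit)
    return result, stream.getvalue()

# Stand-in workload so the sketch runs without Viscid or any data files
def workload(n):
    return sum(i * i for i in range(n))

result, report = profile_by_tottime(workload, 1000)
print(report)
```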

KristoforMaynard commented 6 years ago

Thoughts on reading vpic fields into a single viscid.field.Field:

A VPIC_DataWrapper could be given a list of file_wrappers, along with information about the destination of each block in the global field. A worthwhile question to ask is: are you more likely to be bitten by the cost of assembling global fields on access, or by the one-time cost of creating a large number of field objects? I'm sure it depends on how the domain is decomposed and how the data is being used. Maybe some kind of switch is needed to let the user decide on a case-by-case basis.
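
A rough sketch of the assemble-on-access idea, to make the trade-off concrete. None of this is Viscid's actual DataWrapper interface: the class `AssembledBlocks`, its `read` method, and the `(offset, loader)` pairs are hypothetical, and each loader is just a zero-argument callable standing in for a per-block file wrapper.

```python
import numpy as np

class AssembledBlocks:
    """Hypothetical wrapper: holds block loaders, builds the global array on access."""

    def __init__(self, global_shape, block_loaders, cache=True):
        self.shape = tuple(global_shape)
        self._loaders = list(block_loaders)  # (offset, zero-arg callable) pairs
        self._cache = cache
        self._data = None
        self.n_assemblies = 0  # counts how often the global array was built

    def read(self):
        if self._data is not None:
            return self._data
        out = np.full(self.shape, np.nan)
        for offset, load in self._loaders:
            block = load()  # no block data is touched before this point
            slc = tuple(slice(o, o + n) for o, n in zip(offset, block.shape))
            out[slc] = block
        self.n_assemblies += 1
        if self._cache:
            self._data = out
        return out

loaders = [((0,), lambda: np.zeros(4)), ((4,), lambda: np.ones(4))]
wrapper = AssembledBlocks((8,), loaders)
first = wrapper.read()
second = wrapper.read()  # served from the cache, no second assembly

uncached = AssembledBlocks((8,), loaders, cache=False)
uncached.read()
uncached.read()  # assembled again on every access
```

Construction is cheap because only the loaders are stored. The `cache` flag trades memory for repeated assembly cost; the choice between this and one Field per block would still need a reader-level option like the switch mentioned above.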

liangwang0734 commented 6 years ago

On file opening, my results for a vpic run with 1024x1x1 patches are shown below. Can we make/wrap the fields only when they are used for the first time, or do I have to avoid making those field getters in the reader? (I remember you suggested being cautious with the field getters, but I could not find the source.)

         50979356 function calls (48415072 primitive calls) in 104.067 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   651290   14.426    0.000   31.925    0.000 tree.py:19(__init__)
  1302609    8.023    0.000    8.023    0.000 {built-in method builtins.hasattr}
   638982    7.634    0.000    7.634    0.000 {method 'items' of 'dict' objects}
   651266    7.321    0.000   13.985    0.000 bucket.py:71(set_item)
3204096/640000    7.187    0.000    8.183    0.000 vutil.py:101(subclass_spider)
        1    6.364    6.364  104.063  104.063 vpic.py:428(_parse)
   638976    5.835    0.000   47.915    0.000 field.py:472(__init__)
   638976    5.704    0.000   68.635    0.000 field.py:342(wrap_field)
  1327142    3.066    0.000    3.066    0.000 {method 'format' of 'str' objects}
   638976    2.845    0.000   14.423    0.000 field.py:320(field_type)
  7898636    2.747    0.000    2.747    0.000 {built-in method builtins.isinstance}
  1308734    2.712    0.000   12.120    0.000 npdatetime.py:729(_check_like)
   638976    2.480    0.000   71.879    0.000 vfile.py:195(_make_field)
   651302    2.350    0.000   17.075    0.000 tree.py:232(time)
  3194880    2.053    0.000    2.676    0.000 field.py:1433(istype)
   651266    1.990    0.000    5.720    0.000 bucket.py:53(_make_hashable)
   638976    1.818    0.000   17.564    0.000 grid.py:262(__setitem__)
   638976    1.795    0.000    2.476    0.000 vpic.py:637(__init__)
  1953798    1.442    0.000    1.442    0.000 {built-in method builtins.hash}
   638976    1.278    0.000   18.842    0.000 grid.py:104(add_field)
  5764109    1.251    0.000    1.251    0.000 {method 'lower' of 'str' objects}
   651265    1.245    0.000   15.480    0.000 bucket.py:198(__setitem__)
   660518    1.124    0.000    4.111    0.000 npdatetime.py:782(is_timedelta_like)
   660518    1.077    0.000   10.440    0.000 npdatetime.py:769(is_datetime_like)

KristoforMaynard commented 6 years ago

In principle, the grid can defer field creation, but in practice, it'll end up having to override every method that tries to get metadata / coordinates / fields, which sounds rather error-prone. Ultimately, something has to keep track of each patch because each patch is unique: it lives in a specific block of a specific binary file and has specific local domain extents. The issue appears to be that Viscid's whole hierarchical structure has initialization overhead that scales linearly with the number of patches.

Maybe the whole tree / bucket infrastructure could be streamlined... the structure as it exists now grew organically as different workflows needed different features. But I suspect there may not be much to save with this approach, since per-call overhead is already of order microseconds.

Another solution might be: one Field per physical quantity per timestep, and have the DataWrapper deal with assembly on access (like I mentioned above). Then initialization overhead should scale more like number of physical quantities ⨉ number of time steps.

I can't imagine init scaling better than number of physical quantities ⨉ number of time steps without significant effort. To do this, one would need custom dataset objects that defer metadata creation, but still know about all the timesteps / physical quantities. Feel free to investigate this, but it sounds like edge-case-city.
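
As a back-of-the-envelope check on that scaling, using numbers from the 1024x1x1 patch profile earlier in this thread. The estimate assumes every patch carries the same set of fields, that the per-field cost of `_make_field` stays the same, and it ignores the cost of assembling data on access, so treat it as an order-of-magnitude guess only.

```python
# Numbers taken from the 1024x1x1 patch profile in this thread
n_fields = 638976            # calls to field.py:472(__init__)
n_patches = 1024
make_field_cumtime = 71.879  # seconds, cumtime of vfile.py:195(_make_field)

# Average cost of creating one Field, including everything it calls
per_field = make_field_cumtime / n_fields

# One Field per physical quantity per time step instead of one per patch
fields_if_assembled = n_fields // n_patches
estimated_init = fields_if_assembled * per_field

print(f"{per_field * 1e6:.1f} us per field")
print(f"{fields_if_assembled} fields, ~{estimated_init:.3f} s")
```

Under those assumptions the number of Field objects drops from 638976 to 624, and the field-creation part of `_parse` shrinks by the same factor of 1024.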

KristoforMaynard commented 6 years ago

Ok, maybe I'm being a little pessimistic. In principle, a deferred hierarchy might not be too hard, one just needs to make the following:

- A Dataset subclass that has a list of child names, and a callback that knows how to create each child given its name
- A TemporalDataset subclass that knows the list of times, and takes a callback that can create children given a time or timestep index
- A Grid subclass that knows its coordinates, a list of physical quantities, and a callback that knows how to create Fields

Now that I write this all out, it might be convenient if the base Dataset / Grid classes accepted a callback function for deferred child creation. That could probably simplify a fair amount of existing reader hacks.
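
A minimal sketch of the callback idea, independent of Viscid's real Dataset / Grid classes. The class `DeferredDataset` and the factory functions are hypothetical, and a plain dict stands in for a Field. The container knows its child names up front but only creates a child the first time it is requested.

```python
class DeferredDataset:
    """Hypothetical container that creates children on first access via a callback."""

    def __init__(self, child_names, make_child):
        self._names = list(child_names)
        self._make_child = make_child
        self._children = {}  # cache of children created so far

    def keys(self):
        return list(self._names)

    def __len__(self):
        return len(self._names)

    def __getitem__(self, name):
        if name not in self._children:
            if name not in self._names:
                raise KeyError(name)
            self._children[name] = self._make_child(name)
        return self._children[name]

created = []  # records which fields were actually built

def make_field(name):
    created.append(name)
    return {"name": name, "data": [0.0] * 4}  # stand-in for a real Field

def make_grid(time):
    # Each time step gets its own lazily populated grid
    return DeferredDataset(["bx", "by", "ez"], make_field)

times = DeferredDataset([0.0, 1.0, 2.0], make_grid)
field = times[1.0]["bx"]  # builds one grid and one field, nothing else
```

Here the up-front cost is just storing the names and times. The price is that anything iterating over all children triggers their creation, which is where the edge cases mentioned above would show up.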

viscid-hub / Viscid

vpic reader is slow #20