Hi Lion,
thanks for these tests, which are highly relevant for kernel calculation as well. In HPC environments, the crucial point would be limiting the number of file accesses, which is served by your "unrolling" as well.
I would vote for your complete unrolling solution:
Take all 5 arrays from the two files and restructure the file so the data for all GLL points from a single element can be acquired with just a single read operation. Thus data is stored in a single large 5D array (element id, time_axis, gll_id_i, gll_id_j, direction). This means that duplicate GLL points are no longer discarded - the file is also no longer chunked and thus not compressed. This results in much bigger files but also quite a bit faster access.
For the kernel code it could mean replacing 250 file accesses by one (in the best case), which is obviously highly beneficial.
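To make the access pattern concrete, here is a minimal h5py sketch of what a single-element read from such a 5D layout could look like - the dataset name `merged_snapshots` and the dimension sizes are illustrative assumptions, not the actual names used by the conversion script:

```python
import h5py

# Hypothetical unrolled layout: one 5D array indexed by
# (element id, time axis, gll_id_i, gll_id_j, direction).
with h5py.File("unrolled_db.h5", "r") as f:
    dset = f["merged_snapshots"]  # assumed dataset name
    elem_id = 12345
    # One contiguous read returns everything needed for this element:
    # all time samples, all 5x5 GLL points, all components.
    data = dset[elem_id]          # shape: (nt, 5, 5, ndirections)
    # Individual time series can then be sliced in memory, e.g. the
    # trace of GLL point (i=2, j=3) for the first direction:
    trace = data[:, 2, 3, 0]
```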
I am still not quite convinced about the "transpose" question. As I understand it, NetCDF/HDF transposes Fortran arrays before writing them anyway.
Maybe one usability hint: The transformation script should probably take the original axisem_output.nc4 files as input, so that we save one level of intermediate files.
Also, the script could be moved to the AxiSEM repository.
> I am still not quite convinced about the "transpose" question. As I understand it, NetCDF/HDF transposes Fortran arrays before writing them anyway.
I'm not sure what the Fortran APIs are doing but within HDF5 data is always stored in row major order. So yes - the conversion has to happen somewhere.
The first thing I tried was to use the `h5repack` tool and just turn off chunking and compression. Access was incredibly slow because, even though the data had been transposed before it was written to the file, the fast index of the data did not correspond to the fast index we require for the data extraction. The only reason that worked reasonably well with the existing databases is that the chunking forced the data to be written to disc in a different pattern.
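To illustrate the fast-index point, here is a tiny h5py/numpy experiment (file name and sizes are made up) comparing a read along the fast (last) axis of a contiguous, unchunked dataset with a read along the slow (first) axis:

```python
import time
import h5py
import numpy as np

# Write a contiguous (unchunked) 2D dataset - rows are stored back to back.
with h5py.File("layout_test.h5", "w") as f:
    f.create_dataset("a", data=np.random.rand(8000, 8000).astype("float32"),
                     chunks=None)

with h5py.File("layout_test.h5", "r") as f:
    dset = f["a"]

    t = time.time()
    _ = dset[4000, :]   # one contiguous read along the fast index
    print("row read:    %.4f s" % (time.time() - t))

    t = time.time()
    _ = dset[:, 4000]   # 8000 small strided reads along the slow index
    print("column read: %.4f s" % (time.time() - t))
```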
> Maybe one usability hint: The transformation script should probably take the original axisem_output.nc4 files as input, so that we save one level of intermediate files.
> Also, the script could be moved to the AxiSEM repository.
Both are good points. I'll clean up the script once we've decided what to do.
Okay, another critical issue I noticed is that the resulting file is not a valid NetCDF4 file anymore.
I could not pinpoint the reason yet, but I assume it is due to missing dimensions in the output file. One of the two big differences between NetCDF4 and HDF5 is that the former uses named and shared dimensions (lat, lon, x, y, z, time, depth, etc), while the latter just uses array shapes. https://www.unidata.ucar.edu/blogs/developer/en/entry/dimensions_scales https://www.unidata.ucar.edu/software/netcdf/docs/interoperability_hdf5.html#reading_with_hdf5
Since the kerner will use NetCDF4 for the foreseeable future and performance-wise it is just as critical, I would like to have valid NetCDF4 files as an output of the script. I see two possibilities
First of all, thanks for this initiative. My feeling is we might want to move in two directions with different applications in mind:
My suggestion would be to have at maximum two storage patterns, that are optimized for these two directions.
Anyway, both can be done in postprocessing (currently called 'fieldtransform') in python and don't require changes in the Fortran code, so this keeps things relatively simple.
@martinvandriel: agreed, we should probably move the functionality into fieldtransform.py. This would also solve the NetCDF4 problem.
The IRIS guys could then decide which storage pattern they prefer.
> Okay, another critical issue, I noticed is that the resulting file is not valid NetCDF4 file anymore.
That should be easy to fix. I think I just have to copy the dimension scales of the datasets - they also exist in hdf5. I'll make sure the files stay valid netCDF files when I polish the script.
> My suggestion would be to have at maximum two storage patterns, that are optimized for these two directions.
We will always support the current storage pattern of the data as quite a few people already have databases. And then a new fast one. Is that ok? We could probably optimize the current layout to be faster without requiring more space but then we'd have to support three patterns, which I would like to avoid.
> Here, storage is the crucial part and even some additional CPU overhead with respect to the current master would be acceptable, if that allows additional compression.
You could always use a higher gzip compression level or szip compression. Also make larger chunks which should compress better as compression is per chunk. As long as the logical HDF5 data layout does not change instaseis will continue to work fine.
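As a point of reference, a minimal h5py sketch of what stronger compression and larger chunks could look like when writing such a dataset (names and shapes are made up, and szip is only available if the underlying HDF5 library was built with it):

```python
import h5py
import numpy as np

n_gll, n_samples = 10000, 2000
data = np.random.rand(n_gll, n_samples).astype("float32")

with h5py.File("compressed_db.h5", "w") as f:
    # Larger chunks (here 64 GLL point time series per chunk) usually
    # compress better because the filter works on one chunk at a time.
    f.create_dataset(
        "disp_s",
        data=data,
        chunks=(64, n_samples),
        compression="gzip",
        compression_opts=9,   # highest gzip level
        shuffle=True,         # byte-shuffle filter often helps float data
    )
```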
> Okay, another critical issue, I noticed is that the resulting file is not valid NetCDF4 file anymore. That should be easy to fix. I think I just have to copy the dimension scales of the datasets - they also exist in hdf5. I'll make sure the files stay valid netCDF files when I polish the script.
If it's easy to do in h5py, then I don't care which library is used. But it's also needed for all the mesh variables, which are now just copied in a loop. For testing: if `ncdump -hs file.nc4` can read it, it should be fine.
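For concreteness, a minimal sketch of what attaching dimension scales via h5py could look like - the dataset and dimension names below are placeholders, not the actual variables in the Instaseis files, and older h5py versions (e.g. 2.5) use `dset.dims.create_scale(scale, name)` instead of `make_scale()`:

```python
import h5py
import numpy as np

with h5py.File("repacked.h5", "a") as f:
    data = f.create_dataset("disp_s", data=np.zeros((10, 100), dtype="float32"))
    gll = f.create_dataset("gll_point", data=np.arange(10))
    time_ax = f.create_dataset("time", data=np.linspace(0.0, 99.0, 100))

    # Mark the coordinate datasets as HDF5 dimension scales...
    gll.make_scale("gll_point")
    time_ax.make_scale("time")

    # ...and attach them to the corresponding axes of the data variable.
    data.dims[0].attach_scale(gll)
    data.dims[1].attach_scale(time_ax)
```

If this works as intended, `ncdump -hs` should then report the named dimensions instead of anonymous `phony_dim_*` ones.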
> You could always use a higher gzip compression level or szip compression. Also make larger chunks which should compress better as compression is per chunk.
These things don't really help, because floats look very much like random numbers, hence have high entropy and don't compress well (only the zeroes before the p-arrival compress well). So the only promising strategy I see here would be to use e.g. wavelets or predictor/corrector methods and convert to integers to make things compressible.
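A rough numpy sketch of that idea (the quantization step, names, and gzip settings are arbitrary; the conversion is lossy and only meant to illustrate why integer residuals compress better than raw floats):

```python
import h5py
import numpy as np

trace = np.random.randn(10000).astype("float32")  # stand-in for one displacement time series

# Simple predictor: first-order differences. For smooth seismograms the
# residuals are small and repetitive, unlike the raw float samples.
residuals = np.empty_like(trace)
residuals[0] = trace[0]
residuals[1:] = np.diff(trace)

# Quantize to integers - the lossy step that creates repeated byte
# patterns gzip can actually exploit.
quantum = 1e-6
quantized = np.round(residuals / quantum).astype(np.int32)

with h5py.File("quantized.h5", "w") as f:
    dset = f.create_dataset("disp_s_q", data=quantized,
                            compression="gzip", compression_opts=9)
    dset.attrs["quantum"] = quantum

# Reconstruction: integrate the de-quantized residuals.
restored = np.cumsum(quantized * quantum).astype("float32")
```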
Dear @alexhutko @CTrabant,
since another code would depend on these changes on my side, I'd like to reach a decision soon. What is your opinion on the trade-off between performance and disk space?
Well, based solely on Lion's tests I would have chosen a middle solution (Transposed and non-chunked) to get a performance gain without doubling the file size. Syngine doesn't really fit into either of Martin's target use cases in that we know the IO is heavy but storage size must be considered.
Let us get farther into testing the variations on our storage system and we'll have more information to provide. I expect to have some benchmarks in the next couple of days.
If you are willing to actually test on your storage system use this script: https://github.com/krischer/instaseis/blob/h5py/instaseis/scripts/repack_instaseis_database.py
It needs h5py, click, and numpy (all in conda and pip I think) and maybe Python 3.
Convert to both with:
python /path/to/repack_instaseis_database.py --method=continuous INPUT_DB OUTPUT_DB
and
python /path/to/repack_instaseis_database.py --method=unrolled_merge INPUT_DB OUTPUT_DB
The first one is fairly fast, the second one a bit slower. The second one as of now also only works with databases that have horizontal and vertical components.
And benchmark with the call I posted above or even better benchmark the syngine service if you have a benchmarking utility set up.
Status: we're converting a test database (ak135f_2s, du 1.5 TiB). The unrolled_merge is taking a very long time, "Copying group 'snapshots'" is at 16% after ~21 hours.
Hmm...that should not be. The `snapshots` group should be copied almost immediately. And copying the rest should take around 20-30 hours - at least on our fairly slow network NAS. It should spend almost all of its time in "Copying group 'gllpoints_all'".
How big is the file currently? If it's less than a couple hundred GB, please abort and restart. If it does not reach the 'gllpoints_all' step within a minute or so, I might have to look at the script again. In that case: could you somehow give me access to your system? That would vastly simplify debugging. I could send you my public ssh key for example.
Current output for `unrolled_merge` is 372 GB. It is already past `gllpoints_all`:
.../repack_instaseis_database.py --method=unrolled_merge ../MODELS/ak135f_2s ./ak135f_2s_unrolled_merge
Copying group 'Surface'...
Copying group 'Mesh'...
Copying group 'gllpoints_all'...
Copying group 'snapshots'...
[######------------------------------] 16% 11:11:26
The `continuous` version took 11.3 hours to convert.
I'm reluctant to restart without changing anything as I seriously doubt it will be different (unless this is running in Windows-mode where restarting magically fixes things?). It is definitely using CPU and increasing the output volume. I can start another conversion for testing if needed.
Sorry, my comment was mistaken - I confused things. I just looked at the code and it copies all other groups before it starts to copy the actual data. The order of the other groups is non-deterministic, so the output might vary from run to run...
As soon as the progress bar is visible it starts copying actual data values. The number at the end should be an ETA - maybe it is not meant to measure things running longer than a day and is missing the day specifier? I'll look into that when I polish the converter script to its final version once we choose which way to go.
I think all is good with your conversion - I'm just surprised it takes so long.
Yes, the `continuous` version is much faster as it does fairly optimal chunked I/O.
I converted a 1.1 TB file with the `unrolled_merge` version and unfortunately did not time it, but it was done over one weekend. I'll try to speed this up if we end up using that version - I think it should be possible; right now it jumps around in the file reading bits and pieces as it sees fit...
Ciao Lion,
I'm not sure what's going on, but maybe it's a bug worth reporting here... I tried repacking to method=continuous on the ak135f_5s database and it barfs in the same place in two separate runs with py2 & py3. I had no problems repacking a smaller database.
Alex
[product1@dpstage REPACK_TEST]$ time /products/axisem/instaseis.beta/anaconda3/bin/python /products/axisem/instaseis.beta/instaseis-repo/instaseis/scripts/repack_instaseis_database.py --method=continuous ak135f_5s ak135f_5s_continuous_chad
--> Processing file 1 of 2: ak135f_5s/PX/Data/ordered_output.nc4
Copying group 'snapshot_times'...
Copying group 'components'...
Copying group 'Seismograms'...
Copying group 'snapshots'...
Copying group 'sim_timesteps'...
Copying group 'Mesh'...
Copying group 'seis_timesteps'...
Copying group 'Surface'...
Copying group 'gllpoints_all'...
Copying 'Snapshots/disp_s' (1 of 5)...
[####################################] 100%
Copying 'Snapshots/disp_z' (2 of 5)...
[####################################] 100%
Copying 'Snapshots/stf_dump' (3 of 5)...
Traceback (most recent call last):
File "/products/axisem/instaseis.beta/instaseis-repo/instaseis/scripts/repack_instaseis_database.py", line 289, in
Hi Alex,
this appears to be an older version of the database files :-( I pushed an untested fix to this branch. Can you try again? If it does not work I'll try to get my hands on such a file.
The `unrolled_merge` variant currently also does not work for vertical-only databases. I'll add that functionality if we choose to go down that road.
Hi Lion,
Your fix works for converting the older databases. However, the instaseis.benchmark fails on their untouched and "continuous" versions. It works fine for "unrolled" and "unrolled_merge".
Is there a possible fix for this?
thanks, Alex
[product1@dpstage REPACK_TEST]$ /home/product1/anaconda/bin/python -m instaseis.benchmark --pattern UnbufferedFullyRandom --time=30 prem_a_10s
/products/SYNGINE_ALEX/INSTASEIS_Jan2016/instaseis-h5py-2/instaseis/__init__.py:100: UserWarning: Please don't install from a tarball. Use the proper pypi release or install from git.
Instaseis Benchmark Suite
It does not deal with OS level caches! So interpret the results accordingly!
Traceback (most recent call last):
File "/home/product1/anaconda/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"main", fname, loader, pkg_name)
File "/home/product1/anaconda/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/products/SYNGINE_ALEX/INSTASEIS_Jan2016/instaseis-h5py-2/instaseis/benchmark/main.py", line 391, in
Testing results with the `ak135f_2s` database on the IRIS beta system (same storage system as production). OS cache flushed prior to each run to avoid testing the buffer capability.
Test command:
.../python -m instaseis.benchmark --pattern UnbufferedFullyRandom --time=20 <MODEL>
Production Instaseis: Instaseis 0.1.1, Anaconda 2.3.0, Python 3.4.3, numpy 1.9.3, netcdf4 1.1.9
New Instaseis: Instaseis PR 35, Anaconda 2.4.1, Python 3.4.4, numpy 1.10.1, h5py 2.5.0
ak135f_2s, 1.5 TiB, Production Instaseis with original DB:
10.158 sec/seismogram
0.098445 seismograms/sec
Not quite sure how to interpret the result, it certainly does not take 10 seconds to produce a seismogram in Syngine.
ak135f_2s, 1.5 TiB, New Instaseis with original DB:
4.5054 sec/seismogram
0.221956 seismograms/sec
ak135f_2s_continuous, 1.5 TiB, New Instaseis:
0.660914 sec/seismogram
1.51306 seismograms/sec
ak135f_2s_unrolled_merge, 2.3 TiB, New Instaseis:
0.239049 sec/seismogram
4.18324 seismograms/sec
In summary, the new software alone makes a nice difference. The migration to the continuous layout makes another nice performance jump without needing any more storage (for this DB at least). If this can be believed we can get an order of magnitude increase without an increase in volume. The final test is a large increase in volume with increased performance.
I wish we could get support for both continuous and unrolled_merge, then we could use the latter on popular and high priority models and the former for the rest. If I were forced to choose one, I'd go with continuous.
Before I comment any further on this: Can you run the benchmarks for a longer duration? You just might have been unlucky, run into some OS or I/O hiccup, and only gotten two samples. The `--time` parameter is the total approximate runtime in seconds. This particular benchmark tests the worst case with random accesses all over the database and it should give you the time for each 3 component seismogram - the server component would later add a bit more overhead.
Do the benchmarks now work for the `continuous` version?
Regarding the `continuous` version: I am using a different database than @alexhutko did. Apparently not all of the "old" databases are equal.
I ran the tests multiple times each. But to be thorough, below are the tests for 300 seconds, i.e.:
.../python -m instaseis.benchmark --pattern UnbufferedFullyRandom --time=300 <MODEL>
Same definitions of Production and New Instaseis as above.
ak135f_2s, 1.5 TiB, Production Instaseis with original DB:
10.2389 sec/seismogram
0.0976667 seismograms/sec
ak135f_2s, 1.5 TiB, New Instaseis with original DB:
5.17566 sec/seismogram
0.193212 seismograms/sec
ak135f_2s_continuous, 1.5 TiB, New Instaseis:
0.647945 sec/seismogram
1.54334 seismograms/sec
ak135f_2s_unrolled_merge, 2.3 TiB, New Instaseis:
0.245925 sec/seismogram
4.06629 seismograms/sec
Virtually the same as before.
As before, OS cache was cleared before each run using `echo 3 > /proc/sys/vm/drop_caches`.
Here’s my testing using the same test command and my own version of python and the Feb 3 version of instaseis PR35. I ran these on a slower disk storage system.
Unlike Chad's test with the TB-scale databases, I find the continuous versions of the databases are larger, e.g. 95 vs 122 GB for ak135f_5s. Clean OS cache vs repeated tests only seemed to affect the very small 20s database.
Python 2.7.11 |Anaconda 2.1.0 (64-bit)| (default, Dec 6 2015, 18:08:32) , numpy 1.10.2, h5py 2.5.0
model | size (GB) | seis/s (run 1) | seis/s (run 2) | seis/s (run 3) |
---|---|---|---|---|
prem_a_20s | 0.6 | 4.1 | 14 | 28 |
prem_a_20s_continuous | 1.2 | 45 | 115 | 133 |
prem_a_20s_unrolled | 1.8 | 23 | 68 | 92 |
prem_a_20s_unrolled_merge | 1.8 | 36 | 62 | 76 |
prem_a_10s (5hr duration) | 70 | xx | | |
prem_a_10s_continuous | 80 | xx | | |
prem_a_10s_unrolled | 123 | 3.2 | 3.4 | 3.6 |
prem_a_10s_unrolled_merge | 123 | 3.6 | 4.1 | 3.9 |
ak135f_5s (1hr duration) | 95 | xx | | |
ak135f_5s_continuous | 122 | xx | | |
ak135f_5s_unrolled | 189 | 5.1 | 5.7 | 5.1 |
ak135f_5s_unrolled_merge | 189 | 7.2 | 5.6 | 6.8 |
Here are the models for which I’m able to run a benchmark test (and presumably convert to a new database format):
ak135f_2s 1.5TB generated by AxiSEM version SVN_VERSION at 2015-11-14T19:27:05.000000Z
prem_a_20s 0.6GB generated by AxiSEM version 60945ec at 2014-10-23T15:35:34.000000Z
prem_a_2s 1.5TB generated by AxiSEM version SVN_VERSION at 2015-12-10T18:42:23.000000Z
prem_i_2s 1.4TB generated by AxiSEM version SVN_VERSION at 2015-12-01T10:12:24.000000Z
Models that I cannot benchmark, convert to continuous, or even instaseis.open_db using instaseis PR35. These can be converted to unrolled & unrolled_merge.
ak135f_1s 1.4TB generated by AxiSEM version SVN_VERSION at 2015-11-19T09:34:56.000000Z
ak135f_5s 95GB generated by AxiSEM version v1.2-31-g69d4-dirty at 2015-11-18T12:09:17.000000Z
iasp91_2s 1.4TB generated by AxiSEM version NYR1.2 at 2015-03-18T10:03:05.000000Z
prem_a_10s 70GB generated by AxiSEM version v1.2-31-g69d4-dirty at 2015-11-12T09:36:02.000000Z
prem_a_5s 200GB generated by AxiSEM version v1.2-31-g69d4-dirty at 2015-11-16T22:36:06.000000Z
Is this a significant issue wrt moving to one type of database, i.e. will the 'old' style databases need to be recomputed?
As things look right now, I think it may be in our best interest to go with the lower volume options and accept smaller speed increases rather than going whole hog. This will allow us to have more models. At this time, neither slow performance nor too many users seems to be an issue. I'm happy to do further testing if there's a patch of some sort.
I reran my tests with `ak135f_2s` using a 1200 second duration and got the same results. In short, those are the numbers; if something unwanted is biasing them, it is very, very consistent.
Hi guys,
thanks for the tests!
> Not quite sure how to interpret the result, it certainly does not take 10 seconds to produce a seismogram in Syngine.
I'm fairly certain that the benchmarks measure the time for a single 3 component seismogram extraction so I would be surprised if it is really wrong. Are you sure that Syngine is much faster? To actually test you have to completely randomize the source and receiver geometry all the time as the internal instaseis buffers are really efficient in some scenarios.
I wrote a small script that tests the performance of syngine. You can find it here: https://gist.github.com/krischer/9d9a4bd078a8e64675f2
Results are measured from Munich but the script accounts for ping. Also the results are almost identical if run from some server in New York. The times are TTFB, i.e. the time until the first byte of the response is received (with and without subtracting the ping). They seem quite consistent with your measurements. Do you have other measurements, or is there an obvious error in my script?
Model ak135f_2s : min/avg/max/stddev = 7.84/11.04/14.02/ 1.84 sec
--> accounting for ping: min/avg/max/stddev = 7.64/10.85/13.82/ 1.84 sec
Model ak135f_5s : min/avg/max/stddev = 1.40/ 1.92/ 2.97/ 0.45 sec
--> accounting for ping: min/avg/max/stddev = 1.20/ 1.72/ 2.77/ 0.45 sec
Model iasp91_2s : min/avg/max/stddev = 2.80/ 3.23/ 3.88/ 0.30 sec
--> accounting for ping: min/avg/max/stddev = 2.61/ 3.03/ 3.68/ 0.30 sec
Model prem_a_10s : min/avg/max/stddev = 1.71/ 1.89/ 2.11/ 0.12 sec
--> accounting for ping: min/avg/max/stddev = 1.51/ 1.69/ 1.91/ 0.12 sec
Model prem_a_20s : min/avg/max/stddev = 0.76/ 0.93/ 1.09/ 0.10 sec
--> accounting for ping: min/avg/max/stddev = 0.56/ 0.73/ 0.90/ 0.10 sec
Model prem_a_2s : min/avg/max/stddev = 10.55/12.28/13.39/ 0.91 sec
--> accounting for ping: min/avg/max/stddev = 10.35/12.08/13.20/ 0.91 sec
Model prem_a_5s : min/avg/max/stddev = 1.56/ 2.30/ 4.11/ 0.79 sec
--> accounting for ping: min/avg/max/stddev = 1.36/ 2.11/ 3.91/ 0.79 sec
Model prem_i_2s : min/avg/max/stddev = 8.10/10.73/14.55/ 1.94 sec
--> accounting for ping: min/avg/max/stddev = 7.90/10.54/14.35/ 1.94 sec
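For reference, a stripped-down sketch of this kind of TTFB measurement (this is not the linked gist; the query parameters are illustrative and the ping correction is omitted):

```python
import time
import requests

URL = "http://service.iris.edu/irisws/syngine/1/query"

# Illustrative query - one random-ish source/receiver pair for one model.
params = {
    "model": "ak135f_2s",
    "sourcelatitude": 12.3, "sourcelongitude": 45.6,
    "sourcedepthinmeters": 20000,
    "sourcemomenttensor": "1e19,1e19,1e19,0,0,0",
    "receiverlatitude": -33.0, "receiverlongitude": 151.0,
    "format": "miniseed",
}

t0 = time.time()
r = requests.get(URL, params=params, stream=True)
next(r.iter_content(chunk_size=1))  # block until the first byte arrives
print("TTFB: %.2f s (HTTP %d)" % (time.time() - t0, r.status_code))
```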
> Unlike Chad’s test w the TB scale database, I find the continuous version of the databases are larger, e.g. 95 vs 122gb for ak135f_5s.
That should also happen for the larger databases. The difference in size between the current and the `continuous` layout is that the latter has no compression. I suspect that Chad's database is just not compressed in the first place.
> i.e. will the ‘old’ style databases need to be recomputed?
No...Instaseis will definitely keep supporting the 'old' style databases.
> As things look right now, I think it may be in our best interest to go with lower volume options and accept smaller speed increases than going whole hog. This will allow us to have more models. At this time, slow performance and too many users doesn’t seem to be an issue. I’m happy to do further testing if there’s a patch of some sorts. I wish we could get support for both the continuous and unrolled_merge, then we could use the later on popular and high priority models and use the former for the rest. If I were forced to choose one I go with continuous.
As written above I think you just got an oddball of a database. The relative sizes should be about `old_style` (100% size), `continuous` (125-140% size), and `unrolled_merge` (~200% size).
I personally would like to keep the `old_style` (we have to) and the `unrolled_merge` variant. We thus would have a small and slow version and a large and fast one. Also, for me there is not a big practical difference between 1 TB and 2 TB, but I understand that your requirements are different.
@sstaehler also needs the `unrolled_merge` variant.
I hope to have some time next week and will try to find a sane way to incorporate multiple database layouts in a maintainable way. If possible I guess we can support both new variants if really necessary but I would have to rely a bit on your testing. If not we will have to discuss again.
@krischer, I ran some tests and the results are about the same as your performance script, just a tad faster across the board, as expected for an internal user. I think the confusion arises from us saying syngine is ~2s/trace for the full scale models. That's only one component and under our typical usage case where either the station or the source is repeated and thus cached/faster.
Here is a speed test using FFMs (minus the overhead of source processing) and if you agree that these make sense, it's a strong case for supporting the continuous version of the databases. I'm guessing this is because for FFM traces, instaseis is extracting from very similar locations of the database? Perhaps the totally randomized tests we've been running aren't the best route for thinking about which database format is superior since most uses (at least for IRIS users) will be single source, multiple receivers where it looks like continuous > unrolled_merge? I suppose the conversion to continuous being faster than unrolled_merge and the smaller volume is just icing on the cake...
Thoughts?
http://earthquake.usgs.gov/earthquakes/eventpage/usb000jyiv#scientific_finite-fault (about 200 subfaults)
model | sec/trace (fresh run) | sec/trace (repeated) |
---|---|---|
ak135f_2s | 165 | 145 |
ak135f_2s_continuous | 10 | 8 |
ak135f_2s_unrolled_merge | 40 | 29 |
I wasn't sure if the totally random or FFM testing were the best way to determine what would suit IRIS's needs, so I ran some tests that are event-based where I used 15 random sources where each had 50 random receivers. I think this is more typical use cases than purely random src-rec pairs...
model | time per 3 (ZNE) traces (s) | with resampling to 0.1Hz (s) |
---|---|---|
ak135f_2s | 10 | 12 |
ak135f_2s_continuous | 1.1 | 1.7 |
ak135f_2s_unrolled_merge | 0.4 | 0.9 |
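For what it's worth, a rough sketch of the event-based test described above using the public Instaseis API (the database path, event/receiver counts, and source parameters are placeholders):

```python
import random
import time
import instaseis

db = instaseis.open_db("/path/to/ak135f_2s_continuous")  # placeholder path

timings = []
for _ in range(15):                                   # random "events"
    src = instaseis.Source(
        latitude=random.uniform(-90, 90),
        longitude=random.uniform(-180, 180),
        depth_in_m=random.uniform(0, 100e3),
        m_rr=1e19, m_tt=1e19, m_pp=1e19, m_rt=0.0, m_rp=0.0, m_tp=0.0,
    )
    for _ in range(50):                               # random receivers per event
        rec = instaseis.Receiver(
            latitude=random.uniform(-90, 90),
            longitude=random.uniform(-180, 180),
            network="XX", station="TEST",
        )
        t0 = time.time()
        db.get_seismograms(source=src, receiver=rec, components="ZNE")
        timings.append(time.time() - t0)

print("avg time per 3 (ZNE) traces: %.3f s" % (sum(timings) / len(timings)))
```

Note that, unlike the table above, this sketch does not flush the OS cache between requests, so the buffers will flatter the numbers somewhat.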
I think from syngine's perspective, if both continuous and unrolled_merge were supported in instaseis, we'd go with the continuous option for now since the lower volume will allow us to hold more models.
Whatever route you have chosen, do you have a timeline estimate for when you'll feel comfortable releasing an update to instaseis?
The performance increases are quite impressive.
I was planning on doing this tomorrow and, depending on how it goes, the day after. The current plan is to find some abstraction layer to support different database layouts. If that works (which I think it should) we can easily support a large number of different database layouts. I personally think that having fewer but faster databases has more merit than a larger number of slower ones, but if we support a number of different layouts you can mix and match as you please and adapt to possibly changing demands :-)
> I wasn't sure if the totally random or FFM testing were the best way to determine what would suit IRIS's needs, so I ran some tests that are event-based where I used 15 random sources where each had 50 random receivers. I think this is more typical use cases than purely random src-rec pairs...
I agree that a random test is a much better proxy. I'm also not sure why the FFM test results in a faster `continuous` layout. The `unrolled_merge` one should always be strictly faster. I suspect that maybe the buffers don't work with the `unrolled_merge` one - I'll check tomorrow.
In general I think the best test for performance are completely random accesses. As soon as a couple users are accessing the service in parallel that is the only metric that really counts.
BTW - this might come in handy for performance testing: the instaseis benchmarking tool is quite comprehensive and can simulate a number of different loads/usage patterns. That should probably be documented a bit better.
python -m instaseis.benchmark --pattern FiniteSourceEmulation /path/to/db
python -m instaseis.benchmark --pattern BufferedHalfDegreeLatLngDepthScatter /path/to/db
python -m instaseis.benchmark --pattern UnbufferedFullyRandom /path/to/db
python -m instaseis.benchmark --pattern BufferedFixedSrcRecRoDOffSeismogramGeneration /path/to/db
Essentially every class here is a benchmark: https://github.com/krischer/instaseis/blob/master/instaseis/benchmark/__main__.py
You could easily add your own.
Hi guys,
I'm very sorry for the delay - other more pressing issues demanded my attention. I'll do this ASAP (might take till the week after next week because we are organizing a workshop next week).
If somebody wants to give this a shot in the meantime:
We currently have a `Mesh` class that does not do a lot more than offer shortcuts to things in the netCDF files: https://github.com/krischer/instaseis/blob/master/instaseis/mesh.py
The core instaseis routines still directly access the netCDF files by using pointers/references on the mesh object: https://github.com/krischer/instaseis/blob/master/instaseis/instaseis_db.py
The idea is now to add methods to the `Mesh` class so that the core functions just call these. Thus, instead of for example directly accessing the corner points of the elements, they call a method `get_corner_points()`. This allows us to create various specializations of the `Mesh` class, one for each database format. We could even have non-netCDF databases, but I guess there is no practical use for that as of now. I'm confident this will work and it grants us support for any number of current and future database layouts with reasonable development effort.
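A hypothetical sketch of what that abstraction could look like - only `Mesh` and `get_corner_points()` come from the description above; the subclass names and dataset paths are invented for illustration:

```python
import h5py
import numpy as np


class Mesh:
    """Base class: the core Instaseis routines would only call these methods."""

    def __init__(self, filename):
        self.f = h5py.File(filename, "r")

    def get_corner_points(self, element_id):
        raise NotImplementedError


class OldStyleMesh(Mesh):
    """Current chunked layout - group/dataset names are placeholders."""

    def get_corner_points(self, element_id):
        # Look up the point indices of the element, then gather coordinates.
        idx = self.f["Mesh/fem_mesh"][element_id]
        return np.column_stack(
            (self.f["Mesh/mesh_S"][:][idx], self.f["Mesh/mesh_Z"][:][idx])
        )


class UnrolledMergedMesh(Mesh):
    """Unrolled layout - everything for an element sits in one array."""

    def get_corner_points(self, element_id):
        # Single read from a hypothetical per-element dataset.
        return self.f["merged_mesh"][element_id, :4, :2]
```

`open_db()` would then pick the appropriate subclass based on the detected file layout, and the core routines would not need to know which layout they are reading from.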
Continued in #42.
Hi all,
@sstaehler @martinvandriel @tnissen @alexhutko @CTrabant - pinging you all as it's somehow relevant to all of us.
I spent some time last week and today trying to make Instaseis (and by extension Syngine) faster. So far I only looked at I/O performance as that is the dominant part and everything else can be made faster by just throwing more CPUs at it.
To extract a single 3 component seismogram we have to access data from 5 arrays across two netCDF files. At each array we have to get data for all 25 GLL points. Worst case this results in 125 read operations for each 3C seismogram.
Currently this data is stored in chunked netCDF files (more about chunks) at one chunk per GLL point time series (that's what makes it fairly performant even though data is accessed with a loop over the slow index). Nonetheless 125 chunks have to be found (HDF traverses a btree for that) and read.
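In h5py terms, the current chunking corresponds roughly to the following (names and shapes are illustrative):

```python
import h5py
import numpy as np

n_gll, n_samples = 10000, 2000
field = np.random.rand(n_gll, n_samples).astype("float32")

with h5py.File("current_layout.h5", "w") as f:
    # One chunk per GLL point time series: extracting the full time series
    # of a single GLL point touches exactly one chunk, even though the
    # extraction loop runs over the slow (first) index.
    f.create_dataset("disp_s", data=field,
                     chunks=(1, n_samples), compression="gzip")
```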
This PR does a couple of things in an attempt to speed things up.
It replaces `netcdf4` with `h5py`, a Python package to read HDF5 files. Newer netCDF versions are based on HDF5, thus we essentially skip one layer of software, which is always nice.

The most promising database restructurings are:
Two things that might still be worthwhile to try are (I'll be pretty busy the next two weeks - please feel free to try it):
What trade-off between file size and extraction speed should we make? Also for the syngine use case. Is doubling the file size ok? Instaseis will always support the current database structure but I will also add support for one more layout. Which one is best?
The following measurements have to be taken with a large grain of salt. It is really hard to make these measurements as multiple layers of caches are active at any point in time (OS caches/HDF5 caches/Instaseis caches/...). If you try this, please clear the operating system disc buffers before making the measurements. Also use a database with a size of at least a couple times your system's memory - the buffers are surprisingly effective and small databases taint the result, as we essentially only care about performance for the large databases.
Furthermore they are highly dependent on the system and I/O architecture and load.
I made the measurements with
This branch also contains a tool to convert databases from the current layout to another layout:
This can take a very long time!
Benchmarks
2s database over a pretty slow network NAS
Current DB Structure (1.1 TB) and current Instaseis Master:
Current DB Structure (1.1 TB) and h5py Branch:
Transposed and non-chunked DB (1.4 TB) and h5py Branch:
Unrolled and unchunked DB (2.2 TB) and h5py Branch:
5s mars database on an external USB hard disc
Current DB Structure (37 GB) and current Instaseis Master:
Current DB Structure (37 GB) and h5py Branch:
Transposed and non-chunked DB (54 GB) and h5py Branch:
Unrolled and unchunked DB (83 GB) and h5py Branch: