jpjones76 / SeisIO.jl

Julia language support for geophysical time series data
http://seisio.readthedocs.org

Very slow mseed read time #92

Open · opened by tclements 2 years ago

tclements commented 2 years ago

I'm on SeisIO v1.2.1 and Julia 1.7.0

I have a miniSEED file with over 3,000 channels. Loading it on a Windows machine with 32 GB of RAM crashes with an out-of-memory error. An Ubuntu machine with 64 GB of RAM can read the file, but read_data uses a very large amount of memory and takes much longer than expected:

```julia
using SeisIO
@time read_data("mseed", "39894823.ms")
```

```
194.283458 seconds (395.64 k allocations: 102.749 GiB, 99.57% gc time)
SeisData with 3192 channels (4 shown)
...
```
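When comparing runs (for example, before and after a fix, or across machines), it can help to capture the same numbers `@time` prints as values instead of text. The sketch below is a hypothetical helper, `timing_summary`, not part of SeisIO; it relies only on Base's `@timed`, which in Julia 1.5 and later returns a NamedTuple with `time`, `bytes`, and `gctime` fields.

```julia
# Hypothetical helper (not a SeisIO API): run f() once and report wall time,
# GiB allocated, and the fraction of wall time spent in garbage collection.
function timing_summary(f)
    stats = @timed f()  # NamedTuple: value, time, bytes, gctime, gcstats
    gc_fraction = stats.time > 0 ? stats.gctime / stats.time : 0.0
    return (seconds = stats.time,
            gib_allocated = stats.bytes / 2^30,
            gc_fraction = gc_fraction)
end
```

Usage would look like `timing_summary(() -> read_data("mseed", "39894823.ms"))`. The run reported above (99.57% gc time) corresponds to a `gc_fraction` of about 0.9957, which points at allocation churn rather than parsing itself as the main cost.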

In comparison, ObsPy/libmseed reads this file quickly:

```python
import obspy
%timeit obspy.read("39894823.ms")
```

```
621 ms ± 2.61 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

I can't attach the file because it is larger than the file share limit (25 MB).

Here's how I got it (requires AWS credentials):

```julia
using AWS
using AWSS3
aws = AWSConfig(region="us-west-2")
s3_get_file(aws, "scedc-pds", "event_waveforms/2021/2021_357/39894823.ms", "39894823.ms")
```

Alternatively, it can be downloaded with the AWS CLI, which does not require AWS credentials:

```sh
aws s3 cp --no-sign-request s3://scedc-pds/event_waveforms/2021/2021_357/39894823.ms .
```
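Since the CLI command works with `--no-sign-request`, the object should also be readable anonymously over plain HTTPS, which would let Julia users fetch it without AWS.jl or credentials. The sketch below assumes exactly that (anonymous read access on `scedc-pds` in `us-west-2`) and uses S3's virtual-hosted-style URL form plus the `Downloads` standard library (Julia 1.6+). The helpers `s3_https_url` and `fetch_event_waveform` are hypothetical names for illustration.

```julia
using Downloads

# Hypothetical helper: public HTTPS URL for an S3 object, using the
# virtual-hosted-style form https://<bucket>.s3.<region>.amazonaws.com/<key>.
s3_https_url(bucket::AbstractString, key::AbstractString;
             region::AbstractString = "us-west-2") =
    "https://$(bucket).s3.$(region).amazonaws.com/$(key)"

# Hypothetical helper: download one SCEDC event file without credentials.
# Assumes the bucket permits anonymous reads, as --no-sign-request implies.
function fetch_event_waveform(key::AbstractString, dest::AbstractString)
    url = s3_https_url("scedc-pds", key)
    return Downloads.download(url, dest)  # returns the destination path
end
```

For this issue's file: `fetch_event_waveform("event_waveforms/2021/2021_357/39894823.ms", "39894823.ms")`.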