jpjones76 / SeisIO.jl

Julia language support for geophysical time series data
http://seisio.readthedocs.org

Unusual performance issue with sample file #72

Closed jpjones76 closed 3 years ago

jpjones76 commented 3 years ago

Sample file submitted by @tclements has bizarre, impossibly high memory consumption in read_data (memory overhead >40000%, or ~3 orders of magnitude greater than any other known file).

julia> fname = "/data/Downloads/CIGATR_HHZ___2017085.ms"
"/data/Downloads/CIGATR_HHZ___2017085.ms"

julia> @benchmark read_data(fname)
BenchmarkTools.Trial: 
  memory estimate:  80.66 GiB
  allocs estimate:  537728
  --------------
  minimum time:     30.064 s (5.49% GC)
  median time:      30.064 s (5.49% GC)
  mean time:        30.064 s (5.49% GC)
  maximum time:     30.064 s (5.49% GC)
  --------------
  samples:          1
  evals/sample:     1

Originally posted by @jpjones76 in https://github.com/jpjones76/SeisIO.jl/issues/62#issuecomment-719848007

jpjones76 commented 3 years ago

I've pushed one commit that drops the memory use by a factor of 5 and read time by 40%, but that's only slightly less terrible. I can't improve it further without rewriting the time library. The remaining slowdown -- which is extremely significant -- is because :t is an Array{Int64, 2}. Accounting for each gap requires a call to vcat, which allocates a new matrix and copies every existing row, so the cost climbs steeply with the number of gaps. I hadn't imagined files with 60,000 gaps could exist.
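To illustrate why repeated vcat on a two-column Int64 matrix gets expensive, here is a minimal sketch. It is not SeisIO's actual code: the function names `gaps_vcat` and `gaps_push` and the sample gap values are hypothetical, made up for this example. The first function grows the matrix one row per gap, so filling n rows copies on the order of n²/2 rows in total. The second accumulates into two vectors with push! and builds the matrix once at the end.

```julia
# Hypothetical helpers for illustration only; neither is part of SeisIO.

# Grow an n×2 Int64 matrix one row at a time with vcat.
# Every call allocates a new matrix and copies all existing rows.
function gaps_vcat(idx::Vector{Int64}, dt::Vector{Int64})
    t = Array{Int64,2}(undef, 0, 2)
    for k in eachindex(idx)
        t = vcat(t, [idx[k] dt[k]])
    end
    return t
end

# Collect the same rows in two growable vectors, then build the
# matrix once. push! on a Vector resizes in place when it can.
function gaps_push(idx::Vector{Int64}, dt::Vector{Int64})
    col1 = Int64[]
    col2 = Int64[]
    for k in eachindex(idx)
        push!(col1, idx[k])
        push!(col2, dt[k])
    end
    return hcat(col1, col2)
end

# Made-up gap data: sample indices and gap lengths.
n = 2_000
idx = collect(Int64, 1:n)
dt = fill(Int64(10_000), n)

# Both approaches produce the same n×2 matrix.
@assert gaps_vcat(idx, dt) == gaps_push(idx, dt)
```

Running both under @benchmark with a larger n should show the difference in allocations. Whether the real time library can be restructured along these lines is a separate question and is not something this sketch settles.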

tclements commented 3 years ago

This is great - I think files with 60,000 gaps are very much an edge case.