Closed tclements closed 4 years ago
It's definitely worth reoptimizing these functions if that won't break anything else. `t_expand` and `t_collapse` are intended to resolve more difficult timing issues (secondarily, `t_expand` is a good fallback for visualization). I've known from the beginning that this could be done with ranges, but there were major memory problems with time gaps. I'll have to check whether the Julia developers have improved that.
I can definitely recode `sync` for better performance without gaps, but it's not a good idea to return different Types from the same function. That can lead to type instability.
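To illustrate the concern about returning different Types, here is a minimal sketch. The functions `unstable_times` and `stable_times` are hypothetical and are not part of the package; they only show how a return type that depends on a runtime value is inferred as a `Union` rather than a concrete type.

```julia
# Hypothetical helper: returns a lazy range when `gapless` is true and a
# Vector otherwise, so the return type depends on a runtime value.
unstable_times(t0::Int64, dt::Int64, n::Int64, gapless::Bool) =
    gapless ? (t0:dt:t0 + (n - 1) * dt) : collect(t0:dt:t0 + (n - 1) * dt)

# Hypothetical helper: always returns a Vector, so the return type is concrete.
stable_times(t0::Int64, dt::Int64, n::Int64) = collect(t0:dt:t0 + (n - 1) * dt)

# Ask the compiler what it infers for each function.
unstable_rt = Base.return_types(unstable_times, (Int64, Int64, Int64, Bool))[1]
stable_rt = Base.return_types(stable_times, (Int64, Int64, Int64))[1]

println("unstable: ", unstable_rt, " (concrete? ", isconcretetype(unstable_rt), ")")
println("stable:   ", stable_rt, " (concrete? ", isconcretetype(stable_rt), ")")
```

Julia handles small unions reasonably well, so the cost in practice depends on how far the mixed type propagates through the calling code.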
A workaround might be for me to only call `t_expand` and its inverse on gapped data. I'll try this if it turns out that I can't feasibly change `t_expand` itself.
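A rough sketch of that workaround, under stated assumptions: the time matrix layout used here (rows of `[sample_index value]`, with the start time in μs in row 1, extra gap lengths in μs in later rows, and a final row marking the last sample) is an assumption made for illustration, and `expand_times` and `end_time` are hypothetical names, not the package's functions. The caller branches on whether gaps exist but returns an `Int64` either way.

```julia
# Hypothetical full expansion: one time stamp (μs) per sample.
function expand_times(t::Matrix{Int64}, fs::Float64)
    dt = round(Int64, 1.0e6 / fs)      # sample interval in μs
    steps = fill(dt, t[end, 1])        # nominal step before every sample
    steps[1] = t[1, 2]                 # first entry is the start time
    for i in 2:size(t, 1)
        steps[t[i, 1]] += t[i, 2]      # add each gap at its sample index
    end
    return cumsum(steps)
end

# Hypothetical caller: expands only when the channel actually has gaps.
function end_time(t::Matrix{Int64}, fs::Float64)
    dt = round(Int64, 1.0e6 / fs)
    nx = t[end, 1]
    if size(t, 1) == 2 && t[2, 2] == 0
        return t[1, 2] + (nx - 1) * dt # gapless: plain arithmetic, no allocation
    end
    return last(expand_times(t, fs))   # gapped: fall back to full expansion
end

gapless = [1 0; 100 0]
gapped = [1 0; 50 1_000_000; 100 0]
println(end_time(gapless, 20.0))
println(end_time(gapped, 20.0))
```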
Your idea works.
I finally had time to work on this a few days ago. I'm revising `sync!` to avoid `t_expand` and `t_collapse`, which I intend to keep in the code for "last resort" procedures, but which shouldn't be needed in such a routine processing operation.
The revised `sync`, based on your suggestion, produces identical results in my tests, doesn't change any APIs, and needs only ~4K memory for any trace length. This is a tremendous improvement. Many thanks for the great suggestion!
I'm going to include the `sync` change in a minor version release (v1.1.0) in a couple of weeks, but the new version will be live on master "soon" (next week, at the latest; hoping for today, but tomorrow is more realistic). The version number is already incremented to 1.1.0 on master, but I want this change in the "release" version.
My hope is that you can report on the relative performance of the new `sync` with real data, assuming you aren't too busy. Lately you've had the best data sets for identifying bugs.
Thanks for your patience here.
Sounds great! I can test this on a few TB of data once you commit to master.
Change is now live on master. Please let me know how your testing goes.
Did the new version of sync work for you?
Yes, this can safely be closed.
Running `sync` is a slight performance bottleneck. For instance, `read_data` with 20 Hz data takes less than 1 ms and 6 MB, while `sync` on the same file (after merging, ungapping) is more time and memory intensive. Much of this time and memory is spent in `t_expand` and `t_collapse`.
For channels without gaps, it could be more efficient to use a range rather than an array in `t_expand`.
Here's a check that the two methods return the same representation
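A check along these lines might look like the following sketch. It does not call the package; the start time, sample count, and variable names are made up for illustration, and the array is built directly from cumulative steps rather than by `t_expand`.

```julia
fs = 20.0                              # sampling rate in Hz
nx = 72_000                            # one hour of 20 Hz samples
t0 = 1_500_000_000_000_000             # arbitrary start time in μs
dt = round(Int64, 1.0e6 / fs)          # sample interval in μs

# Array representation: one Int64 time stamp per sample.
steps = fill(dt, nx)
steps[1] = t0
as_array = cumsum(steps)

# Range representation: start, step, and stop only.
as_range = t0:dt:t0 + (nx - 1) * dt

println(as_array == as_range)          # element-wise equality of the two forms
println(sizeof(as_array), " bytes vs ", sizeof(as_range), " bytes")
```

The range stores three integers regardless of `nx`, which is where the memory saving for gapless channels would come from.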
This change would give different types of output for data with and without gaps, though it looks like `t_collapse` is only used in sync.jl while `t_expand` is in sync.jl and SAC.jl (but the `Float32.(μs*(t_expand(t, fs) .- ts))` line will create an `Array` whether the input is a range or an array). This could be a simple change with multiple dispatch handling the discrepancy between the array and range. Thoughts?
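As a sketch of how that could play out: the function names below are hypothetical, and the `μs = 1.0e-6` default is an assumed value for the conversion factor. A single generic method accepts both representations because both are `AbstractVector{Int64}`, and broadcasting `Float32` over either one yields a `Vector{Float32}`. Separate methods are only needed where the representation matters.

```julia
# Hypothetical stand-in for the quoted expression; μs default is an assumption.
rel_seconds(times::AbstractVector{Int64}, ts::Int64; μs::Float64 = 1.0e-6) =
    Float32.(μs * (times .- ts))

# Multiple dispatch picks a method from the argument's type.
storage(::AbstractRange) = "range"
storage(::Vector) = "array"

r = 0:50_000:200_000                   # gapless times as a range (μs)
a = collect(r)                         # the same times as an array

for times in (r, a)
    out = rel_seconds(times, 0)
    println(storage(times), " in, ", typeof(out), " out")
end
```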