Closed dequiroga closed 1 month ago
@dequiroga This is great. We've been dealing with this problem for a while now but hadn't worked with much large data. I like your method of using scipy.stats; that is probably much faster than np.median.
Could you draft a pull request?
Will do. I have a branch with the fix, but I can't seem to push without permission. I am happy to push and draft a PR if you give me access.
@kujaku11 can you please add permission for @dequiroga to PR into mth5?
Thanks @kujaku11, see the draft PR #246
ChannelTS.compute_sample_rate is slow, noticeably so when it is called repeatedly on large datasets.
Occasionally, when the sample rate is not defined, MTH5 computes it from the time array as the median of the time differences. When this is done repeatedly on large datasets, the cost adds up to a significant slowdown.
I have been getting the same results (but substantially faster) using the mode, e.g.:
Take the mode?
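A minimal sketch of the mode idea, for illustration only. It is not the code from the PR and not MTH5's implementation: the function name `sample_rate_from_mode` and the assumption that timestamps are integer nanoseconds are hypothetical. It uses `np.unique` to find the most frequent time step; the thread mentions scipy.stats, which offers a `mode` function that could be swapped in. Whether either is faster than `np.median` on a given dataset would need to be timed.

```python
import numpy as np


def sample_rate_from_mode(time_ns):
    """Estimate the sample rate in Hz from the most common time step.

    time_ns is assumed to be a 1-D array of integer nanosecond timestamps
    (hypothetical input format, chosen so that equal steps compare exactly).
    """
    dt = np.diff(np.asarray(time_ns, dtype=np.int64))
    # Count how often each distinct time step occurs.
    values, counts = np.unique(dt, return_counts=True)
    # The mode is the step with the highest count.
    dt_mode = values[np.argmax(counts)]
    return 1e9 / dt_mode


# Synthetic 8 Hz time index (125 ms steps) with one gap in the middle.
time_ns = np.arange(1000, dtype=np.int64) * 125_000_000
time_ns[500:] += 10_000_000_000  # 10 s gap; affects only one time difference
rate = sample_rate_from_mode(time_ns)
```

Because a gap changes only a single time difference, the most frequent step is unaffected here.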
Another idea would be doing some sort of weighted average, but this does not seem to be as robust... e.g.:
Weighted average of the unique dt occurrences (weighted by counts)?
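A sketch of the weighted-average alternative, under the same hypothetical assumptions (integer nanosecond timestamps; the function name `sample_rate_from_weighted_dt` is made up for illustration). Weighting each unique time step by its count is equivalent to taking the plain mean of all time differences, which is one way to see why it is less robust: a single large gap shifts the result.

```python
import numpy as np


def sample_rate_from_weighted_dt(time_ns):
    """Estimate the sample rate in Hz from a count-weighted mean time step.

    time_ns is assumed to be a 1-D array of integer nanosecond timestamps
    (hypothetical input format).
    """
    dt = np.diff(np.asarray(time_ns, dtype=np.int64))
    values, counts = np.unique(dt, return_counts=True)
    # Average of the unique steps, weighted by how often each occurs.
    dt_mean = np.average(values, weights=counts)
    return 1e9 / dt_mean


# Same synthetic 8 Hz index with one 10 s gap.
time_ns = np.arange(1000, dtype=np.int64) * 125_000_000
time_ns[500:] += 10_000_000_000
weighted_rate = sample_rate_from_weighted_dt(time_ns)
# The gap pulls the estimate below the true 8 Hz.
```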