dhomeier opened this issue 4 years ago (status: Open)
Just quoting my previous comment on the second kwarg that is now missing in `aggregate_downsample`: `binsize`, which specified a constant number of data points rather than a bin length in time. I think it would actually not be too hard to implement this by directly constructing a `BinnedTimeSeries` with `time_bin_start = self.time[::bins]`, but maybe the more important question is whether this binning method is really a better option for signal processing. It could obviously provide more homogeneous bin errors, but would that offset the irregular time sampling?
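The `time[::bins]` idea above can be sketched without any astropy machinery. This is a minimal, hedged illustration (plain Python lists stand in for `self.time`; the function name is invented for the sketch):

```python
# Sketch: pick every `bins`-th time stamp as a bin start, so that each
# bin would contain `bins` consecutive data points. `times` stands in
# for self.time; in the real code these would be astropy Time values
# passed as time_bin_start to BinnedTimeSeries.

def bin_starts_by_count(times, bins):
    """Return bin start times such that each bin holds `bins` points."""
    if bins < 1:
        raise ValueError("bins must be a positive integer")
    return times[::bins]

starts = bin_starts_by_count([0.0, 0.5, 1.0, 1.5, 2.0, 2.5], bins=2)
# starts == [0.0, 1.0, 2.0]
```

Note that with irregular sampling these bins would have unequal durations, which is exactly the trade-off questioned above.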
Thank you for looking into this @dhomeier! Carefully considering the behavior of `lc.bin()` has been on my list of things to do before we can release v2.0.

It's not good that I broke the old behavior of bin (i.e. based on binsize) without providing at least the ability to still pass a binsize parameter. I remember having a quick go at adding the binsize parameter but I struggled (I can't remember why), so I left this for later.

Now would be a good time to fix this! Are you interested?
I've been looking a bit further, and indeed it seems this is not done so straightforwardly by creating a new `BinnedTimeSeries`. I guess basically the code in `aggregate_downsample` following the determination of the bin sizes and the creation of the new `BinnedTimeSeries` would have to be replicated. Not a terribly big deal, but I wonder if it would not be a cleaner solution to directly modify `aggregate_downsample` upstream; that, being new functionality, would land in 4.2 at the earliest, so perhaps a temporary solution within `LightCurve` is still warranted.
The old `bin` function only accepted one of either `bins` or `binsize`, so we should follow that rule and raise a `ValueError`, too, for any conflicting combinations of `bins`, `n_bins`, `binsize`, and `time_bin_size`.
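The mutual-exclusion rule described above could look roughly like this. A hedged sketch only: the function name and signature are invented here and do not reflect the actual lightkurve implementation.

```python
# Sketch: accept at most one of the four binning arguments discussed
# above, and raise ValueError for any conflicting combination.

def check_bin_args(bins=None, n_bins=None, binsize=None, time_bin_size=None):
    """Return the name of the single argument given, or None."""
    given = [name for name, value in
             [("bins", bins), ("n_bins", n_bins),
              ("binsize", binsize), ("time_bin_size", time_bin_size)]
             if value is not None]
    if len(given) > 1:
        raise ValueError(
            f"conflicting arguments: {', '.join(given)}; pass only one of "
            "`bins`, `n_bins`, `binsize`, `time_bin_size`")
    return given[0] if given else None
```

Collecting the supplied names first also makes the error message tell the user exactly which arguments clashed.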
I'm starting to think that we should revert to Lightkurve's old `bin` function (i.e. the one that only accepts `bins` or `binsize`), so that we can go ahead with Lightkurve v2.0 and spend the remaining time on fixing docs/tutorials instead of re-inventing the `bin` method.

In this scenario, users can still apply AstroPy's `aggregate_downsample` method to bin a `LightCurve` object in time, we just wouldn't make this the default behavior of `bin`.

Thoughts?
It did not seem to me the old function would work on a `TimeSeries` object out of the box, but I may be wrong. I'll have to contemplate the function of `aggregate_downsample` a bit more, but I still think it can be done without too much effort.
Problem description
Fleshing out the details on one of the API changes related to the migration to `TimeSeries` in #744: `astropy.timeseries.aggregate_downsample`. This significantly alters the behavior of binning, because bins are now defined in time instead of by number of data points. (Largely copied and pasted from https://github.com/KeplerGO/lightkurve/pull/744#issuecomment-685993661.)

Also related to this migration, it now has an `n_bins` instead of the `bins` argument, which I think can easily lead to further confusion (at least it did in my case), since `aggregate_downsample` always constructs a binned time series of length `n_bins * time_bin_size`, even with the latter just set to the default (0.5 d). I.e. the length of the returned time series may have little relation to that of the input. As in the case below, `lc.bin()` by default downsamples into the appropriate number of 12 h bins (187), whereas selecting a smaller or larger value for `n_bins` without other modifications just creates fewer or more bins of the same size, either covering only part of the full time series or returning many bins without any data:

Example
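The arithmetic behind this behavior can be illustrated without astropy at all. This is a hedged stand-in, not the original example; the ~93.5 d input span is assumed so that 187 half-day bins cover it exactly, matching the bin count quoted above.

```python
# Sketch of the behavior described above: with a fixed time_bin_size
# (0.5 d by default in astropy), n_bins only changes how many bins are
# produced, not their size, so the output span is n_bins * time_bin_size
# regardless of the input length.
import math

def output_span(n_bins, time_bin_size=0.5):
    """Total time covered by the binned series, in days."""
    return n_bins * time_bin_size

def covering_n_bins(input_span, time_bin_size=0.5):
    """Number of bins needed to cover the whole input span."""
    return math.ceil(input_span / time_bin_size)

# An assumed ~93.5 d light curve needs 187 half-day bins for full
# coverage, matching the 187 bins quoted above:
# covering_n_bins(93.5) == 187
# ...whereas a smaller n_bins covers only part of the series:
# output_span(50) == 25.0  (days)
```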
Expected behaviour
The more user-friendly interface to me would offer functionality equivalent to the old `lc.bin(bins=N)`, or perhaps keep `bins` as an additional argument right away, so that the default bin size would be adapted as `time_bin_size = (self.time[-1] - self.time[0]) / bins`.

Then of course one would need to define a priority scheme and appropriate warnings if 2 or more of `time_bin_size`, `bins`, `n_bins` are set... To be continued.
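The `time_bin_size = (self.time[-1] - self.time[0]) / bins` adaptation above, together with a simple exclusivity check, might look like this. A hedged sketch with invented names, using plain floats for times rather than astropy `Time` objects:

```python
# Sketch: derive time_bin_size from the total span when only `bins` is
# given, refusing conflicting combinations. The 0.5 d fallback mirrors
# the astropy aggregate_downsample default mentioned above.

def derive_time_bin_size(t_start, t_end, bins=None, time_bin_size=None):
    """Return the bin length in days for the requested binning."""
    if bins is not None and time_bin_size is not None:
        raise ValueError("pass only one of `bins` and `time_bin_size`")
    if bins is not None:
        # Split the full span into `bins` equal-duration bins.
        return (t_end - t_start) / bins
    return time_bin_size if time_bin_size is not None else 0.5

# derive_time_bin_size(0.0, 93.5, bins=187) == 0.5
```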