thingumajig opened this issue 8 months ago
Hi @thingumajig, and thanks for writing.
I can't tell where exactly the issue is coming from (I can't reproduce the data), but I guess it comes from creating a copy of the data when creating a TimeSeries in this line.
The reason is that we guarantee that each TimeSeries is immutable (and the source is not mutated), to avoid a lot of pitfalls down the line.
We're always open to suggestions on how to improve things, as long as we can keep these guarantees.
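To illustrate the guarantee, here is a minimal sketch (`from_times_and_values` is just one of several factory methods; the toy data is only for demonstration):

```python
import numpy as np
import pandas as pd
from darts import TimeSeries

times = pd.date_range("2023-01-01", periods=5, freq="D")
values = np.arange(5, dtype=float)

series = TimeSeries.from_times_and_values(times, values)

# mutating the source array afterwards does not change the series,
# because the constructor works on its own copy of the data
values[:] = -1.0
print(series.values().ravel())  # still [0. 1. 2. 3. 4.]
```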
Hi @dennisbader, thank you for your reply.
> ...I guess it comes from creating a copy of the data when creating a TimeSeries in this line. The reason is that we guarantee that each TimeSeries is immutable (and the source is not mutated), to avoid a lot of pitfalls down the line.
I looked at the `_sort_index` source code and saw a monotonicity check. Looking at my data again, it turns out I have at least one time gap. I'll go over my data once more.
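For reference, this is the kind of quick check I would run on the time index (a toy example with a gap; in my case the same check would be applied to the DataArray's time coordinate):

```python
import pandas as pd

# toy index with a gap, just to show what the check looks like
idx = pd.DatetimeIndex(
    ["2023-01-01 00:00", "2023-01-01 00:01", "2023-01-01 00:05"]
)

print(idx.is_monotonic_increasing)            # True: sorted, but...
print(idx.to_series().diff().value_counts())  # ...the step sizes reveal a gap
```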
But still, maybe gigantic data would be better represented as multiple series? And perhaps there could be a neat way to create delayed TimeSeries, in the Dask sense?
Simple steps. I have a large amount of data in netCDF4 format, prepared for loading into Darts:
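Roughly like this (the file name, variable name, and chunking are placeholders for my actual setup):

```python
import xarray as xr

# open lazily, backed by Dask, instead of reading everything into memory
ds = xr.open_dataset("my_data.nc", chunks={"time": 1_000_000})
da = ds["value"]  # the variable I want to turn into a TimeSeries
print(da)
```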
Output:
Next, since my previous attempts took too long, I resample the data:
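Something along these lines (downsampling to 1-minute resolution; aggregating by mean is just what I chose here):

```python
# shrink the data by resampling the time dimension to 1-minute steps
da1min = da.resample(time="1min").mean()
```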
It takes 17 seconds. That's fine. I get a reduced xarray:

![image](https://github.com/unit8co/darts/assets/8703531/7f45fb24-854c-45d1-9e66-dd375da3d988)
Then I just try to create a Darts TimeSeries from this array in different ways:
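For example (both variants are sketches; the column and dimension names come from my data):

```python
from darts import TimeSeries

# attempt 1: directly from the xarray DataArray
series = TimeSeries.from_xarray(da1min)

# attempt 2: going through pandas first
df = da1min.to_dataframe(name="value").reset_index()
series = TimeSeries.from_dataframe(df, time_col="time", value_cols="value")
```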
It already takes over 11 minutes(!). For the full version of my data, I couldn't wait for the process to finish. Is there cloning going on? I just need a new view of the data. Or am I doing something wrong?
Okay, on the other hand, I can do the following:
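That is, first pull the resampled array fully into memory (a sketch, assuming the array is Dask-backed):

```python
# materialize the lazy (Dask-backed) array in RAM
da1min_mem = da1min.compute()  # or da1min.load()
```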
It takes about 5 minutes, and I get a representation of `da1min` in memory. Then, again:
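i.e. the same construction as before, just on the already materialized array:

```python
series = TimeSeries.from_xarray(da1min_mem)
```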
It takes about 0.4 seconds. But for huge data, I think this is a bad approach.
Are there any general guidelines for handling data that doesn't fit in memory?
System: