wlanslovenija / datastream

Datastream API provides a powerful and unified Python API for time-series data.
http://datastream.readthedocs.org/

Make sure all operations can be run concurrently multiple times #23

Open mitar opened 10 years ago

mitar commented 10 years ago

Make sure all operations can be run concurrently multiple times. There are two main issues.

Ensuring that concurrent downsampling runs do the expected thing, neither overwriting each other's results nor duplicating work. We could probably lock each stream when a run starts downsampling it, so that other runs skip it. We should make sure that a stream cannot stay locked indefinitely. The same applies to backprocessing of dependent streams.
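One way to get "lock, skip, but never lock forever" is a lease: a lock that expires on its own if the holder dies. The following is only a minimal in-memory sketch of that idea, not datastream's implementation. `StreamLeases`, `downsample_all` and the `downsample_one` callback are hypothetical names, and a real deployment would keep the lease table in the shared database rather than in process memory.

```python
import itertools
import threading
import time


class StreamLeases:
    """Hypothetical in-memory lease table (stand-in for a shared store)."""

    def __init__(self, lease_seconds, clock=time.monotonic):
        self._lease_seconds = lease_seconds
        self._clock = clock
        self._tokens = itertools.count(1)
        self._held = {}  # stream_id -> (token, expires_at)
        self._guard = threading.Lock()

    def try_acquire(self, stream_id):
        """Return a token if the lease was taken, or None if the stream is busy."""
        now = self._clock()
        with self._guard:
            held = self._held.get(stream_id)
            if held is not None and held[1] > now:
                return None  # another run holds an unexpired lease
            token = next(self._tokens)
            self._held[stream_id] = (token, now + self._lease_seconds)
            return token

    def release(self, stream_id, token):
        """Release the lease, but only if this token still owns it."""
        with self._guard:
            held = self._held.get(stream_id)
            if held is not None and held[0] == token:
                del self._held[stream_id]


def downsample_all(stream_ids, leases, downsample_one):
    """Downsample each stream that no other run currently holds."""
    processed = []
    for stream_id in stream_ids:
        token = leases.try_acquire(stream_id)
        if token is None:
            continue  # skip: someone else is already downsampling it
        try:
            downsample_one(stream_id)
            processed.append(stream_id)
        finally:
            leases.release(stream_id, token)
    return processed
```

Because a lease expires after `lease_seconds`, a crashed run cannot block a stream indefinitely. The token check in `release` keeps a slow run whose lease already expired from freeing a lease that another run has since taken over.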

Ensuring that datapoints can be appended concurrently. This mostly works already, even for processing of dependent streams. The only known issue is with the derive operator, which expects the reset stream to be processed before the data stream so that it knows whether a reset happened. Maybe we should just document this and require the user to ensure that ordering? Or should we make it work regardless of order? The problem with the latter is that it seems we would have to store datapoints not only when a reset happened, but also when it did not.
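To illustrate why the order matters, here is a small sketch of a derive-like computation. It is not the operator's actual code. The `derive` function, the `(timestamp, value)` tuples with timestamps in seconds, and the rule that a reset anywhere in the interval `(t0, t1]` suppresses the result are all assumptions made for the example.

```python
def derive(previous, current, reset_times):
    """Rate of change between two (timestamp, value) points.

    Returns None if a known reset falls between the two points, since the
    difference across a reset is meaningless.
    """
    (t0, v0), (t1, v1) = previous, current
    if any(t0 < r <= t1 for r in reset_times):
        return None
    return (v1 - v0) / (t1 - t0)


# A counter reads 100 at t=0, is reset at t=5, and reads 10 at t=10.
previous, current = (0, 100), (10, 10)

# Reset stream processed first: the reset is known, so no value is derived.
in_order = derive(previous, current, reset_times=[5])

# Data stream processed first: the reset is not known yet, so a bogus
# negative rate of -9.0 per second is derived.
out_of_order = derive(previous, current, reset_times=[])
```

Making this order-independent would require being able to tell "no reset happened" apart from "the reset has not been processed yet", which is why explicit no-reset datapoints would have to be stored as well.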

kostko commented 10 years ago

We have now implemented the following:

Handling concurrent backprocessing and derived streams is still pending.

mitar commented 10 years ago

Just to add to the comment above: this currently means that you can downsample only up to 10 s before the last datapoint. (The 10 s is the safety margin mentioned above.)
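As a rough sketch of what that cutoff means in practice, the helper below computes the latest time up to which downsampling may proceed and selects the datapoints that fall before it. The names `SAFETY_MARGIN`, `downsample_until` and `ready_for_downsampling` are hypothetical, and treating the cutoff as inclusive is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Assumed to match the 10 s safety margin described in the comment.
SAFETY_MARGIN = timedelta(seconds=10)


def downsample_until(last_datapoint_time, margin=SAFETY_MARGIN):
    """Latest timestamp that is considered safe to downsample."""
    return last_datapoint_time - margin


def ready_for_downsampling(timestamps, margin=SAFETY_MARGIN):
    """Timestamps at or before the cutoff, relative to the newest datapoint."""
    if not timestamps:
        return []
    cutoff = downsample_until(max(timestamps), margin)
    return [t for t in timestamps if t <= cutoff]


# Datapoints at 0 s, 5 s, 20 s and 25 s after a base time: the newest is at
# 25 s, so the cutoff is at 15 s and only the first two datapoints qualify.
base = datetime(2014, 1, 1, tzinfo=timezone.utc)
points = [base + timedelta(seconds=s) for s in (0, 5, 20, 25)]
ready = ready_for_downsampling(points)
```

Datapoints newer than the cutoff are simply left for a later run, once enough newer data has arrived to push the cutoff past them.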