Open JulianWgs opened 3 years ago
You may well be right, and I encourage others to offer an opinion. First, may I nit-pick and suggest this version as a simpler example
import pandas as pd
from streamz import Stream
import numpy as np
names = ["Mike", "Tim", "Anna", "Kim", "Andy"]
def emitter():
n = np.random.randint(1, 5)
return {"name": np.random.choice(names, n), "age": np.random.randint(18, 32, n)}
stream = Stream.from_periodic(emitter, 1)
df1 = pd.DataFrame({"name": ["test"], "age": [40]})
out = stream.map(pd.DataFrame).accumulate(tail(6), start=pd.DataFrame()).to_dataframe(df1)
out.stream.sink(print) # optional, or some other output
stream.start()
Hey @martindurant ,
it seams that there are not many other opinions on this issue.
How should we proceed? For a pull request two things must be discussed before hand:
To my knowledge there is currently no way to show (or plot) the last n elements of a Streamz DataFrame.
I should have noted, that hvplot on a dataframe has a backlog=
argument (see here) which can be displayed in a table of plot, and might have been what you wanted all along.
To my knowledge there is currently no way to show (or plot) the last n elements of a Streamz DataFrame.
I think this is a very useful function for debugging, but also for slower Streamz, for example CFD results from OpenFOAM.
Here is a naive implementation:
Here is an example: (thanks for the feedback @martindurant)
This could either be the bare representation of window or could replace the tail function. I think the name tail suits this function better than the current one. What do you think?
Implementation wise one could be much more efficient by using .loc and round robin replacing values and saving the order somewhere else, but for a first go, I would stick by the naive version.