Open River-Shi opened 1 month ago
Do you find that the performance is lacking? 100 events per second is not a lot for streamz. However, the library you are using and websocket latency are separate factors, as is any CPU time you might need for the calculation; I can't tell from your code.
Thanks for helping. I have another question:
from streamz import Stream
import asyncio
import time

def increment(x):
    time.sleep(0.1)
    return x + 1

async def write(x):
    await asyncio.sleep(0.2)
    print(x)

async def f():
    source = Stream(asynchronous=True)
    source.map(increment).rate_limit(0.500).sink(write)
    for x in range(10):
        await source.emit(x)

if __name__ == "__main__":
    asyncio.run(f())
from tornado import gen
import time
from streamz import Stream
from tornado.ioloop import IOLoop

def increment(x):
    """ A blocking increment function

    Simulates a computational function that was not designed to work
    asynchronously
    """
    time.sleep(0.1)
    return x + 1

@gen.coroutine
def write(x):
    """ A non-blocking write function

    Simulates writing to a database asynchronously
    """
    yield gen.sleep(0.2)
    print(x)

@gen.coroutine
def f():
    source = Stream(asynchronous=True)  # tell the stream we're working asynchronously
    source.map(increment).rate_limit(0.500).sink(write)
    for x in range(10):
        yield source.emit(x)

IOLoop().run_sync(f)
What is the difference between these two examples? I don't think there is any performance difference between using async and sync here; await source.emit(x) still blocks the subsequent processing. I tried using asyncio.create_task(source.emit(x)), but that just threw an error.
I think there is no difference from:
from streamz import Stream
import asyncio
import time

def increment(x):
    time.sleep(0.1)
    return x + 1

def write(x):
    time.sleep(0.2)
    print(x)

def f():
    source = Stream()
    source.map(increment).rate_limit(0.500).sink(write)
    for x in range(10):
        source.emit(x)

if __name__ == "__main__":
    f()
Correct, there will be no difference for a linear chain of event processing. The point of await is that other async things can be happening at the same time ("concurrently"). In this case, there is nothing else to process while waiting.
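To make "other async things happening at the same time" concrete, here is a minimal sketch using only asyncio, with no streamz involved. The pipeline function and its names are made up for illustration; asyncio.sleep stands in for a non-blocking write.

```python
import asyncio
import time


async def pipeline(name, items, log):
    # Each await hands control back to the event loop, so the other
    # pipeline can make progress while this one is waiting.
    for x in items:
        await asyncio.sleep(0.05)  # stands in for a non-blocking write
        log.append((name, x))


async def main():
    log = []
    start = time.perf_counter()
    # Two independent chains run concurrently on a single thread.
    await asyncio.gather(
        pipeline("a", range(3), log),
        pipeline("b", range(3), log),
    )
    elapsed = time.perf_counter() - start
    return log, elapsed


if __name__ == "__main__":
    log, elapsed = asyncio.run(main())
    print(log)
    print(f"elapsed: {elapsed:.2f}s")
```

The entries from "a" and "b" come out interleaved, and the total time should be close to that of one pipeline rather than the sum of both. With a single linear chain there is nothing to interleave, which is why the async and sync versions above behave the same.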
Can you give me some examples of concurrent or non-linear chains of event processing? I'm really struggling to think of any applications. I am trying to emit concurrently, but it causes an error.
from streamz import Stream
import asyncio
import time

def increment(x):
    time.sleep(0.1)
    return x + 1

async def write(x):
    await asyncio.sleep(0.2)
    print(x)

async def f():
    source = Stream(asynchronous=True)
    source.map(increment).rate_limit(0.500).sink(write)
    for x in range(10):
        asyncio.create_task(source.emit(x))  # raises an error

if __name__ == "__main__":
    asyncio.run(f())
    for x in range(10):
        asyncio.create_task(source.emit(x))  # raises an error

This does indeed kick off all the coroutines, but each of them first hits a blocking wait and then waits again before producing output.
Consider:
import asyncio
from streamz import Stream

async def write(x):
    print(x)
    await asyncio.sleep(0.2)

async def f():
    source = Stream(asynchronous=True)
    source.sink(write)
    # start all emits at once and wait for them together
    await asyncio.gather(*[source.emit(x) for x in range(10)])

if __name__ == "__main__":
    asyncio.run(f())
Here, all the values print immediately, and the whole thing takes about 0.2 s to run.
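The blocking wait in the earlier example comes from time.sleep inside increment, which holds the event loop until it returns. One common way around that, sketched here with plain asyncio rather than streamz, is to push the blocking call onto a thread pool with loop.run_in_executor. The process helper is a made-up name for illustration.

```python
import asyncio
import time


def increment(x):
    time.sleep(0.1)  # blocking: nothing else can run on this thread meanwhile
    return x + 1


async def process(x):
    loop = asyncio.get_running_loop()
    # Run the blocking function in the default thread pool so the event
    # loop stays free to service the other tasks.
    return await loop.run_in_executor(None, increment, x)


async def main():
    # gather returns results in the order the awaitables were passed in
    return await asyncio.gather(*(process(x) for x in range(4)))


if __name__ == "__main__":
    start = time.perf_counter()
    print(asyncio.run(main()))
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```

Because the four calls sleep in separate worker threads, the elapsed time should be close to a single 0.1 s sleep instead of four of them back to back.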
.emit() blocks on each message, whereas ._emit() does not (it returns a list of futures). You could instead buffer a set number of futures before calling await asyncio.gather(*futures) on them.
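The buffering idea could look roughly like the sketch below. It uses plain asyncio only: send is a hypothetical stand-in for whatever non-blocking call produces the futures, and emit_in_batches and batch_size are names invented here, not part of streamz.

```python
import asyncio


async def send(x):
    # Hypothetical stand-in for one non-blocking emit or write.
    await asyncio.sleep(0.01)
    return x


async def emit_in_batches(items, batch_size):
    results = []
    pending = []
    for x in items:
        # Schedule the work without waiting for it to finish.
        pending.append(asyncio.create_task(send(x)))
        if len(pending) >= batch_size:
            # Wait for the whole batch before scheduling more.
            results.extend(await asyncio.gather(*pending))
            pending.clear()
    if pending:  # flush the final, possibly short, batch
        results.extend(await asyncio.gather(*pending))
    return results


if __name__ == "__main__":
    print(asyncio.run(emit_in_batches(range(7), batch_size=3)))
```

The batch size caps how many operations are in flight at once, which gives a simple form of backpressure while still overlapping the waits inside each batch.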
Also, asynchronous=True will launch the ioloop in the current thread, whereas asynchronous=False will launch it on a separate thread. I've had issues in the past where I could not use asynchronous=True because the event loop was already running in a .py script.
This is code for calculating the rolling average of the futures/spot ratio minus 1 in real time. Since a large amount of data can stream in from the websocket every second, about 100-200 data points, I'd like to know if you have any suggestions to improve performance.
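For reference, the calculation itself is cheap if the rolling average is updated incrementally instead of recomputed over the whole window on every message. The sketch below is not the code from this issue; RollingMean, basis and the window size are assumptions made for illustration.

```python
from collections import deque


class RollingMean:
    """Rolling mean over the last `window` values, O(1) per update."""

    def __init__(self, window):
        self.window = window
        self.values = deque()
        self.total = 0.0

    def update(self, value):
        self.values.append(value)
        self.total += value
        if len(self.values) > self.window:
            # Drop the oldest value once the window is full.
            self.total -= self.values.popleft()
        return self.total / len(self.values)


def basis(future_price, spot_price):
    # Futures/spot ratio minus 1, as described above.
    return future_price / spot_price - 1


if __name__ == "__main__":
    rm = RollingMean(window=3)
    for fut, spot in [(101.0, 100.0), (102.0, 100.0), (103.0, 100.0)]:
        print(rm.update(basis(fut, spot)))
```

At 100-200 updates per second this kind of arithmetic takes a negligible share of CPU time, so any bottleneck is more likely in the websocket handling or in blocking calls on the event loop. Over very long runs the running total can accumulate floating-point drift, so recomputing it from the stored values now and then may be worthwhile.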
Here is the implementation of BinanceWebsocket: