python-streamz / streamz

Real-time stream processing for python
https://streamz.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Live source for experimentation and demonstration #78

Open mrocklin opened 6 years ago

mrocklin commented 6 years ago

It would be useful when explaining streamz to have a live source of data. Are there any good web APIs that we can query somewhat rapidly without making anyone angry at us? Perhaps a time series of changing data, like stock data? If anyone has time to search around the internet, that would be helpful. If anyone finds something nice with requests or whatnot, I'd be more than happy to tornado-ify it.

jrmlhermitte commented 6 years ago

It's not exactly streaming, but NOAA has some anonymous FTP servers, some of which may contain fairly recent data. Could be interesting: https://www.ncdc.noaa.gov/data-access For example: ftp://eclipse.ncdc.noaa.gov/pub/

I explored around and read some of this data in the past for fun. I could take a look later. Not sure what statistic we could compute that is relatively straightforward. Maybe the variance in temperature versus time?
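For what it's worth, a variance can be updated one reading at a time, which fits a streaming demo. Below is a minimal sketch using Welford's online algorithm. The class name and the temperature values are made up for illustration and are not real NOAA data.

```python
class RunningVariance:
    """Welford's online algorithm: update mean and variance one reading at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        """Fold in one new reading and return the variance so far."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)
        return self.variance

    @property
    def variance(self):
        # Population variance; reported as 0.0 before any data arrives.
        return self._m2 / self.count if self.count else 0.0


# Hypothetical temperature readings in degrees Celsius (illustrative only).
readings = [12.1, 12.4, 11.9, 13.0, 12.7]
tracker = RunningVariance()
variances = [tracker.update(t) for t in readings]
```

In a streamz pipeline, something like `source.map(tracker.update)` could presumably play the role of the list comprehension, with each new reading pushed in via `source.emit`.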

Based on our previous conversations, @nbren12 and @jhamman may also have some idea since they're pulling live data. Maybe some of it is public?

nbren12 commented 6 years ago

Stock data sounds like a good idea, although that might scare off some academic scientists, who tend to be a little snooty when it comes to finance.

I am not sure where to grab live feeds of satellite observations, since most people usually work with historical data on a global grid with around 3-hourly or daily temporal sampling. It looks like USGS has a super cool live feed from the Landsat satellites, but it seems locked behind a web UI.

Is there some requirement that this be real data? Any sort of simulation you can run on a computer would be a good demonstration IMO. For example, when we presented geostreams, Joe and I used Conway's game of life as a motivating example, and I think it was pretty easy for people to analogize to more interesting live datasets.
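To make the simulation idea concrete, here is a rough sketch of Conway's game of life used as a data source. It is not the code from the geostreams presentation; the function names and the choice of a glider as the starting pattern are assumptions for illustration.

```python
from collections import Counter


def life_step(live):
    """Advance one generation; `live` is a set of (row, col) live cells."""
    neighbour_counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # A cell is alive next generation with exactly 3 live neighbours,
    # or with 2 live neighbours if it is already alive.
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }


def life_source(live, generations):
    """Yield successive generations, standing in for a live data feed."""
    for _ in range(generations):
        live = life_step(live)
        yield live


# A glider on an unbounded grid; each yielded set is one "event".
glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}
frames = list(life_source(glider, 4))
populations = [len(cells) for cells in frames]
```

Each yielded board could be handed to a stream with `source.emit(board)`, and downstream nodes could then compute things like the population per generation.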

jrmlhermitte commented 6 years ago

I looked at weatherunderground.com, but they only allow 500 queries per day (for their free version). It's as simple as sending a query to an HTTP page like http://api.wunderground.com/api/API_KEY/hourly10day/q/NY/New_York_City.json where API_KEY is a key you register for.
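As a back-of-the-envelope check on that limit, 500 queries per day works out to roughly one request every 173 seconds, so just under three minutes between polls. A small sketch of the arithmetic and the URL construction follows. The helper names are hypothetical, and no request is actually sent.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAILY_QUERY_LIMIT = 500  # free-tier limit quoted above

# URL pattern taken from the example above; {key} is the registered API key.
URL_TEMPLATE = (
    "http://api.wunderground.com/api/{key}/hourly10day/q/{state}/{city}.json"
)


def min_poll_interval(limit=DAILY_QUERY_LIMIT):
    """Shortest delay in seconds between requests that stays within the limit."""
    return SECONDS_PER_DAY / limit


def build_url(key, state="NY", city="New_York_City"):
    """Fill in the query URL (hypothetical helper, for illustration only)."""
    return URL_TEMPLATE.format(key=key, state=state, city=city)


interval = min_poll_interval()
url = build_url("API_KEY")
```

A poller would then wait at least `interval` seconds between fetches of `url`, which is slow for a live demo but workable.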

I was thinking, another option is to potentially stream something directly from YouTube, maybe a full video or just sound (they can probably handle the load). Another source could be some astronomy sites. I am looking into tidbits here and there when I have time.

jrmlhermitte commented 6 years ago

PS @nbren12 that live feed looks awesome. I went ahead and emailed USGS just in case, explaining we want a live stream for educational/demonstration purposes. We'll see what they answer. Thanks!

nbren12 commented 6 years ago

That'd be pretty cool if we can access that feed!

jrmlhermitte commented 6 years ago

I received a response from USGS:

Julien, We do not support access to live stream data. We apologize for the inconvenience.

Too bad. I guess the search continues.

As a heads up, I am also looking into astronomy data (I'm a big amateur astronomy nerd). I've contacted AAVSO.org to see if they offer any service. They already offer CSV downloads through a form, so maybe they also have an API or service. I'm also looking into radio astronomy data.

jrmlhermitte commented 6 years ago

How about device streaming? I've been playing with software-defined radio. Acquiring data is a matter of buying a $20 USB dongle and installing the relevant libraries, which is really easy to do. This would be useful for a live demonstration but likely not very portable: https://www.rtl-sdr.com/about-rtl-sdr/

An interesting stream could be scanning a range of frequencies and searching for peaks above a threshold. When something comes in, display an animation. But there are many more applications.
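A rough sketch of the peak search, assuming each sweep arrives as parallel lists of frequencies and power readings. The function name, the units, and the sample values are all made up and do not come from a real SDR library.

```python
def find_peaks(freqs, powers, threshold):
    """Return (frequency, power) pairs that are local maxima above `threshold`."""
    peaks = []
    # Skip the endpoints, which have no neighbour on one side.
    for i in range(1, len(powers) - 1):
        is_local_max = powers[i - 1] < powers[i] >= powers[i + 1]
        if is_local_max and powers[i] > threshold:
            peaks.append((freqs[i], powers[i]))
    return peaks


# Hypothetical sweep: frequencies in MHz, power in dB (not real SDR output).
freqs = [88.0, 88.2, 88.4, 88.6, 88.8, 89.0, 89.2]
powers = [-60.0, -58.0, -20.0, -55.0, -61.0, -35.0, -59.0]
peaks = find_peaks(freqs, powers, threshold=-40.0)
```

With one sweep per event, a stream could map `find_peaks` over incoming sweeps and trigger the animation whenever the result is non-empty.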

0x3W commented 6 years ago

@mrocklin Crypto exchanges offer streaming data from sockets, sub-second level and no one gets angry :)

Here is a Jupyter notebook for Poloniex (DIY solution; the initial stream is the current order book, later quotes/traders) and Binance (official package, needs API keys).

I wanted to use it for exactly this, but I have an issue with the basic example.