panoptes / PEAS

PANOPTES Environmental Analysis System
MIT License

stream data in real time #3

Open mimming opened 8 years ago

mimming commented 8 years ago

What we have now

Sensor data is dropped into Google Cloud Storage daily. A Python script slurps it up and loads it into BigQuery.

Desired state

Create a Pub/Sub pipeline that accepts sensor data and follows this path:

Pub/Sub -> Dataflow (Beam) [as necessary] -> BigQuery
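
For illustration, the first hop of that path (a unit publishing readings to Pub/Sub) might look roughly like the sketch below. It assumes the `google-cloud-pubsub` client library is installed and that each reading is a flat dict; the field names, the unit ID, and both function names are hypothetical and not taken from the PEAS code.

```python
import json
from datetime import datetime, timezone


def encode_reading(unit_id, reading):
    """Serialize one sensor reading as UTF-8 JSON bytes for a Pub/Sub message."""
    payload = dict(reading)
    payload["unit_id"] = unit_id
    # Stamp the reading if the sensor did not supply its own timestamp.
    payload.setdefault("timestamp", datetime.now(timezone.utc).isoformat())
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def publish_reading(project_id, topic_id, unit_id, reading):
    """Publish one reading and return the message ID assigned by Pub/Sub."""
    # Imported lazily so encode_reading works without the client library.
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)
    # Extra keyword arguments become string message attributes.
    future = publisher.publish(
        topic_path, data=encode_reading(unit_id, reading), unit_id=unit_id
    )
    return future.result()
```

Calling `publish_reading` needs a real project, an existing topic, and credentials. Downstream, a Dataflow job or a plain subscriber would decode the JSON and write rows to BigQuery.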

mimming commented 8 years ago

It looks like Dataflow for Python is still in alpha.

I'm going to wait until it's in beta before I move to it.

In the meantime, use this model to load batches:

Panoptes unit -> Cloud Storage bucket unit_sensors -> Python script on compute engine VM -> BigQuery
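
A minimal sketch of the script step in that batch model, not the actual script. It assumes the objects in the bucket are newline-delimited JSON and that the `google-cloud-storage` and `google-cloud-bigquery` libraries are installed. The function names, the `prefix` argument, and the table ID format are placeholders.

```python
import json


def parse_sensor_file(text):
    """Turn newline-delimited JSON text into a list of row dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def load_bucket_into_bigquery(bucket_name, prefix, table_id):
    """Load every object under `prefix` into `table_id`; return the number of rows inserted."""
    # Imported lazily so parse_sensor_file works without the client libraries.
    from google.cloud import bigquery, storage

    storage_client = storage.Client()
    bq_client = bigquery.Client()
    total = 0
    for blob in storage_client.list_blobs(bucket_name, prefix=prefix):
        rows = parse_sensor_file(blob.download_as_text())
        if not rows:
            continue
        # insert_rows_json returns a list of per-row errors; empty means success.
        errors = bq_client.insert_rows_json(table_id, rows)
        if errors:
            raise RuntimeError(f"BigQuery rejected rows from {blob.name}: {errors}")
        total += len(rows)
    return total
```

On the VM this could be run daily from cron with the `unit_sensors` bucket and a table ID of the form `project.dataset.table`. The sketch does not track which objects were already loaded, so a real script would need some way to avoid inserting the same file twice.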