Open fileunderjeff opened 7 years ago
@fileunderjeff - The City of Chicago and its local civic tech scene worked on a similar project to forecast when E. coli levels would be elevated in Lake Michigan.
If it's similar to Chicago, it's unlikely that there is a real-time stream, because measuring bacteria levels requires culture-based methods to grow the bacteria, which takes up to 16 hours. It's more likely that a statistical model, probably developed by USGS and based on historical data, is used to forecast bacteria levels for a particular day and determine whether there is a chance they exceed recommended "safe" levels. These models, however, tend to be unstable and do not accurately forecast days with high levels of bacteria.
We developed a statistical model to forecast elevated E. coli levels that seems to be more accurate and stable. Hopefully this model can be useful elsewhere, such as Houston. You can also look at the project notebook to see some context. We would be happy to discuss the model as well, which may be easier than rummaging through source code and disparate notes.
Nevertheless, I would encourage finding a business expert to understand how bacteria levels are monitored and what business rules determine elevated levels. That significantly helped our project and helped us understand where data can be helpful.
thanks @tomschenkjr!
Just to follow up, @fileunderjeff: we just published a paper on this approach. If this is still in discussion, the paper may be a useful lead-in for any expert.
@tomschenkjr that's amazing! great work. I can't wait to check it out!
Using real-time data from the USGS and other sources, develop an alert system for when water bacteria levels are unusually high.
Need to find and add datasets/feeds.
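For discussion, here is a minimal sketch of what the alerting step could look like once a data source is identified. Everything in it is an assumption rather than part of any existing project: the `Reading` type, the `find_alerts` function, and the sample values are hypothetical, and the 235 CFU/100 mL threshold is only a placeholder that should be replaced with whatever rule the local monitoring agency actually uses. As noted in the comments above, the input values would more likely be model forecasts than live measurements.

```python
from dataclasses import dataclass

# ASSUMPTION: placeholder threshold; confirm the real rule with a local expert.
THRESHOLD_CFU_PER_100ML = 235.0


@dataclass
class Reading:
    """One E. coli value (measured or forecast) for a monitoring site."""
    site: str
    cfu_per_100ml: float


def find_alerts(readings, threshold=THRESHOLD_CFU_PER_100ML):
    """Return the readings whose value is strictly above the threshold."""
    return [r for r in readings if r.cfu_per_100ml > threshold]


# Hypothetical sample values, for illustration only (not real data).
sample = [Reading("Site A", 120.0), Reading("Site B", 410.0)]

for r in find_alerts(sample):
    print(f"ALERT: {r.site} at {r.cfu_per_100ml} CFU/100 mL")
```

Keeping the threshold as a parameter makes it easy to swap in a different rule per site or per agency once the business rules are known.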