All my tools have been in matplotlib, which makes me want to die. But this could be updated to do even more generic plotting? https://github.com/numenta/nupic/blob/master/examples/opf/clients/hotgym/anomaly/one_gym/nupic_anomaly_output.py
Thanks, I'll check your script soon.
Meanwhile, I've tested the in-browser plotting on NAB and it looks awesome :+1: Maybe we could bring it over to NuPIC and just edit it to parse OPF result files?
NAB also has a plotter using https://plot.ly/, which I haven't tested yet (it needs an API key to be set up).
@rhyolight who's maintaining the plotting scripts for NAB?
CC @subutai @BoltzmannBrain @dundalek Hey! It's me, Mark :wink: Can you suggest a nice (and simple) JS lib for visualizing CSV graph data?
I'm a big fan of http://dygraphs.com/. That's what River View uses.
Matt Taylor OS Community Flag-Bearer Numenta
@breznak plotly has some great functionality for creating visually-appealing graphs, and for free. We use it in NAB here. Here's an example from that script, where the green and red diamonds mark TP and FP detections, respectively, and the red dots are the true anomaly labels. And if you navigate to the "Code" tab, plotly will give you all the code to generate the plot :smile:
@rhyolight @BoltzmannBrain Yes, I've edited the issue with both. DyGraphs are used in 1 approach. The plot.ly graph looks really pretty, but the fact it publishes all the data you plot (unless you pay) is a blocker for some of our purposes - ei. we can't publish the data.
So what secret supervillain project are you working on?
Nah, Skynet and stuff. :wink: And some medical records.
I've updated my initial steps in the working branch https://github.com/breznak/nupic/tree/plot_results
I'm looking for a JS/DyGraph wizard who would be kind enough to help me with this issue :sos:, as I think it'll be helpful to have for the upcoming HTM Challenge. (@rhyolight @jefffohl?)
I haven't tinkered with DyGraph before, but it sounds like the kind of thing that is up my alley. I might be able to clear out some time later this week to take a look.
Jeff, that would be awesome! I'll prepare some example data today so it's in a runnable form. Thanks a bunch!
@breznak - will this be used in your HTM Challenge submission? Or is it simply an enhancement to NuPIC that everyone can use? If the former, I probably shouldn't help out, unfortunately, because I am on the HTM Challenge judging panel, and it would create a conflict of interest.
@rhyolight - thoughts here?
@jefffohl If you are doing work that contributes to the NuPIC codebase, that is fine. Just don't help out by contributing to anyone's private project repos.
OK - thanks @rhyolight
@jefffohl No worries, I haven't joined a Challenge project yet. This is intended as a general NuPIC enhancement, but I'd love to finish it asap before the Challenge, as I think it could be useful to some people while they work on their HTM Challenge projects.
@breznak - OK, sounds great. Just trying to be careful about my responsibilities.
I've updated the TODO steps and ordered them sequentially
No one is really maintaining the NAB in-browser plotter right now. It does look like it doesn't plot the results files correctly anymore. It would be nice if someone could fix that. The UI is rather cumbersome, too. It does work fine for examining the raw data files.
Working on this now. First, I am evaluating some different plotting libraries. Let me know if anyone is tied to DyGraphs.
@rhyolight - when you were choosing a data visualization tool for River View, did you take a look at D3.js?
I am kind of leaning towards D3.js because it is more of a low-level framework, and therefore is more flexible. It occurs to me that, although we are solving a specific problem here, we are likely to want to continue to build various ways to visualize various things - therefore having a framework (rather than a tool) might be a better choice.
Does anyone have strong opinions here?
@jefffohl no strong opinions, as he who writes the code decides :wink: And both frameworks look really pretty to my amateur eye. But I don't see what else, other than OPF result graphs, we would want to visualize in NuPIC. Remember there is NuStudio to show graphical representations of HTM and its parts.
I have used D3 directly in the past and it is an excellent library.
@jefffohl I'll probably start looking into this too; could you share whether you've made any progress? And do you see D3.js as better than DyGraph, and if so, why?
@breznak - sorry about my slow progress - I have to squeeze this work in with my day job :).
I was considering D3.js only because it is more of a framework, so that might serve us better in the future, if we intend to do more. But, for now, it seems that DyGraph will probably be the easiest to work with, so that seems the best bet at the moment. If we want to get more fancy later, we can.
I haven't done much except for reviewing charting frameworks, and getting up to speed with NAB.
I was going to spend some time today working on this, but if you want to divide up tasks, we could do that too.
Thanks Jeff, no worries. I just wanted to draw on your experience to decide which framework I should start learning, so dygraph it is for now. I'll find my way around, and if I start doing some real work, I'll sync with you here so we don't duplicate effort.
PS: can I help with anything on NAB? Though I think we don't need it here (?)
@breznak - ok. I am going to keep at it, so let me know if you start to work on any of the issues on your checklist.
Regarding NAB - I just wanted to see how NAB fit into OPF - so not intending to use it, just wanted to understand better where you are coming from in terms of what your needs are.
@breznak I am making some progress - I might have something to share tomorrow.
Hi @breznak. I have put together something. Right now, it is an interface that shows two panes: data and results. For each, there is a select menu that allows the user to select which CSV file they want to plot.
Is this what you are imagining, or are you thinking that the data should be merged into one graph?
I can make a pull request if you want to see the work in progress.
Here is a screen shot for reference:
Hi @jefffohl , it looks really nice! :+1: :smile: Please make a PR, or tell me your branch, I'd love to test it out.
Agreed, it DOES look nice!
With kind regards,
David Ray Java Solutions Architect
Cortical.io http://cortical.io/ Sponsor of: HTM.java https://github.com/numenta/htm.java
d.ray@cortical.io http://cortical.io
Here you go: https://github.com/breznak/nupic/pull/9
A couple of questions/ideas to discuss:
- data/results in separate windows
- {with, comma-separated, fields, inside brackets}
- ? annotated anomaly labels
Btw @jefffohl, I've added some UI questions (marked "UI") to the issue description; in particular, fixing the anomalyScore scale would be nice. Now back to replying to your comment...
- I can code it up to merge the data and results data sets. The only question I have is, how to make sure we are loading the proper files? Will the paths to the data and the results always have reliable relative paths? Or, should we let the user choose a data file (based on a manifest file as we have now), then a results file, then merge them together after they have been selected?
The file lists are used in NAB, which relies on the structure of the data, so I think it's safe to assume the paths will be correct. Actually, NAB has an alternate plotting option through plot.ly, so I would not worry about that functionality too much. And your commit now fixes the NAB issue where plotting of results didn't work.
...If we focus on NuPIC/OPF, there are no separate data/results file lists. The results are in a single file (and the raw data are included under the field actual).
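Since an OPF results file bundles data and results, reading it is mostly a matter of handling the header. A minimal sketch, assuming the usual NuPIC CSV layout of three header rows (field names, field types, special flags); `read_opf_results` is a hypothetical helper name, and the sample rows are made up:

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def read_opf_results(stream: TextIO) -> Tuple[List[str], Dict[str, str], List[Dict[str, str]]]:
    """Parse an OPF-style results CSV, assumed to have three header rows."""
    reader = csv.reader(stream)
    names = next(reader)   # row 1: field names
    types = next(reader)   # row 2: field types (anomalyScore shows up as "string")
    next(reader)           # row 3: special flags, not needed for plotting
    field_types = dict(zip(names, types))
    rows = [dict(zip(names, row)) for row in reader if row]  # skip blank lines
    return names, field_types, rows


if __name__ == "__main__":
    sample = (
        "timestamp,multiStepBestPredictions.actual,anomalyScore\n"
        "datetime,float,string\n"
        "T,,\n"
        "2015-10-09 14:00:00,21.2,None\n"
    )
    print(read_opf_results(io.StringIO(sample)))
```

Each row comes back as a dict of raw strings, so any type coercion can happen later, per field.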
visualization$ python -m SimpleHTTPServer 12345 ../examples/my/path.csv
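Note that SimpleHTTPServer only takes an optional port number and serves the current directory, so the CSV path in the command above would be ignored. A minimal sketch of a hypothetical plot.py launcher (Python 3.7+, standard library only) that accepts the path, serves a directory, and prints a URL for the viewer; the index.html page name and the file query parameter are assumptions, not something the current viewer reads:

```python
import argparse
import functools
import http.server
import os
import urllib.parse


def viewer_url(csv_path: str, root: str, port: int) -> str:
    """Build a URL telling the (hypothetical) viewer page which CSV to load."""
    rel = os.path.relpath(os.path.abspath(csv_path), os.path.abspath(root))
    if rel == os.pardir or rel.startswith(os.pardir + os.sep):
        raise ValueError("CSV file must be inside the served directory")
    query = urllib.parse.urlencode({"file": rel.replace(os.sep, "/")})
    return "http://localhost:%d/index.html?%s" % (port, query)


def main() -> None:
    parser = argparse.ArgumentParser(description="Serve an OPF results CSV for browser plotting")
    parser.add_argument("csv", help="path to the OPF results file")
    parser.add_argument("--root", default=".", help="directory to serve")
    parser.add_argument("--port", type=int, default=12345)
    args = parser.parse_args()
    print("Open", viewer_url(args.csv, args.root, args.port))
    # SimpleHTTPRequestHandler accepts a `directory` argument since Python 3.7.
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=args.root)
    with http.server.HTTPServer(("localhost", args.port), handler) as server:
        server.serve_forever()


if __name__ == "__main__":
    main()
```

Usage would then look like python plot.py ./opf_results/DefaultTask.csv --root .., keeping the CSV inside the served tree so the browser can fetch it.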
- The checkboxes are dynamically generated based on the CSV header, so they will work with any CSV.
:guitar: :+1:
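The header-driven checkboxes boil down to deciding which columns are plottable. The viewer does this in JS; purely as an illustration, here is the same idea in Python, assuming a column counts as numeric when every non-missing value parses as a float (treating "None" and empty cells as missing is an assumption):

```python
from typing import Dict, Iterable, List

MISSING = ("", "None")  # assumed placeholders for "no value yet"


def is_number(text: str) -> bool:
    try:
        float(text)
    except ValueError:
        return False
    return True


def numeric_fields(names: List[str], rows: Iterable[Dict[str, str]]) -> List[str]:
    """Return the fields whose non-missing values all parse as numbers."""
    rows = list(rows)
    result = []
    for name in names:
        present = [row.get(name, "") for row in rows]
        present = [v for v in present if v not in MISSING]
        if present and all(is_number(v) for v in present):
            result.append(name)  # one checkbox per plottable field
    return result
```

A timestamp column fails the float test and is left out, while a column that starts with None and then has numbers is kept.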
- The DyGraph CSV parser choked on the OPF format. I can write a JavaScript parser for that. I am assuming that we want it to strip out all fields that are not a number?
yes, but..
Since the second line indicates the data type, this should be relatively easy. Or do we want to convert some of the string values to numbers? Note that DyGraph can only plot numbers (integers or floats).
Plotting only numbers is OK, but the data-type line (2nd) won't work that easily: in the example OPF file here, anomalyScore is of type string. Values matching {.*} or [] could be mapped to 0.
PS: optionally, in the future we may want to parse some of the other string fields too, e.g. multiStepBestPredictions.1 and multiStepBestPredictions.5 (which are in fact floats as well).
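The coercion rules discussed above ({.*} or [] mapped to 0, numeric strings parsed as floats) could look like this. A minimal sketch; returning None for "None" or unparseable cells, so the plot shows a gap, is an assumption rather than something agreed in the thread:

```python
from typing import Optional


def coerce_opf_value(text: str) -> Optional[float]:
    """Turn one OPF cell into a float for plotting, or None if there is no value."""
    text = text.strip()
    if text in ("", "None"):
        return None   # e.g. the first row of multiStepBestPredictions.1
    if text == "[]" or (text.startswith("{") and text.endswith("}")):
        return 0.0    # mapping proposed in the thread: {.*} or [] -> 0
    try:
        return float(text)  # "0.5" and similar string-typed numbers
    except ValueError:
        return None   # anything else is treated as not plottable (assumption)
```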
- You can zoom into the graph by using your cursor to select a section of the graph. Though, I just noticed that I introduced a bug here. Normally, double-clicking will return the graph to the zoomed-out state. But I accidentally disabled that with some code that displays the timestamp when you click on the graph. I will fix that.
Cool thanks, I've noticed the zoom-out problem. Btw, do you think it would be possible to do some "smooth zoom-out" (the way zoom-in works)? With a mouse-wheel, a scroll-bar, ...?
- Unfortunately, I am ignorant about annotated anomaly labels. Can you elaborate on that point?
This exists only in NAB so far *). The annotations are a TXT file with a vector, where 1 means a human-annotated anomaly, which would be nice to plot with a "stem". Maybe another "Choose annotations file:" input field would work fine here? Nice to have, but definitely not needed right now.
*) @rhyolight, would there be any problem with (or benefit from) adding an anomalyAnnotation field to OPF? It would be useful here, and NAB could then use OPF files instead of its custom format. I've raised that issue there but got no opinions.
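Turning such an annotations vector into stem positions is straightforward. A minimal sketch, assuming one value per line and that line i of the annotations file lines up with row i of the results file; both helper names are hypothetical:

```python
from typing import Iterable, List, Sequence, Tuple


def annotation_indices(lines: Iterable[str]) -> List[int]:
    """Return the row indices flagged as human-annotated anomalies (value 1)."""
    values = [line.strip() for line in lines if line.strip()]  # drop blank lines
    return [i for i, value in enumerate(values) if float(value) == 1.0]


def stems(timestamps: Sequence[str], indices: Iterable[int]) -> List[Tuple[str, float]]:
    """Pair each annotated row with its timestamp, at full height, for a stem plot."""
    return [(timestamps[i], 1.0) for i in indices]
```

Keeping the index step separate makes it easy to swap in a different annotation source later, e.g. an anomalyAnnotation column if OPF ever gets one.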
@jefffohl, please let me know if there's anything I can help with, especially non-JS stuff.
Will do. Thanks!
- In determining how to let the user choose what OPF file to use, all of those sound like good suggestions. It depends mostly on how users typically use the OPF, and about this I am somewhat ignorant. The most universally easy solution would be to launch a file browser from the web UI. We could also allow users to put in a path manually, which could be a publicly accessible URL. So, maybe there would be a "path" field, where the user types in the path, and a button for launching a file system browser for finding and loading local files.
I think a typical use case is a user who runs a single OPF experiment and wants to see the results. So a file browser in the web UI sounds like the way to go to me.
Later, we could alternatively add behavior like Gmail attachments: an "Add a file" file chooser and then checkboxes for which files to plot.
- For parsing OPF files, would it be OK to hard-code into our script exceptions for anomalyScore, multiStepBestPredictions.1, multiStepBestPredictions.5 and the like? That would make it pretty easy, as long as we can count on those fields always being number-like (we can use type coercion to turn them into numbers). It makes things more tightly coupled, but so far, this would only be for OPF, so that should be OK, right?
Agree, simple & works.
- anomalyScore and multiStepBestPredictions.actual are always present and are floats.
- multiStepBestPredictions.1 is always present, but None in the 1st row.
- multiStepBestPredictions.N is optional; there can be more of them (3, 5, 10), but that is not very typical. Each is None until the (N+1)-th row.
I still like your approach of plotting all numeric fields (so input data gets plotted if it is in a suitable form). Are you planning on combining the two approaches?
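Combining the two approaches could mean: always offer the hard-coded OPF fields, then append whatever other numeric columns were detected. A minimal sketch based on the field facts listed above; ALWAYS_PRESENT, expected_leading_nones and plot_fields are hypothetical names:

```python
import re
from typing import List

_MULTISTEP = re.compile(r"^multiStepBestPredictions\.(\d+)$")

# Fields described above as always present in OPF results.
ALWAYS_PRESENT = ("anomalyScore", "multiStepBestPredictions.actual",
                  "multiStepBestPredictions.1")


def expected_leading_nones(field: str) -> int:
    """Rows at the start of the file where `field` is expected to be None.

    multiStepBestPredictions.N is None until the (N+1)-th row, i.e. for N rows.
    """
    match = _MULTISTEP.match(field)
    return int(match.group(1)) if match else 0


def plot_fields(header: List[str], numeric: List[str]) -> List[str]:
    """Hard-coded OPF fields (in header order) followed by other numeric fields."""
    fields = [f for f in header if f in ALWAYS_PRESENT or _MULTISTEP.match(f)]
    fields += [f for f in numeric if f not in fields]
    return fields
```

Knowing the expected number of leading Nones lets the parser treat them as gaps instead of flagging the whole column as non-numeric.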
- For smooth zooming, I don't see an option for that in DyGraphs, but I will look around, and I may be able to add a plugin for that as well.
(Btw, just to be clear, I think zooming in with mouse selection is perfectly viable; it's just missing a zoom-out step.)
I just found https://code.google.com/p/dygraphs/issues/detail?id=366 mentioned in Issue 58, so some API calls for this might exist (?)
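A stepwise zoom-out only needs a bit of arithmetic on the visible x-range. As far as I know, dygraphs exposes the current range via xAxisRange() and accepts a new one through the dateWindow option of updateOptions(), so a JS handler could be a direct translation of this Python sketch; the factor of 2 and the clamping policy are assumptions:

```python
from typing import Tuple


def zoom_out(window: Tuple[float, float], extent: Tuple[float, float],
             factor: float = 2.0) -> Tuple[float, float]:
    """Widen the visible x-range by `factor` around its centre, kept inside the data extent."""
    lo, hi = window
    data_lo, data_hi = extent
    centre = (lo + hi) / 2.0
    half = (hi - lo) * factor / 2.0
    new_lo, new_hi = centre - half, centre + half
    # If the widened window overshoots one side, slide it back inside the data.
    if new_lo < data_lo:
        new_hi += data_lo - new_lo
        new_lo = data_lo
    if new_hi > data_hi:
        new_lo -= new_hi - data_hi
        new_hi = data_hi
    return max(new_lo, data_lo), min(new_hi, data_hi)
```

Repeated calls (one per mouse-wheel notch or button click) eventually return the full extent, which matches the double-click reset.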
- Will worry about the anomalyAnnotation later. I think what I will end up having is two different versions of this script - though working in the same way - one for NAB, and one for OPF.
100% agreed. Personally I'd try to push NAB to switch to using OPF...
@breznak could you please elaborate on what you mean by pushing NAB to use OPF?
@BoltzmannBrain bad wording, sorry. I raised an issue in NAB asking whether it could use the OPF format (extended with an annotations column), so that this code could be shared (e.g. the results plotting is fixed here).
Guys, this issue is just for plotting experiment results after they've been run, right? There is no initiative to visualize live predictions / anomalies coming out of NuPIC, is there?
I'd like to visualize the results .csv from running an OPF experiment. It should be relatively easy, but perhaps some of you have already written a nice utility for that? NAB, @rhyolight?
WORKING branch: https://github.com/breznak/nupic/tree/plot_results
TODO:
- NUPIC/scripts/visualization/
- plot.ly publishes all data online, which I think is unacceptable for general use for us
- opf_results/*.csv
- data_file* results_file* data/ results/
- DyGraph, see https://github.com/numenta/NAB/blob/master/nab_visualizer.html#L4-L7. @jefffohl liked D3.js for being more flexible, but decided to stick with DyGraph for now, as it fits the current needs of plotting graphs.
- The current JS code from NAB spends a lot of work parsing a structure of files in order to plot all *.csv files there. For OPF/NuPIC we don't need that (although it wouldn't hurt to keep the functionality for NAB).
- What we need is the ability to plot a single file provided as an argument (python plot.py ./opf_results/DefaultTask.csv).
- NAB uses separate folders for data and results; in an OPF results file, all of these are together:
  - actual
  - anomalyScore
  - multiStepBestPredictions.1
(anomalyScore, >0.9) AND (annotation, >0.9), green # correctly detected anomaly
(anomalyScore, <=0.9) AND (annotation, >0.9), red # missed
(anomalyScore, >0.9) AND (annotation, <=0.9), yellow # false positive
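The coloring rules above translate into a small classifier. A minimal sketch using the 0.9 threshold from the rules; returning None when neither score nor annotation exceeds the threshold is an assumption, since the rules don't cover that case:

```python
from typing import Optional

THRESHOLD = 0.9  # threshold used in the rules above


def detection_color(anomaly_score: float, annotation: float,
                    threshold: float = THRESHOLD) -> Optional[str]:
    """Color for one row, given its anomalyScore and human annotation."""
    detected = anomaly_score > threshold
    labelled = annotation > threshold
    if detected and labelled:
        return "green"   # correctly detected anomaly
    if labelled:
        return "red"     # missed anomaly
    if detected:
        return "yellow"  # false positive
    return None          # nothing to highlight (case not covered by the rules)
```

Note that a score of exactly 0.9 counts as "not detected", matching the <=0.9 condition in the rules.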
- ...the timestamp field would show up in the "fields div on the left", as interactive values are shown there. See https://github.com/breznak/nupic/pull/20
- If the above succeeds, maybe NAB should switch to using the OPF format for its output (data+results) (@subutai?)
- anomalyScore has a string type in OPF (maybe because it's None at the 1st step)
- anomalyAnnotations: an OPF field to mark human annotations; could be useful in NAB
- scalarValue member: can we expose that somehow (add a fieldXX.scalar for each input field to the OPF file)?
- Currently some fields are hard-coded (e.g. for "scaled anomaly score") and the rendering fails if those fields are not present. But the dynamic menu for plotting numeric fields is a nice feature; it would be nice if only the "scaled" function failed (with its checkbox greyed out) instead of the whole plot failing.
- Live plotting would require: a) the model sending updates to the server; b) a network in/out region; c) refreshing the plot of the (updated) file every second or so.
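Point c) could start as simple polling of a growing results file. A minimal sketch, assuming the model appends whole CSV lines to a local file; read_new_lines and follow are hypothetical helpers, and pushing the lines to the browser is left out:

```python
import os
import tempfile
import time
from typing import Callable, List, Tuple


def read_new_lines(path: str, offset: int) -> Tuple[List[str], int]:
    """Return complete lines appended after byte `offset`, plus the new offset."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        chunk = handle.read()
    end = chunk.rfind(b"\n") + 1  # leave a trailing partial line for the next poll
    return chunk[:end].decode("utf-8").splitlines(), offset + end


def follow(path: str, on_lines: Callable[[List[str]], None], interval: float = 1.0) -> None:
    """Poll `path` forever, passing newly completed lines to `on_lines`."""
    offset = 0
    while True:
        lines, offset = read_new_lines(path, offset)
        if lines:
            on_lines(lines)
        time.sleep(interval)


if __name__ == "__main__":
    # Small demo: a results file that a model would keep appending to.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "results.csv")
        with open(path, "w", newline="") as f:
            f.write("timestamp,anomalyScore\n2015-10-09 14:00:00,0.1\n")
        print(read_new_lines(path, 0))
```

Tracking a byte offset avoids re-parsing the whole file every second, which matters once the experiment has run for a while.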
UPDATES: