Visualize, plot OPF experiment results

breznak commented 9 years ago

I'd like to visualize the results .csv from running OPF experiment. It should be relatively easy, but perhaps some of you have already written a nice utility for that? NAB, @rhyolight ?

WORKING branch: https://github.com/breznak/nupic/tree/plot_results

TODO:

[x] move or copy the script from NAB to NUPIC/scripts/visualization/
[x] Offline & Security
- ~~plot.ly~~ publishes all data online, I think this is unacceptable for general use for us
- "our script" is pretty nice, runs in browser on localhost
- [x] I suggest downloading the rendering scripts for offline use
[x] add example data for easy plotting test + doc
- [x] OPF results data - opf_results/*.csv
- [x] NAB (small) data - data_file* results_file* data/ results/
[x] eval which plotting lib to use, currently it's DyGraph https://github.com/numenta/NAB/blob/master/nab_visualizer.html#L4-L7 -- @jefffohl liked D3.js for being more flexible, but decided to stick with DyGraph so far as it fits the current needs of plotting graphs.
[x] extend to plot OPF data, not only NAB (subset of OPF) data (Help wanted, I can explain what to do with OPF data, provide parser script)
- [x] plot single file
  The current JS code from NAB spends a lot of work in parsing a structure of files, in order to plot all *.csv files there. For OPF/Nupic we don't need that (although it wouldn't hurt to keep the functionality for NAB).
  What we need is the ability to plot a single file provided as an argument (python plot.py ./opf_results/DefaultTask.csv)
- [x] merge data and results
  NAB uses separate folders for data & results; in an OPF results file, all of these are together
[ ] improve the Plot page interface (Help wanted)
- [x] plot data, add a checkbox option, the OPF field is actual
- [x] plot anomaly, add a checkbox option, the OPF field is anomalyScore
- [x] plot predicted field, add a checkbox option, the OPF field is multistepBestPrediction.1
- [x] optional, add option to plot other multistep (>1) prediction fields
- [x] opt, add textfield to enter specific label that should be plotted, maybe can fit ^^^
- checkboxes generated dynamically for all CSV labels
- [ ] preselect some values -- impossible to preselect/know data - but anomalyScore & multiStepBestPredictions.actual make sense
- [ ] opt, plot labeled anomalies (not present in OPF, could be just a specific field)
- [x] UI: merge data & results windows?
- [x] UI: anomalyScore [0..1.0] is not noticeable with "raw data" plotted, FIX by rescaling (to say 90% of max of the data)?
- [ ] UI: smooth zoom out. Currently one can smoothly zoom in by selecting the area of interest, and zoom out to the original size with a mouse click. Can we zoom out iteratively as well (mouse wheel, a scroll-bar, ...)?
- [ ] UI: add highlight menu: check-boxes fields (ex. anomalyScore), threshold (0.9), (optionally: below/over), (opt.: color) and highlight the section where the field is over the threshold.
- [ ] Consider an AND operator for the 2 above statements, can be used as evaluation as follows: see https://github.com/breznak/nupic/pull/16
  (anomalyScore, >0.9) AND (annotation, >0.9), green # correctly detected anomaly
  (anomalyScore, <=0.9) AND (annotation, >0.9), red # missed
  (anomalyScore, >0.9) AND (annotation, <=0.9), yellow # false positive
[ ] UI: On higher zoom-out levels (~1 week) it is not possible to see time (fine-grained) on x-axis interactively. (I think best solution would be if the timestamp field would show up in the "fields div on the left", as interactive values are shown there.) see https://github.com/breznak/nupic/pull/20
[x] opt, FIX plotting of NAB Results, currently only data seem to work. (Help wanted)
If the above succeeds, maybe NAB should switch to using OPF format for the output (data+results) ( @subutai ?)
[ ] Extending OPF: issues not directly for this PR, related to OPF; @rhyolight ?
- [ ] anomalyScore has a string type in OPF (maybe because it's None at the 1st step)
- [ ] new anomalyAnnotations OPF field, to mark human annotations, could be useful in NAB
- [ ] Plotting non-numeric inputs. Currently impossible, but each encoder has a scalarValue member, can we expose that (add a field XX.scalar for each input field to OPF file) somehow?
[x] Bugs
- [x] selecting a non-OPF csv crashes the web app, can't render after reselecting a correct OPF file (w/o server restart) @jefffohl
- [x] fields with too long name (eg metrics) get shortened in the UI, but it's not possible to see the whole name. Maybe a "context label" on mouse hover?
- [x] "generic CSV" support (NAB,..)
  currently some fields are hard-coded (eg for "scaled anomaly score") and the rendering fails if the fields are not present. But a nice feature is the dynamic menu for plotting numeric fields, it would be nice if only the "scaled" function failed (and its checkbox is greyed-out) instead of all plot failing.
- [x] additionally, if the "scaled" functionality could be written more generic (an array for labels to scale) so we could scale both anomaly & likelihood score, or spiketrain (0/1) data,... ?
- [x] /low priority/ Does not render in Firefox (nor did the original NAB code, but the DyGraph examples work fine in a recent FF)
[ ] New Features
- [ ] Online plotting
  would require: a) the model sends the updates to the server; b) a network in/out region; c) refreshing the plot of the (updated) file every second or so..
[ ] PR ready code
- [ ] finish main Readme, add images, maybe a wiki
- [ ] mention the "2 rows skipped" feature/problem
- [x] add comments/docs to functions
- [x] avoid hard-coded values, allow "settings" at code-level
- [ ] should we add test-case for this code?
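
For the highlight item in the TODO list above, the three AND rules could be sketched roughly as follows. This is only an illustration in Python (the plot page itself is JavaScript); the function name `classify` is made up, and the `annotation` value assumes a human-annotation field that OPF does not have yet.

```python
def classify(anomaly_score, annotation, threshold=0.9):
    """Return a highlight color for one row, or None for no highlight."""
    detected = anomaly_score > threshold   # model flagged this row
    labeled = annotation > threshold       # human annotation marks an anomaly
    if detected and labeled:
        return "green"    # correctly detected anomaly
    if labeled:
        return "red"      # missed anomaly
    if detected:
        return "yellow"   # false positive
    return None           # nothing to highlight


# Made-up (anomalyScore, annotation) pairs, one per case.
rows = [(0.95, 1.0), (0.20, 1.0), (0.95, 0.0), (0.10, 0.0)]
colors = [classify(score, label) for score, label in rows]
print(colors)
```

Keeping the threshold as a parameter would let the same rule serve the "threshold (0.9)" menu entry.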

UPDATES:

4/11/2015 - some refactoring, configurable options, improved "generic CSV" handling, documentation
3/11/2015 - Merged Jeff's PR with Bugfixes & overhauled usability! :clap: "Zoom view", and much more!
27/10/2015 - Merged Jeff's work enabling OPF files plotting, improving UI
24/10/2015 - Updated with @jefffohl 's "Initial commit" work, plots NAB data & results, improved checkboxes for plots

rhyolight commented 9 years ago

All my tools have been in matplotlib, which makes me want to die. But this could be updated to do even more generic plotting? https://github.com/numenta/nupic/blob/master/examples/opf/clients/hotgym/anomaly/one_gym/nupic_anomaly_output.py

breznak commented 9 years ago

Thanks, I'll check your script soon.

Meanwhile I've tested the in-browser plotting on NAB and it looks awesome :+1: Maybe we could bring it over to nupic and just edit it to parse OPF result files?

NAB has also a plotter using https://plot.ly/ which I haven't tested yet (need to set up an API key)

@rhyolight who's maintaining the plotting scripts for NAB?

breznak commented 9 years ago

CC @subutai @BoltzmannBrain @dundalek Hey! It's me, Mark :wink: Can you suggest a nice (and simple) JS lib for visualizing CSV graph data?

rhyolight commented 9 years ago

I'm a big fan of http://dygraphs.com/. That's what River View uses.

BoltzmannBrain commented 9 years ago

@breznak plotly has some great functionality for creating visually-appealing graphs, and for free. We use it in NAB here. Here's an example from that script, where the green and red diamonds mark TP and FP detections, respectively, and the red dots are the true anomaly labels. And if you navigate to the "Code" tab, plotly will give you all the code to generate the plot :smile:

breznak commented 9 years ago

@rhyolight @BoltzmannBrain Yes, I've edited the issue with both. DyGraphs are used in one approach. The plot.ly graph looks really pretty, but the fact that it publishes all the data you plot (unless you pay) is a blocker for some of our purposes, i.e. we can't publish the data.

rhyolight commented 9 years ago

So what secret supervillain project are you working on?

breznak commented 9 years ago

> So what secret supervillain project are you working on?

Nah, Skynet and stuff. :wink: And some medical records..

breznak commented 9 years ago

I've updated my initial steps in the working branch https://github.com/breznak/nupic/tree/plot_results I'm looking for a JS/DyGraph wizard who would be so kind as to help me with this issue :sos: as I think it'll be helpful to have for the upcoming HTM Challenge. ( @rhyolight @jefffohl ?)

jefffohl commented 9 years ago

I haven't tinkered with DyGraph before, but it sounds like the kind of thing that is up my alley. I might be able to clear out some time later this week to take a look.

breznak commented 9 years ago

Jeff, that would be awesome! I'll prepare some example data today so it's in a runnable form. Thanks a bunch

jefffohl commented 9 years ago

@breznak - will this be used in your HTM Challenge submission? Or is it simply an enhancement to NuPIC that everyone can use? If the former, I probably shouldn't help out, unfortunately, because I am on the HTM Challenge judging panel, and it would create a conflict of interest.

@rhyolight - thoughts here?

rhyolight commented 9 years ago

@jefffohl If you are doing work that contributes to the NuPIC codebase, that is fine. Just don't help out by contributing to anyone's private project repos.

jefffohl commented 9 years ago

OK - thanks @rhyolight

breznak commented 9 years ago

> @breznak - will this be used in your HTM Challenge submission? Or is it simply an enhancement to NuPIC that everyone can use? If the former, I probably shouldn't help out, unfortunately, because I am on the HTM Challenge judging panel, and it would create a conflict of interest.

@jefffohl No worries, I haven't joined a Challenge project yet. This is intended for general Nupic enhancement, but I'd love to make it asap before the challenge as I see it could be useful to some people during their work on HTM challenge projects.

jefffohl commented 9 years ago

@breznak - OK, sounds great. Just trying to be careful about my responsibilities.

breznak commented 9 years ago

I've updated the TODO steps and ordered them sequentially

subutai commented 9 years ago

> Meanwhile I've tested the in-browser plotting on NAB and it looks awesome. Maybe we could bring it over to nupic and just edit it to parse OPF result files?

No one is really maintaining this right now. It does look like it doesn't plot the results files correctly anymore. It would be nice if someone could fix that. The UI is rather cumbersome too. It does work fine for examining the raw data files.

jefffohl commented 9 years ago

Working on this now. First, I am evaluating some different plotting libraries. Let me know if anyone is tied to DyGraphs.

jefffohl commented 9 years ago

@rhyolight - when you were choosing a data visualization tool for River View, did you take a look at D3.js?

I am kind of leaning towards D3.js because it is more of a low-level framework, and therefore is more flexible. It occurs to me that, although we are solving a specific problem here, we are likely to want to continue to build various ways to visualize various things - therefore having a framework (rather than a tool) might be a better choice.

Does anyone have strong opinions here?

breznak commented 9 years ago

@jefffohl no strong opinions, as he who writes the code decides :wink: And both frameworks look really pretty to my amateur eye. But I don't see what else besides OPF result graphs we would want to visualize in nupic. Remember there is NuStudio to show you graphical representations of HTM and its parts..

rhyolight commented 9 years ago

I have used D3 directly in the past and it is an excellent library.

breznak commented 9 years ago

@jefffohl I'll probably start looking into this too, can you please share if you had some progress? And if/why do you see D3.js better than DyGraph?

jefffohl commented 9 years ago

@breznak - sorry about my slow progress - I have to squeeze this work in with my day job :).

I was considering D3.js only because it is more of a framework, so that might serve us better in the future, if we intend to do more. But, for now, it seems that DyGraph will probably be the easiest to work with, so that seems the best bet at the moment. If we want to get more fancy later, we can.

I haven't done much except for reviewing charting frameworks, and getting up to speed with NAB.

I was going to spend some time today working on this, but if you want to divide up tasks, we could do that too.

breznak commented 9 years ago

Thanks Jeff, no worries. I just wanted to use your experience to know which framework I should start learning about..so dygraph it is for now. I will be learning my ways around, and if I started doing some real work, I'll sync with you here so we don't duplicate.

PS: can I help something with NAB? But I think we don't need it here..(?)

jefffohl commented 9 years ago

@breznak - ok. I am going to keep at it, so let me know if you start to work on any of the issues on your checklist.

Regarding NAB - I just wanted to see how NAB fit into OPF - so not intending to use it, just wanted to understand better where you are coming from in terms of what your needs are.

jefffohl commented 9 years ago

@breznak I am making some progress - I might have something to share tomorrow.

jefffohl commented 9 years ago

Hi @breznak. I have put together something. Right now, it is an interface that shows two panes: data and results. For each, there is a select menu that allows the user to select which CSV file they want to plot.

Is this what you are imagining, or are you thinking that the data should be merged into one graph?

I can make a pull request if you want to see the work in progress.

jefffohl commented 9 years ago

Here is a screenshot for reference: [image: screen shot 2015-10-23 at 1 35 23 pm]

breznak commented 9 years ago

Hi @jefffohl , it looks really nice! :+1: :smile: Please make a PR, or tell me your branch, I'd love to test it out.

cogmission commented 9 years ago

Agreed, it DOES look nice!

jefffohl commented 9 years ago

Here you go: https://github.com/breznak/nupic/pull/9

breznak commented 9 years ago

A couple of questions/ideas to discuss:

1. about the "separate data", what do you think? With the nice checkbox options for what to plot, I see little importance in having data/results in separate windows.
2. the checkbox functionality for labels is really awesome! Is it hard-coded, or does it work for any CSV?
3. now it accepts the NAB data format; for NuPIC we need to focus on the OPF format too.
   - will the CSV parsing library handle a cell {with, comma-separated, fields, inside brackets}?
     I have a bash script to handle this, which I will try to convert to Python if the library doesn't.
   - OPF CSVs typically have a lot of labels; should we filter them (or hardcode some "regexp" to filter them)?
4. can I zoom in/out the graph, please? :wink:
5. only for NAB (or maybe later NuPIC), should we focus on plotting annotated anomaly labels?

jefffohl commented 9 years ago

1. I can code it up to merge the data and results data sets. The only question I have is, how to make sure we are loading the proper files? Will the paths to the data and the results always have reliable relative paths? Or, should we let the user choose a data file (based on a manifest file as we have now), then a results file, then merge them together after they have been selected?
2. The checkboxes are dynamically generated based on the CSV header, so they will work with any CSV.
3. The DyGraph CSV parser choked on the OPF format. I can write a JavaScript parser for that. I am assuming that we want it to strip out all fields that are not a number? Since the second line indicates the data type, this should be relatively easy. Or do we want to somehow convert some of the string values to numbers in some way? Note that DyGraph can only plot numbers (integers or floats).
4. You can zoom into the graph by using your cursor to select a section of the graph. Though, I just noticed that I introduced a bug here. Normally, double-clicking will return the graph to the zoomed-out state. But I accidentally disabled that with some code that displays the timestamp when you click on the graph. I will fix that.
5. Unfortunately, I am ignorant about annotated anomaly labels. Can you elaborate on that point?
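
To make the "strip out all fields that are not a number" idea concrete, here is a rough sketch in Python rather than the JavaScript the page would actually need. It keeps a column only if every data cell parses as a number, instead of trusting the data-type line. The three header rows, the `None` placeholder handling, and the sample rows are assumptions for illustration, not taken from a real OPF file, and the sketch assumes no cell contains an unquoted comma.

```python
import csv
import io
import math


def to_number(cell):
    """Coerce a CSV cell to float; empty cells and 'None' become NaN."""
    text = cell.strip()
    if text in ("", "None"):
        return math.nan
    return float(text)  # raises ValueError for non-numeric text


def numeric_columns(csv_text, header_rows=3):
    """Return {field name: [floats]} for columns whose data cells are all numeric."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    names, data = rows[0], rows[header_rows:]
    columns = {}
    for index, name in enumerate(names):
        try:
            columns[name] = [to_number(row[index]) for row in data]
        except (ValueError, IndexError):
            continue  # skip timestamps and other non-numeric or ragged fields
    return columns


# Made-up sample: names, types, flags, then two data rows.
SAMPLE = """timestamp,actual,anomalyScore,multiStepBestPredictions.1
datetime,float,string,string
T,,,
2015-10-23 13:00:00,5.0,0.0,None
2015-10-23 14:00:00,6.5,0.25,5.0
"""

columns = numeric_columns(SAMPLE)
print(sorted(columns))
```

Checking the values rather than the declared type means a numeric field that is declared as a string would still be kept.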

breznak commented 9 years ago

btw @jefffohl I've added some UI questions (designated by "UI") to the issue description, esp. fixing the anomalyScore scale would be nice. Now back to replying to your comment...
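
For the anomalyScore scale, the rescaling idea could look something like this. It is a minimal sketch in Python for illustration only; `rescale` and its `fraction` parameter are hypothetical names, and it assumes the data values are positive and contain no missing entries.

```python
def rescale(scores, data, fraction=0.9):
    """Scale scores in [0, 1] so that 1.0 maps to `fraction` of the data maximum."""
    peak = max(data)
    return [score * fraction * peak for score in scores]


# Made-up values: raw data peaks at 40, so a score of 1.0 lands near 36.
data = [10.0, 40.0, 20.0]
scores = [0.0, 0.5, 1.0]
scaled = rescale(scores, data)
print(scaled)
```

The scaled series would only be for display; the original anomalyScore values should stay available for tooltips and thresholds.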

breznak commented 9 years ago

> I can code it up to merge the data and results data sets. The only question I have is, how to make sure we are loading the proper files? Will the paths to the data and the results always have reliable relative paths? Or, should we let the user choose a data file (based on a manifest file as we have now), then a results file, then merge them together after they have been selected?

The file lists are used in NAB which relies on the structure of the data, so I think it's safe to assume the paths will be correct. Actually NAB has an alternate plotting through plot.ly so I would not worry about that functionality too much. And your commit now fixes the issue for NAB where plotting of results didn't work.

...if we focus on NuPIC/OPF, there are no separate data/results file lists. The results are in a single file (and raw data are included under the field actual).

How to pass the OPF file path to the script?
- can we use the command-line argument with http server? visualization$ python -m SimpleHTTPServer 12345 ../examples/my/path.csv
- can we trigger a file chooser dialog from the webUI?
- stick with the file list, which the user will manually generate and will contain paths to all CSVs that should be plotted. (Probably useful to keep if we want to plot more files at once)
- have the manifest-file with a folder and then show all CSV under that path?
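
On the command-line option: `python -m SimpleHTTPServer` only accepts a port, so passing a file would need a small wrapper script. A rough sketch, assuming Python 3.7+ (where `http.server` replaces `SimpleHTTPServer`); the fixed name `current.csv` and the helper names are made up, and the plot page would have to be taught to load that file.

```python
import argparse
import functools
import http.server
import os
import shutil


def parse_args(argv):
    """Parse the CSV path and an optional port from the command line."""
    parser = argparse.ArgumentParser(description="Plot one OPF results CSV.")
    parser.add_argument("csv_path", help="OPF results file to plot")
    parser.add_argument("--port", type=int, default=12345)
    return parser.parse_args(argv)


def stage_csv(csv_path, web_root, name="current.csv"):
    """Copy the CSV next to the plot page so the static server can reach it."""
    target = os.path.join(web_root, name)
    shutil.copyfile(csv_path, target)
    return target


def serve(web_root, port):
    """Serve web_root on localhost until interrupted (Ctrl+C)."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=web_root)
    with http.server.HTTPServer(("localhost", port), handler) as server:
        server.serve_forever()
```

A script would call `parse_args(sys.argv[1:])`, then `stage_csv(args.csv_path, ".")`, then `serve(".", args.port)`. Copying the file sidesteps the problem that a static server will not serve paths outside its root, such as `../examples/my/path.csv`.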

breznak commented 9 years ago

> the checkboxes are dynamically generated based on the CSV header. so, they will work with any CSV.

:guitar: :+1:

breznak commented 9 years ago

> The DyGraph CSV parser choked on the OPF format. I can write a Javascript parser for that. I am assuming that we want it to strip out all fields that are not a number?

yes, but..

> Since the second line indicates the data type, this should be relatively easy. Or do we want to somehow convert some of the string values to numbers in some way? Note that DyGraph can only plot numbers (integers or floats).

Plotting only numbers is OK, but the data-type line (2nd) would not work that easily, because in the example OPF file here the anomalyScore has type string. Two options:

- this is a NuPIC bug and we should ignore it and require it fixed ( @rhyolight ? I think that would be the correct approach)
- hack around it; my parser did well with replacing {.*} or [] with 0

PS: optionally, in the future we may want to get some of the other string fields, eg to get multiStepBestPredictions.1,multiStepBestPredictions.5 (which are in fact again floats)
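
The hack-around would be roughly the following. This is a Python sketch of the idea, not my actual bash script; the pattern assumes bracketed cells are not nested, and the sample line is made up.

```python
import re

# Matches a {...} or [...] cell without nested brackets (an assumption).
BRACKETED = re.compile(r"\{[^{}]*\}|\[[^\[\]]*\]")


def flatten_line(line):
    """Replace every bracketed cell with 0 so a plain comma split keeps columns aligned."""
    return BRACKETED.sub("0", line)


# Made-up row with a dict-like cell and an empty list cell.
line = "2015-10-23 13:00:00,5.0,{1: 5.2, 5: 6.1},[],0.25"
cells = flatten_line(line).split(",")
print(cells)
```

This throws away the bracketed content, so it would not help with extracting multiStepBestPredictions.N values later.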

breznak commented 9 years ago

> You can zoom into the graph by using your cursor to select a section of the graph. Though, I just noticed that I introduced a bug here. Normally, double-clicking will return the graph to the zoomed-out state. But I accidentally disabled that with some code that displays the timestamp when you click on the graph. I will fix that.

Cool thanks, I've noticed the zoom-out problem. Btw, do you think it would be possible to do some "smooth zoom-out" (the way zoom-in works)? With a mouse-wheel, a scroll-bar, ...?

breznak commented 9 years ago

> Unfortunately, I am ignorant about annotated anomaly labels. Can you elaborate on that point?

This exists only in NAB so far *). Annotations is a TXT file with a vector, where 1 means a human-annotated anomaly = nice to plot with a "stem". Maybe another "Choose annotations file:" file input field would work fine here? A nice to have but definitely not needed right now.

*) @rhyolight would it be any problem/benefit getting an anomalyAnnotation field to OPF? Useful here and then NAB could use OPF files, instead of its custom. I've raised that issue there but got no opinions..

breznak commented 9 years ago

@jefffohl please let me know if there's anything I can help with on the non-JS side.

jefffohl commented 9 years ago

1. In determining how to let the user choose what OPF file to use, all of those sound like good suggestions. It depends mostly on how users typically use the OPF, and about this I am somewhat ignorant. The most universally easy solution would be to launch a file browser from the web UI. We could also allow users to put in a path manually, which could be a publicly accessible URL. So, maybe there would be a "path" field, where the user types in the path, and a button for launching a file system browser for finding and loading local files.
2. For parsing OPF files, would it be OK to hard-code into our script exceptions for anomalyScore, multiStepBestPredictions.1, and multiStepBestPredictions.5 and the like? That would make it pretty easy, as long as we can count on those fields always being number-like (we can use type coercion to turn them into numbers). It makes things more tightly coupled, but so far, this would only be for OPF, so that should be OK, right?
3. For smooth zooming, I don't see an option for that in DyGraphs, but I will look around, and I may be able to add a plugin for that as well.
4. Will worry about the anomalyAnnotation later. I think what I will end up having is two different versions of this script - though working in the same way - one for NAB, and one for OPF. Alternatively, we could perhaps use one script, but give it a config somehow, so it behaves slightly differently depending on whether it is working with OPF data or NAB data.

jefffohl commented 9 years ago

> @jefffohl please let me know if there's anything I can help, with a non-JS stuff.

Will do. Thanks!

breznak commented 9 years ago

> In determining how to let the user choose what OPF file to use, all of those sound like good suggestions. It depends mostly on how users typically use the OPF, and about this I am somewhat ignorant. The most universally easy solution would be to launch a file browser from the web UI. We could also allow users to put in a path manually, which could be a publicly accessible URL. So, maybe there would be a "path" field, where the user types in the path, and a button for launching a file system browser for finding and loading local files.

I think a typical use case is a user who runs a single OPF experiment and wants to see the results. So a file browser from the web UI sounds like the way to go to me. Alternatively, later we could add behavior like Gmail attachments: an "Add a file" file chooser and then checkboxes for which files to plot.

breznak commented 9 years ago

> For parsing OPF files, would it be OK to hard-code into our script exceptions for anomalyScore, multiStepBestPredictions.1, and multiStepBestPredictions.5 and the like? That would make it pretty easy, as long as we can count on those fields always being number-like (we can use type coercion to turn them into numbers). It makes things more tightly coupled, but so far, this would only be for OPF, so that should be OK, right?

Agree, simple & works.

anomalyScore, multiStepBestPredictions.actual are always present and a float.
multiStepBestPredictions.1 is always present but None at the 1st row.
multiStepBestPredictions.N - optional, can be more (3,5,10), but not very typical. None till N+1-th row.

I still like your approach to plotting all numeric fields (as input data will be plotted if it is in a suitable form), are you planning combining the 2 approaches?

breznak commented 9 years ago

> For smooth zooming, I don't see an option for that in DyGraphs, but I will look around, and I may be able to add a plugin for that as well.

(btw, just to make myself clear, I think the zoom-in with mouse selection is perfectly viable, just missing a zoom out step).

I just found a mention of https://code.google.com/p/dygraphs/issues/detail?id=366 in Issue 58 some calls could exist (?)

breznak commented 9 years ago

> Will worry about the anomalyAnnotation later. I think what I will end up having is two different versions of this script - though working in the same way - one for NAB, and one for OPF.

100% agreed. Personally I'd try to push NAB to switch to using OPF...

BoltzmannBrain commented 9 years ago

@breznak could you please elaborate on what you mean by pushing NAB to use OPF?

breznak commented 9 years ago

> @breznak could you please elaborate on what you mean by pushing NAB to use OPF?

@BoltzmannBrain bad wording, sorry. I had raised an issue in NAB asking whether it could use OPF for its format (extended with an annotations column), so this code could be shared (e.g. the results plotting is fixed here)

rhyolight commented 9 years ago

Guys, this issue is just for plotting experiment results after they've been run, right? There is no initiative to visualize live predictions / anomalies coming out of NuPIC, is there?

numenta / nupic-legacy

Visualize, plot OPF experiment results #2658