Visualize, plot OPF experiment results

breznak commented 9 years ago

I'd like to visualize the results .csv from running an OPF experiment. It should be relatively easy, but perhaps some of you have already written a nice utility for that? NAB, @rhyolight ?

WORKING branch: https://github.com/breznak/nupic/tree/plot_results

TODO:

[x] move or copy the script from NAB to NUPIC/scripts/visualization/
[x] Offline & Security
- ~~plot.ly~~ publishes all data online, I think this is unacceptable for general use for us
- "our script" is pretty nice, runs in browser on localhost
- [x] I suggest downloading the rendering scripts for offline use
[x] add example data for easy plotting test + doc
- [x] OPF results data - opf_results/*.csv
- [x] NAB (small) data - data_file* results_file* data/ results/
[x] eval which plotting lib to use, currently it's DyGraph https://github.com/numenta/NAB/blob/master/nab_visualizer.html#L4-L7 -- @jefffohl liked D3.js for being more flexible, but decided to stick with DyGraph so far as it fits the current needs of plotting graphs.
[x] extend to plot OPF data, not only NAB (subset of OPF) data (Help wanted, I can explain what to do with OPF data, provide parser script)
- [x] plot single file
  The current JS code from NAB spends a lot of work in parsing a structure of files, in order to plot all *.csv files there. For OPF/Nupic we don't need that (although it wouldn't hurt to keep the functionality for NAB).
  What we need is the ability to plot a single file provided as an argument (python plot.py ./opf_results/DefaultTask.csv)
- [x] merge data and results
  NAB uses separate folders for data & results; in an OPF results file, all of these are together
[ ] improve the Plot page interface (Help wanted)
- [x] plot data, add a checkbox option, the OPF field is actual
- [x] plot anomaly, add a checkbox option, the OPF field is anomalyScore
- [x] plot predicted field, add a checkbox option, the OPF field is multistepBestPrediction.1
- [x] optional, add option to plot other multistep (>1) prediction fields
- [x] opt, add textfield to enter specific label that should be plotted, maybe can fit ^^^
- checkboxes generated dynamically for all CSV labels
- [ ] preselect some values -- impossible to preselect/know data - but anomalyScore & multiStepBestPredictions.actual make sense
- [ ] opt, plot labeled anomalies (not present in OPF, could be just a specific field)
- [x] UI: merge data & results windows?
- [x] UI: anomalyScore [0..1.0] is not noticeable with "raw data" plotted, FIX by rescaling (to say 90% of max of the data)?
- [ ] UI: smooth zoom out. Currently we can smoothly zoom in by selecting the area of interest, and zoom out to the original size with a mouse click. Can we zoom out iteratively as well (mouse wheel, a scroll bar, ...)?
- [ ] UI: add highlight menu: check-boxes fields (ex. anomalyScore), threshold (0.9), (optionally: below/over), (opt.: color) and highlight the section where the field is over the threshold.
- [ ] Consider an AND operator for the 2 above statements, can be used as evaluation as follows: see https://github.com/breznak/nupic/pull/16
  (anomalyScore, >0.9) AND (annotation, >0.9), green # correctly detected anomaly
  (anomalyScore, <=0.9) AND (annotation, >0.9), red # missed
  (anomalyScore, >0.9) AND (annotation, <=0.9), yellow # false positive
[ ] UI: On higher zoom-out levels (~1 week) it is not possible to see time (fine-grained) on x-axis interactively. (I think best solution would be if the timestamp field would show up in the "fields div on the left", as interactive values are shown there.) see https://github.com/breznak/nupic/pull/20
[x] opt, FIX plotting of NAB Results, currently only data seem to work. (Help wanted)
If the above succeeds, maybe NAB should switch to using OPF format for the output (data+results) ( @subutai ?)
[ ] Extending OPF: issues not directly for this PR, related to OPF; @rhyolight ?
- [ ] anomalyScore has a string type in OPF (maybe bcs it's None at the 1st step)
- [ ] new anomalyAnnotations OPF field, to mark human annotations, could be useful in NAB
- [ ] Plotting non-numeric inputs. Currently impossible, but each encoder has a scalarValue member, can we expose that (add a field XX.scalar for each input field to OPF file) somehow?
[x] Bugs
- [x] selecting a non-OPF csv crashes the web app, can't render after reselecting a correct OPF file (w/o server restart) @jefffohl
- [x] fields with too long name (eg metrics) get shortened in the UI, but it's not possible to see the whole name. Maybe a "context label" on mouse hover?
- [x] "generic CSV" support (NAB,..)
  currently some fields are hard-coded (eg for "scaled anomaly score") and the rendering fails if the fields are not present. But a nice feature is the dynamic menu for plotting numeric fields, it would be nice if only the "scaled" function failed (and its checkbox was greyed out) instead of the whole plot failing.
- [x] additionally, if the "scaled" functionality could be written more generic (an array for labels to scale) so we could scale both anomaly & likelihood score, or spiketrain (0/1) data,... ?
- [x] /low priority/ Does not render in Firefox (nor did the original NAB code, but the DyGraph examples work fine in a recent FF)
[ ] New Features
- [ ] Online plotting
  would require: a) the model sends the updates to the server; b) a network in/out region; c) refreshing the plot of the (updated) file every second or so..
[ ] PR ready code
- [ ] finish main Readme, add images, maybe a wiki
- [ ] mention the "2 rows skipped" feature/problem
- [x] add comments/docs to functions
- [x] avoid hard-coded values, allow "settings" at code-level
- [ ] should we add test-case for this code?
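
For the rescaling item in the UI list above (and the "array of labels to scale" idea under Bugs), a minimal sketch of what a generic version could look like. This is not the code in the branch: `scaleFields`, its parameters and the `_scaled` suffix are made-up names, and it assumes the series to scale are already in [0..1] and that rows are plain objects keyed by CSV label. Labels missing from a given CSV are skipped, so only the scaled series would be unavailable instead of the whole plot.

```javascript
// Hypothetical helper, for illustration only (not from the plot_results branch).
// Rescales the given [0..1] series to a fraction (default 90%) of the maximum
// of a reference series, so they stay visible next to the raw data.
function scaleFields(rows, labelsToScale, referenceLabel, ratio) {
  var factor = ratio === undefined ? 0.9 : ratio;
  // Largest numeric value of the reference series (e.g. the raw data).
  var maxRef = rows.reduce(function (max, row) {
    return typeof row[referenceLabel] === "number"
      ? Math.max(max, row[referenceLabel])
      : max;
  }, -Infinity);
  return rows.map(function (row) {
    var out = Object.assign({}, row);
    labelsToScale.forEach(function (label) {
      // Skip labels that are absent or non-numeric instead of failing.
      if (typeof row[label] !== "number" || !isFinite(maxRef)) { return; }
      out[label + "_scaled"] = row[label] * maxRef * factor;
    });
    return out;
  });
}
```

The original values are left untouched and the scaled copy goes into a separate key, so a checkbox could toggle between the two.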

UPDATES:

4/11/2015 - some refactoring, configurable options, improved "generic CSV" handling, documentation
3/11/2015 - Merged Jeff's PR with Bugfixes & overhauled usability! :clap: "Zoom view", and much more!
27/10/2015 - Merged Jeff's work enabling OPF files plotting, improving UI
24/10/2015 - Updated with @jefffohl 's "Initial commit" work, plots NAB data & results, improved checkboxes for plots
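
On the "2 rows skipped" item from the TODO: a rough sketch of what the skipping amounts to, assuming the OPF results file starts with three header rows (field names, field types, special flags) and contains no quoted commas. `parseOpfCsv` is an illustrative name, not a function from the branch.

```javascript
// Illustrative sketch only; assumes three header rows and simple
// comma-separated cells without quoting.
function parseOpfCsv(text) {
  var lines = text.split(/\r?\n/);
  var names = lines[0].split(",");
  // lines[1] (types) and lines[2] (flags) are the two skipped rows.
  return lines.slice(3)
    .filter(function (line) { return line.trim() !== ""; })
    .map(function (line) {
      var cells = line.split(",");
      var row = {};
      names.forEach(function (name, i) {
        var cell = (cells[i] || "").trim();
        var n = Number(cell);
        // Keep non-numeric cells (timestamps, "None") as strings.
        row[name] = cell !== "" && !isNaN(n) ? n : cell;
      });
      return row;
    });
}
```

A generic CSV has only one header row, so the same code would silently drop its first two data rows, which is why this probably deserves a mention in the Readme.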

jefffohl commented 8 years ago

@breznak I noticed that you removed "timestamp" from the excludes array used in the generateFieldMap function. We need to add this back in, or else we end up with 2 instances of "timestamp" in that array, because we have to unshift that item into the array at the end of the routine. This is because "timestamp" has to be at the beginning of the returned array.
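
To illustrate the problem, here is a simplified sketch of the behaviour, not the actual generateFieldMap from the branch (the signature and the plain header array are assumptions made for the example):

```javascript
// Simplified stand-in for the real function, for illustration only.
var TIMESTAMP = "timestamp";
var EXCLUDE_FIELDS = [TIMESTAMP];

function generateFieldMap(headerFields, excludes) {
  // Drop excluded names, so "timestamp" is not collected in this pass.
  var fields = headerFields.filter(function (name) {
    return excludes.indexOf(name) === -1;
  });
  // The x-axis column has to come first, so it is put back at the front.
  fields.unshift(TIMESTAMP);
  return fields;
}
```

With `EXCLUDE_FIELDS` the result contains "timestamp" once, at the front. With an empty excludes array the filter keeps the original "timestamp" and the unshift adds a second one.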

breznak commented 8 years ago

> I noticed that you removed "timestamp" from the excludes array

@jefffohl yes, sorry. It worked and I wasn't aware that it would double the column. You can just change to var EXCLUDE_FIELDS = [TIMESTAMP]; to get it back. Will you do that, or should I?

jefffohl commented 8 years ago

@breznak - I can do it. Though, it depends on what your intentions are for EXCLUDE_FIELDS. If it is to make a list of fields that the user might want to exclude, then it might make more sense to hard-code it into the generateFieldMap function.

breznak commented 8 years ago

> If it is to make a list of fields that the user might want to exclude ...

@jefffohl I thought that was the meaning. If we hard-code it, will it still be possible to plot the "timestamp" column? (usually it's linear, although there can be jumps so it can be interesting being able to plot it too)

jefffohl commented 8 years ago

@breznak - In DyGraphs, the x-axis is always the timestamp. So, how would one plot the timestamp against the timestamp?

breznak commented 8 years ago

..ah, you got me :grinning:

jefffohl commented 8 years ago

New PR here: https://github.com/breznak/nupic/pull/15

breznak commented 8 years ago

@jefffohl merged, thank you!

breznak commented 8 years ago

Another thing I got stuck at: https://github.com/breznak/nupic/pull/16 - work on highlighting certain ranges when a field is over a threshold.
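
Roughly what I have in mind, as a sketch and not the code in that PR. The function names are made up, and `annotation` is a hypothetical field (OPF does not have it yet); the colors follow the AND rules from the TODO list.

```javascript
// Sketch only; names and the "annotation" field are hypothetical.
function classifyPoint(anomalyScore, annotation, threshold) {
  var detected = anomalyScore > threshold;
  var labeled = annotation > threshold;
  if (detected && labeled) { return "green"; }   // correctly detected anomaly
  if (!detected && labeled) { return "red"; }    // missed
  if (detected && !labeled) { return "yellow"; } // false positive
  return null;                                   // nothing to highlight
}

// Merge consecutive rows with the same color into {start, end, color}
// ranges (row indices, inclusive) that a graph could then shade.
function highlightRanges(rows, threshold) {
  var ranges = [];
  rows.forEach(function (row, i) {
    var color = classifyPoint(row.anomalyScore, row.annotation, threshold);
    if (color === null) { return; }
    var last = ranges[ranges.length - 1];
    if (last && last.color === color && last.end === i - 1) {
      last.end = i; // extend the current range
    } else {
      ranges.push({ start: i, end: i, color: color });
    }
  });
  return ranges;
}
```

The single-field case (just "anomalyScore over 0.9") is the same loop with a simpler classify step.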

breznak commented 8 years ago

Small but needed doc in https://github.com/breznak/nupic/pull/17

breznak commented 8 years ago

some graph options in https://github.com/breznak/nupic/pull/19

breznak commented 8 years ago

@jefffohl in testing the UTC/local time, I've noticed an annoying usability problem: In the hotgym file, if you zoom to 1 week range, the daily trends are nicely visible, but the x-axis shows only days, so it's hard to get time for an exact point (one has to click it and see a pop-up window). So:

[ ] UI: On higher zoom-out levels (~1 week) it is not possible to see time (fine-grained) on x-axis interactively. (I think best solution would be if the timestamp field would show up in the "fields div on the left", as interactive values are shown there.)

breznak commented 8 years ago

@jefffohl a couple of other ideas:

timestamp: we treat it as a special field because it's used as the x-axis, but to OPF it's just a normal field (only always present). Can we use iteration for the x-axis instead and treat timestamp like any other field? (If it parses it will be included, no requirement on exact name/presence, no fallback hacks if parsing fails, ...) The disadvantage (?) would be a linear iteration step (actually I'd think this is better: in case there are jumps in timestamp we'll notice that, which currently we cannot).
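
A small sketch of the idea, with made-up helper names and assuming rows are plain objects and timestamps are strings that `Date.parse` understands (ISO 8601 is the safe case): the row index becomes the x value, and jumps in the timestamp can be listed separately.

```javascript
// Hypothetical helpers, for illustration only.

// Add the row index as "iteration" so it can serve as the x-axis;
// timestamp stays in the row like any other field.
function withIterationAxis(rows) {
  return rows.map(function (row, i) {
    return Object.assign({ iteration: i }, row);
  });
}

// Report the places where consecutive timestamps are not exactly
// expectedStepMs apart. Rows whose timestamp does not parse are ignored.
function findTimestampJumps(rows, timestampField, expectedStepMs) {
  var jumps = [];
  for (var i = 1; i < rows.length; i++) {
    var prev = Date.parse(rows[i - 1][timestampField]);
    var curr = Date.parse(rows[i][timestampField]);
    if (isNaN(prev) || isNaN(curr)) { continue; }
    if (curr - prev !== expectedStepMs) {
      jumps.push({ iteration: i, deltaMs: curr - prev });
    }
  }
  return jumps;
}
```

With a linear iteration axis a gap in the data is invisible on the x-axis itself, so the jump list (or the timestamp plotted as a series) is what would make it noticeable.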

breznak commented 8 years ago

Choice of input: OPF/CSV/NAB. To solve the "disadvantage" above, we could add a choice of input type (3 radio buttons to the right of the Render btn?), which would simplify the code quite a lot:
- for OPF: timestamp always called timestamp, is string, used as x-axis(?)
- generic CSV: no requirements
- NAB /optional/ could even start the webApp in an iframe below

breznak commented 8 years ago

@rhyolight looks like we're getting ready! If you have time, could you give this a shot? (and possibly improve the visualization/Readme ?)

rhyolight commented 8 years ago

@breznak @jefffohl Great work. I really like this. I tried it out and it works great. IMO you should merge with the current feature set, then email nupic-discuss with an explanation of how to use. See if any more feature requests come in.

@breznak Go ahead and create a PR and I'll review. I'm not sure I'll have time to update the readme, but I'll try.

jefffohl commented 8 years ago

I can work on the README. Or is that something only the Flag Bearer is supposed to do?

rhyolight commented 8 years ago

@jefffohl Well there is no need to add to the main repo README. But the one in scripts/visualization should be created / updated.

jefffohl commented 8 years ago

@rhyolight - yes, I was referring to the README in scripts/visualization

jefffohl commented 8 years ago

@breznak do you feel that the current state of breznak/nupic/plot_results is what you would like to see for a first PR? If not, what features do you feel are needed? My understanding is that the outstanding features that are in development are:

- Allow the user to select the timestamp field, or use the iteration step as the timestamp field
- Allow the user to select a threshold for each series. When this threshold is reached, the graph is highlighted.
- Show timestamp in labeled values (to the right of the graph)

Anything else?

jefffohl commented 8 years ago

@rhyolight - I have some tooling questions too. Should we be adding some unit tests here, and if so, do we have any existing Javascript testing frameworks in place within NuPIC? Also, if we are going to continue to build on this web app, I would like to start implementing some build tools to help with development (such as using gulp for build, SaSS for compiling CSS, bower for managing packages, etc.). This then forces the question - should this be inside NuPIC - or should it be its own project? (Since there is no dependency on NuPIC within this app)

breznak commented 8 years ago

Yes @jefffohl , I'm very satisfied with the current state of plot_results; it's more than what I've wanted at the beginning. Are you happy with the current state, OK for a PR?

Those features in development are other enhancements, so not blocking the current state. I'll just reword them in my words:

- improved 'generic CSV support': timestamp would not be enforced anymore. Allow toggling between time/step as the x-axis. (Enhancement, https://github.com/breznak/nupic/pull/20 , almost complete, only needs the UI checkbox to function properly)
- highlighting subsections over the threshold (and further development for AND-conditions) (New feature, https://github.com/breznak/nupic/pull/16 , early devel)
- for each selected (mouse hover) point, show the x-axis value somewhere (Enhancement)
- allow plotting string/category data (New feature, in https://github.com/breznak/nupic/pull/20, 90% ready, need to fix the labels/columns count difference)

rhyolight commented 8 years ago

You are writing the first JavaScript I'm aware of in NuPIC. If you plan on continuing, I would be more in favor of breaking out your plotting tool into its own repository and adding tests there.

The reason is I don't really want to add a new JavaScript runtime to a Python project just to run tests. I think it would be cleaner to have it in its own repo, especially because of its generic functionality.

I say create a PR for now, but if this gets any bigger, move to another repo in nupic-community.

Matt Taylor OS Community Flag-Bearer Numenta


breznak commented 8 years ago

About the build tools - I can't judge how much they are needed, but I wouldn't be in favor of adding extra dependencies to NuPIC, same for tests. This means something like a nupic.visualizations repo would be nice(?) The nice thing now is that you can fire up a browser and view NuPIC results, so the new repo would have to be easily distributable - pip (?)

jefffohl commented 8 years ago

Sounds like we are agreed then. I will update the README, and we can make a PR [EDIT: @breznak already made a PR :)]. For further development, I can create a new repo, with a proper set of build tools, tests, etc.

jefffohl commented 8 years ago

I am not sure what the best way to distribute this would be, but pip seems like a strange fit - isn't that for Python packages only? For front-end apps like this, we would usually use something like Bower. Though, most of the people in this community are probably not avid users of Bower. Of course, it should be pretty easy for people to just use git to clone the repo.

breznak commented 8 years ago

FTR: the work has moved to its own stand-alone community repo here: https://github.com/nupic-community/nupic.visualizations
