numenta / nupic-legacy

Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence based strictly on the neuroscience of the neocortex.
http://numenta.org/
GNU Affero General Public License v3.0

Visualize, plot OPF experiment results #2658

Closed breznak closed 8 years ago

breznak commented 8 years ago

I'd like to visualize the results .csv from running OPF experiment. It should be relatively easy, but perhaps some of you have already written a nice utility for that? NAB, @rhyolight ?


WORKING branch: https://github.com/breznak/nupic/tree/plot_results

TODO:


UPDATES:

jefffohl commented 8 years ago

Right now, the little web app I am working on just reads files that have already been created by the OPF. But, I was imagining that the next step would be to visualize streaming live data.

rhyolight commented 8 years ago

That's a good thing to think about, but it will require a server to post data to. The live data coming out of NuPIC will be from a python runtime, so it will need to be posted somewhere for display. Just something to think about.

breznak commented 8 years ago

I was imagining that the next step would be to visualize streaming live data.

Same. I was planning this for later. Actually if the plotting is reasonably fast, you could just reread the file every second, to get a semi-live data plot.
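The "reread the file every second" idea could be sketched roughly as below. This is only an illustration, not code from the branch: `read_if_changed` is a hypothetical helper, and the one-second interval in the usage note is simply the figure mentioned above.

```python
import os


def read_if_changed(path, last_mtime):
    """Return (text, mtime). text is None when the file has not changed.

    Hypothetical helper: compares the file's modification time with the
    one seen on the previous poll and only re-reads the file when it is newer.
    """
    mtime = os.path.getmtime(path)
    if last_mtime is not None and mtime <= last_mtime:
        return None, last_mtime
    with open(path, "r") as handle:
        return handle.read(), mtime
```

A caller would loop with `time.sleep(1)`, pass the previously returned `mtime` back in, and redraw the plot only when the returned text is not `None`.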

breznak commented 8 years ago

...so it will need to be posted somewhere for display.

In some other issue I was thinking about a network I/O region (maybe somebody has already implemented that? I remember some networking code). The OPF model could just add a network-out region and the plotter would bind to a known address. Could it work like that? (But we're not there yet.)

jefffohl commented 8 years ago

I created a new PR. Notes here: https://github.com/breznak/nupic/pull/10

Is it better to include the notes here, or with the PR?

breznak commented 8 years ago

I created a new PR. Notes here: breznak#10

@jefffohl This is HUGE progress, kudos! :+1: :100: This is everything I originally imagined for the visualizer.

I've merged your work, updated the check-points and tested, all works well here.

I am not sure if this approach is optimal or not. It might be better to hard-code certain fields to be included. Or perhaps this way is best. Would like feedback on that. In any case, consider this a work in progress, with this being the latest installment as we work towards a more polished app.

I would be more for the dynamic approach with plotting all numeric fields, as we don't know what the user might be interested in (for example, I found quite interesting plotting all the metrics and comparing their performance). A compromise might be preselecting some default check-boxes (anomalyScore, bestPredictions.actual)?
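A minimal sketch of the "plot all numeric fields, preselect a few" compromise, in Python for illustration (the visualizer itself is JavaScript). The function names and the sampling size are assumptions, and `DEFAULT_FIELDS` uses the full column name `multiStepBestPredictions.actual` as the reading of "bestPredictions.actual".

```python
import csv
import io

# Fields to tick by default when present (names taken from the discussion).
DEFAULT_FIELDS = ("anomalyScore", "multiStepBestPredictions.actual")


def is_number(text):
    """True if the cell parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def numeric_fields(csv_text, sample_rows=20):
    """Return the names of columns whose sampled cells are all numeric.

    Samples from the end of the file so that any extra header rows
    near the top do not affect the result.
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, sample = rows[0], rows[1:][-sample_rows:]
    result = []
    for index, name in enumerate(header):
        cells = [row[index] for row in sample
                 if index < len(row) and row[index] != ""]
        if cells and all(is_number(cell) for cell in cells):
            result.append(name)
    return result


def preselected(fields):
    """Return the subset of fields that should start out checked."""
    return [name for name in fields if name in DEFAULT_FIELDS]
```

Every numeric column would get a check-box, and only the ones returned by `preselected` would start out ticked.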

breznak commented 8 years ago

@jefffohl I've added some usability comments to the "Bugs" section, what do you think about them?

jefffohl commented 8 years ago

Thanks @breznak. I will take a crack at the bugs today. As you are more familiar with the OPF than I am, can you give me a list of what fields will always be found in an OPF file, as well as fields that may or may not be found there? I would like to auto-detect the file type, to try and determine if it is an OPF file, and if it is a valid OPF file.

jefffohl commented 8 years ago

@breznak - also, what fields besides anomalyScore do we imagine a user would like to scale proportionally to the data range?

breznak commented 8 years ago

can you give me a list of what fields will always be found in an OPF file, as well as fields that may or may not be found there? I would like to auto-detect the file type, to try and determine if it is an OPF file, and if it is a valid OPF file.

@jefffohl this is a little tricky. The columns in an OPF file depend on the Model used. If you take "anomaly detection" as the intended use-case for the visualizations, the model has to be TemporalAnomaly (which inherits from TemporalMultiStep) and always has an anomalyScore field (plus the field(s) from TemporalMultiStep).

A TemporalMultiStep will always have the multiStepBestPredictions.actual field, and depending on user's settings additional multiStepBestPredictions.<N> fields.

There is also always the timestamp, but it does not always have to be in a date format; it can appear as an iteration count (1,2,3,4,...).

Other models are not actively used AFAIK.
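The rules above could be turned into a simple file-type check along these lines. This is a sketch under the assumption that the column names appear in the header exactly as written above; `classify_columns` and `prediction_steps` are hypothetical names, not part of NuPIC.

```python
import re


def classify_columns(header):
    """Guess which model produced a results file from its column names."""
    names = set(header)
    has_actual = "multiStepBestPredictions.actual" in names
    has_anomaly = "anomalyScore" in names
    if has_actual and has_anomaly:
        return "TemporalAnomaly"
    if has_actual:
        return "TemporalMultiStep"
    return "generic"  # not recognizably OPF; treat as a plain CSV


def prediction_steps(header):
    """Return the N values of any multiStepBestPredictions.<N> columns."""
    steps = []
    for name in header:
        match = re.match(r"multiStepBestPredictions\.(\d+)$", name)
        if match:
            steps.append(int(match.group(1)))
    return steps
```

A header that falls through to "generic" would still be plotted, just without any OPF-specific defaults.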

what fields besides anomalyScore do we imagine a user would like to scale proportionally to the data range?

Do you intend this to be OPF-only, or a generic "I will plot what I can from the CSV" implementation? If the latter, one of the use-cases I had in mind was comparing several anomaly implementations/settings. This happens in the NAB results file (it could be plotted if we overcome the hard-coded names somehow? *1) ), which includes the raw_score and anomaly_score (=likelihood) fields.

timestamp,value,anomaly_score,raw_score,label,S(t)_reward_low_FP_rate,S(t)_reward_low_FN_rate,S(t)_standard
2014-07-01 00:00:00,10844.0,0.0301029996659,1.0,0,0.0,0.0,0.0
2014-07-01 00:30:00,8127.0,0.0301029996659,1.0,0,0.0,0.0,0.0
2014-07-01 01:00:00,6210.0,0.0301029996659,1.0,0,0.0,0.0,0.0
2014-07-01 01:30:00,4656.0,0.0301029996659,1.0,0,0.0,0.0,0.0

*1) Two ideas for approaching the "fixed names" problem: A) "in code":

NORMALIZE_TO_FIELD="multiStepBestPredictions.actual"
SCALE_FIELDS_ARR=["anomalyScore"] # can extend here

B) "in web GUI":
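Idea A could work roughly as follows. This is one possible reading of "scale proportionally to the data range", assuming the scaled fields lie in [0, 1] (as an anomaly score does); the two constants come from the snippet above, while the function names are made up for illustration.

```python
NORMALIZE_TO_FIELD = "multiStepBestPredictions.actual"
SCALE_FIELDS_ARR = ["anomalyScore"]  # can extend here


def scale_unit_to_reference(values, reference):
    """Map values assumed to lie in [0, 1] onto the reference's min..max range."""
    ref_lo, ref_hi = min(reference), max(reference)
    return [ref_lo + value * (ref_hi - ref_lo) for value in values]


def apply_scaling(columns):
    """Return a copy of columns (name -> list of floats) with scaled fields."""
    reference = columns[NORMALIZE_TO_FIELD]
    scaled = dict(columns)
    for name in SCALE_FIELDS_ARR:
        if name in columns:
            scaled[name] = scale_unit_to_reference(columns[name], reference)
    return scaled
```

With this, an anomaly score of 1.0 is drawn at the top of the reference series and 0.0 at the bottom, so both remain readable on a shared y-axis.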

breznak commented 8 years ago

Btw, I've found a working example for the finer zoom in/out http://dygraphs.com/gallery/#g/interaction ("Custom interaction" block) but I'm still unable to transfer it here :/

jefffohl commented 8 years ago

Thanks @breznak.

I do want to make the app as flexible as possible, and therefore would like to enable it to consume whatever data is passed its way, with the only constraint being that there must be a timestamp field.

I like your idea for normalizing fields. The "fields" UI might start getting a bit crowded at this point, so we would probably want to allow the user to toggle this portion of the UI in and out of sight.

Regarding the zoom feature, have you taken a look at the range-selector widget? http://dygraphs.com/gallery/#g/range-selector

jefffohl commented 8 years ago

@breznak do you have any ideas about how to render field names that are very long, such as this? multiStepBestPredictions:multiStep:errorMetric='aae':steps=5:window=1000:field=consumption

Can you think of a more concise label that still has meaning for the user? If we can come up with some kind of logic, I can convert them.

rhyolight commented 8 years ago

@breznak @jefffohl I would like to test this out. How do I do it?

jefffohl commented 8 years ago

@rhyolight

jefffohl commented 8 years ago

Oops - forgot this is on the main branch. Here is @breznak's branch: https://github.com/breznak/nupic/tree/plot_results

jefffohl commented 8 years ago

PS - you will want to test with an OPF output file, like the one found at nupic/scripts/visualization/opf_results/

breznak commented 8 years ago

I do want to make the app as flexible as possible, and therefore would like to enable it to consume whatever data is passed its way, with the only constraint being that there must be a timestamp field.

Great, @jefffohl ! I hoped you'd say that :) Then I hope this project has a lot of potential users - NAB, RiverView, all the Nupic examples with custom plot implementations, ...

I like your idea for normalizing fields. The "fields" UI might start getting a bit crowded at this point, so we would probably want to allow the user to toggle this portion of the UI in and out of sight.

That's true. You'd want to set the scaling up once and then continue browsing the graph. We could also already preselect some values (reference=bestPredictions.actual; scale=anomalyScore; ..if these fields are found. But not sure if this is not an overkill?)

Regarding the zoom feature, have you taken a look at the range-selector widget? http://dygraphs.com/gallery/#g/range-selector

I haven't, and it looks pretty good and combines the functionality of un/zoom with a global outlook! :+1:

...do you have any ideas about how to render field names that are very long, such as this? multiStepBestPredictions:multiStep:errorMetric='aae':steps=5:window=1000:field=consumption Can you think of a more concise label that still has meaning for the user? If we can come up with some kind of logic, I can convert them.

I think we can't compress it (much); if we go for some "encoding", it will become overcomplicated. We could:

A) drop the part multiStepBestPredictions:multiStep, which is the same for all metrics in our OPF model (but this does not solve the general problem of long names)
B) use a <span> tag on mouseover to show the full text and leave the normal text with overflow: hidden? (IMHO the preferred method)
C) use some JS hackery: get the label's size and, if it is too long, create a placeholder "Label <N>" and inject a paragraph

<p>
Graph labels:
* label <N> = "blah blah"
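Options A and C can be combined; the sketch below shows the logic in Python for illustration (the app would do this in JavaScript). `MAX_LABEL_LENGTH` and `shorten_labels` are hypothetical, and the prefix is the one named in option A.

```python
COMMON_PREFIX = "multiStepBestPredictions:multiStep:"
MAX_LABEL_LENGTH = 30  # hypothetical cut-off, in characters


def shorten_labels(names, max_length=MAX_LABEL_LENGTH):
    """Return (labels, legend).

    labels are the strings to draw on the graph; legend maps each
    placeholder such as "Label 1" back to the full field name.
    """
    labels, legend = [], {}
    for name in names:
        # Option A: drop the prefix shared by all OPF metrics.
        short = name[len(COMMON_PREFIX):] if name.startswith(COMMON_PREFIX) else name
        # Option C: still too long, so fall back to a numbered placeholder.
        if len(short) > max_length:
            placeholder = "Label %d" % (len(legend) + 1)
            legend[placeholder] = name
            short = placeholder
        labels.append(short)
    return labels, legend
```

The legend would then be rendered as the "Graph labels:" paragraph, one line per placeholder.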

jefffohl commented 8 years ago

@breznak - a new pull request for review. I made a lot of changes:

jefffohl commented 8 years ago

A screenshot of what it should look like now: [image: screen shot 2015-10-29 at 9 56 07 am]

jefffohl commented 8 years ago

I will be looking into the issue of DyGraph not working in Firefox.

rhyolight commented 8 years ago

Maybe it's not the right time for feedback... but I really like having a smaller, synced graph to display the anomaly score and likelihood, if there is one.



jefffohl commented 8 years ago

@rhyolight - do you mean a smaller graph below the main graph? Can't the same be achieved with what we have now, where the user can turn on and off whichever fields they wish to view? Perhaps you can expand on your idea a bit more?

rhyolight commented 8 years ago

I should play with it first before you do anything. Will do that now.



jefffohl commented 8 years ago

@breznak I added a fix for the Firefox issue to the current outstanding PR.

rhyolight commented 8 years ago

I am looking into how to use the BasicPredictionMetricsLogger with an OPF model outside the experiment_runner framework. I would ideally like to be able to plot the results of any model I create with ModelFactory.createModel().

jefffohl commented 8 years ago

@rhyolight - Not being an expert with the OPF, can you explain to me what your comment implies? It looks like the BasicPredictionMetricsLogger outputs JSON. Are you saying that you would like the app to be able to consume any data format (and possibly figure out the format type automatically)?

rhyolight commented 8 years ago

No, I am just trying to figure out how to get the data file you are plotting out of the OPF. I don't use the OPF experiment framework for anything (which is what created that file); I always just create a model and run it.

breznak commented 8 years ago

..I always just create a model and run it.

@rhyolight just write the outputs to a CSV file and you are good (?)
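Writing the outputs to a CSV could look something like the sketch below. It deliberately does not touch the NuPIC API: `write_results_csv` is a hypothetical helper, and it assumes each step of the model run has already been collected into a plain dict.

```python
import csv


def write_results_csv(path, records, fieldnames):
    """Write one row per record, with a single header row of field names.

    records is an iterable of dicts (hypothetical, e.g. one per model step);
    keys that are not listed in fieldnames are ignored.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames,
                                extrasaction="ignore")
        writer.writeheader()
        for record in records:
            writer.writerow(record)
```

For example, `write_results_csv("out.csv", records, ["timestamp", "value", "anomaly_score"])` would produce a file with a timestamp column followed by the plotted fields.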

rhyolight commented 8 years ago

@breznak I don't see how I can do this in the OPF API.

breznak commented 8 years ago

Merged Jeff's latest PR and updated the issue description, the check-list is getting slimmer :wink: What do you think Jeff, are there any issues that should be ironed out, or can we head for a public PR to nupic soon? :fireworks:

breznak commented 8 years ago

I don't see how I can do this in the OPF API.

@rhyolight we're working towards reasonably generic CSV support (I'll have some test cases and new (sub)issues soonish), so basically anything with a timestamp field would work. The "main" focus on OPF files assumes the files produced by scripts/run_opf_experiment, so you could look there to see how to write a similar format (?) Otherwise it might be better to discuss this against some specific code you're working on?

jefffohl commented 8 years ago

@breznak I think this looks good, in terms of a big improvement over what is there now. We will need to write some tests before we can submit a PR, I believe, and this is something that we will need to work with @rhyolight on. I am not sure if there are any JavaScript testing frameworks used inside of NuPIC, and whether or not we want to introduce them. Also, if we are going to do more work on this, I would like at some point to introduce some workflow tools for building the app, such as gulp and Sass. I am not sure how Matt feels about having all of that stuff in NuPIC.

rhyolight commented 8 years ago

generic CSV is perfect for me.



jefffohl commented 8 years ago

@breznak one thing that I mentioned, but which bears repeating, is that currently, the script will strip out the second and third rows of the CSV. So, if the file is not similar to the OPF output, the first two lines of data will be missing.
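One way to avoid losing data in non-OPF files would be to drop rows 2 and 3 only when they look like header metadata. The sketch below assumes that the extra OPF header rows never contain numbers, which is a heuristic rather than a documented guarantee; the function names are hypothetical.

```python
import csv
import io


def _has_number(row):
    """True if any cell in the row parses as a float."""
    for cell in row:
        try:
            float(cell)
            return True
        except ValueError:
            continue
    return False


def split_header_and_data(csv_text):
    """Return (header, data_rows), skipping rows 2 and 3 only if both are non-numeric."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, rest = rows[0], rows[1:]
    if len(rest) >= 2 and not _has_number(rest[0]) and not _has_number(rest[1]):
        rest = rest[2:]  # rows 2 and 3 look like OPF header metadata
    return header, rest
```

A generic CSV whose second row already holds numeric data keeps all of its rows, while a file with two extra non-numeric header rows is trimmed as before.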

breznak commented 8 years ago

one thing that I mentioned, but which bears repeating, is that currently, the script will strip out the second and third rows of the CSV. So, if the file is not similar to the OPF output, the first two lines of data will be missing.

@jefffohl I'm just writing a comment about related functionality, and I think dropping the first 2 lines is perfectly OK, as NuPIC is not trained there anyway. I'll put it in a readme, or a small *) notice on the page?

jefffohl commented 8 years ago

@breznak - sounds fine for now - I just wanted to make sure you were aware.

breznak commented 8 years ago

Merged a small example data cleanup - moved to ./examples/{OPF,NAB,CSV} path.

generic CSV is perfect for me.

@rhyolight ...right on time! :hourglass:

@jefffohl I'd like to discuss my user experience with the latest iteration, especially focusing on the "generic CSV":

I've added a testing/example file examples/CSV/sin_30sec.csv that looks like this:

time,function,anomaly_score
0.0,0.0,1.0
0.05,0.30901699,1.0
0.1,0.58778525,1.0
0.15,0.80901699,1.0
0.2,0.95105652,1.0

1/ missing timestamp field: I know we've stated that a timestamp always occurs in an OPF file, but in cases when it does not (generic CSV) we might...

2/ non-OPF format (3-row header): After working around 1/ by renaming the column to timestamp, I still couldn't draw the graph. The fix was to add two more rows to mimic an OPF header:

timestamp,function,anomaly_score
,,
,,
0.0,0.0,1.0
0.05,0.30901699,1.0
...

I'm not sure why as the code should just skip the 2nd and 3rd rows.

3/ non-DateTime timestamp: Another problem was that my time (the x-axis) was not in a date format but a sequence starting from 0.0 with a 0.05 delta. The graph "plotted", but the range selector had trouble with the numbers, so the graph looks empty.

It would be nice if we could support this more generic "time" format. Especially as it would make it possible for the default 1..N if no timestamp is given. I'm not sure how this collides with the range selector (is it tied to datetime format?)

What is your opinion about these?

jefffohl commented 8 years ago

@breznak point by point:

jefffohl commented 8 years ago

@breznak - I figured out what was going wrong with your test file. The script is assuming that all timestamps are in string format, and it is choking when it encounters a number. It seems to me that it will be difficult to allow timestamps to be in any format the user wishes. Ideally, it would have to be in the ISO 8601 format, but we could define some limited list of valid formats, and warn the user about this requirement through the UI and the README file.
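The "limited list of valid formats" idea might look like this in outline (Python for illustration; the app itself would need the JavaScript equivalent). The formats in `ACCEPTED_FORMATS` are an assumed starting list, not the set the visualizer actually supports.

```python
from datetime import datetime

# Hypothetical whitelist; ISO 8601 first, then the space-separated variants
# that appear in the example files in this thread.
ACCEPTED_FORMATS = (
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d %H:%M:%S.%f",
    "%Y-%m-%d",
)


def parse_timestamp(text):
    """Parse text with the first matching format, or raise a descriptive error."""
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    raise ValueError("Unsupported timestamp %r; expected one of: %s"
                     % (text, ", ".join(ACCEPTED_FORMATS)))
```

The error message doubles as the UI warning: it tells the user which formats are accepted instead of silently producing an empty graph.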

breznak commented 8 years ago

Hi @jefffohl !

if timestamp is numeric:
 // keep timestamp column as is
elif timestamp is Date:
 timestamp = ParseDate(timestamp)
else: // no column named timestamp
 timestamp = range(1:1:size(data)) // create dummy default 1..N

...It seems it really is possible to use either dates or numbers for x-values (http://dygraphs.com/data.html#csv). Now I probably don't understand the purpose of the convertPapaToDygraph() function, which deals with dates and hard-codes the Date conversion, as DyGraph should handle the date format itself (is this what you meant by expecting ISO dates?). Can we require an ISO or numeric timestamp and let DyGraph handle both seamlessly?
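The three-way branch in the pseudocode above could be written out as follows. This is a rough Python sketch only: `x_values` is a hypothetical name, and the single date format is an assumption based on the example files in this thread.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format of date-like timestamps


def x_values(header, rows):
    """Choose x-axis values: numeric as is, dates parsed, else 1..N."""
    if "timestamp" not in header:
        # No column named timestamp: create a dummy default 1..N.
        return list(range(1, len(rows) + 1))
    index = header.index("timestamp")
    cells = [row[index] for row in rows]
    try:
        # Numeric timestamps are kept as plain numbers.
        return [float(cell) for cell in cells]
    except ValueError:
        # Otherwise treat every cell as a date string.
        return [datetime.strptime(cell, DATE_FORMAT) for cell in cells]
```

Applied to the sin_30sec.csv example, the column named time would not be recognized, so the x-axis would fall back to 1..N unless the column is renamed to timestamp.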

breznak commented 8 years ago

added a few "PR ready code" TODOs, working on a simple code "cleanup"

breznak commented 8 years ago

@jefffohl please sync with me now if you started working on something, I'm finishing some changes, so that we don't duplicate..

breznak commented 8 years ago

... and finished what I've meant in https://github.com/breznak/nupic/pull/13

breznak commented 8 years ago

Last thing I'd love to see is the "conditional highlighting" (2 TODOs in UI) as in http://dygraphs.com/gallery/#g/highlighted-weekends
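The data side of such highlighting might be sketched as below. Note the big assumption: the condition used here (anomaly score at or above a threshold) and the 0.9 default are guesses for illustration, since the actual TODO items are not spelled out in this thread; `highlight_ranges` is a hypothetical name.

```python
def highlight_ranges(x_values, scores, threshold=0.9):
    """Return (start_x, end_x) pairs covering runs where score >= threshold."""
    ranges, start, previous = [], None, None
    for x, score in zip(x_values, scores):
        if score >= threshold and start is None:
            start = x  # a new highlighted run begins here
        elif score < threshold and start is not None:
            ranges.append((start, previous))  # run ended at the previous point
            start = None
        previous = x
    if start is not None:
        ranges.append((start, previous))  # run extends to the last point
    return ranges
```

Each returned pair would then be shaded on the canvas, much as the weekend spans are in the linked gallery example.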

jefffohl commented 8 years ago

@breznak I was mistaken that DyGraphs requires a Date object. It will accept a number or a Date object when the data is submitted as an array. We want to use an array, because that is easier to work with, from what is output from Papa Parse.

So, if the timestamp is of type Number, we can just pass that along without needing to create a Date object.

Note that the reason we are creating our Date objects the way that we are (as opposed to just passing a string to the Date constructor) is that Firefox is more strict about what formats it will accept. So, some user agents will be more accommodating than others. Therefore, we have to be precise about how we construct the Date object.

Thanks for improving the opf_visualizer.js script. I am making a branch for myself also called plot_results_refactor and I will fix some bugs and make some more improvements.

breznak commented 8 years ago

Thanks @jefffohl ! Btw, have you reviewed breznak#13 and agree with it? If it doesn't collide with any of your unmerged code, I'd like to merge it to avoid future conflicts.

jefffohl commented 8 years ago

@breznak I haven't written anything since you created plot_results_refactor, so you can merge it if you like. There are some changes I would like to make if you don't mind, which I will make after you merge. For example, it occurs to me that we could make yet another change to how we handle the timestamp field:

breznak commented 8 years ago

Thanks, merged. And :+1: on all of your suggestions for timestamp

breznak commented 8 years ago

My unsuccessful try on zoom in/out https://github.com/breznak/nupic/pull/14