Closed breznak closed 8 years ago
Right now, the little web app I am working on just reads files that have already been created by the OPF. But, I was imagining that the next step would be to visualize streaming live data.
That's a good thing to think about, but it will require a server to post data to. The live data coming out of NuPIC will be from a python runtime, so it will need to be posted somewhere for display. Just something to think about.
I was imagining that the next step would be to visualize streaming live data.
Same. I was planning this for later. Actually if the plotting is reasonably fast, you could just reread the file every second, to get a semi-live data plot.
...so it will need to be posted somewhere for display.
In some other issue I was thinking about network I/O region (maybe somebody already implemented that? I remember some networking code(?)) The OPF model could just add a network-out region and the plotter would bind to a known address.... could work like that? (but not there yet)
I created a new PR. Notes here: https://github.com/breznak/nupic/pull/10
Is it better to include the notes here, or with the PR?
I created a new PR. Notes here: breznak#10
@jefffohl This is HUGE progress, kudos! :+1: :100: This is everything I originally imagined for the visualizer.
I've merged your work, updated the check-points and tested, all works well here.
I am not sure if this approach is optimal or not. It might be better to hard-code certain fields to be included. Or perhaps this way is best. Would like feedback on that. In any case, consider this a work in progress, with this being the latest installment as we work towards a more polished app.
I would lean towards the dynamic approach of plotting all numeric fields, as we don't know what the user might be interested in (for example, I found it quite interesting to plot all the metrics and compare their performance). A compromise might be preselecting some default check-boxes (anomalyScore, bestPredictions.actual)?
@jefffohl I've added some usability comments to the "Bugs" section, what do you think about them?
Thanks @breznak. I will take a crack at the bugs today. As you are more familiar with the OPF than I am, can you give me a list of what fields will always be found in an OPF file, as well as fields that may or may not be found there? I would like to auto-detect the file type, to try and determine if it is an OPF file, and if it is a valid OPF file.
@breznak - also, what fields besides anomalyScore do we imagine a user would like to scale proportionally to the data range?
can you give me a list of what fields will always be found in an OPF file, as well as fields that may or may not be found there? I would like to auto-detect the file type, to try and determine if it is an OPF file, and if it is a valid OPF file.
@jefffohl this is a little tricky. The columns in an OPF file depend on the Model used. If you take "anomaly detection" as the intended use-case for the visualizations, the model has to be TemporalAnomaly (which inherits from TemporalMultiStep) and always has an anomalyScore field (+ the field(s) from TemporalMultiStep).
A TemporalMultiStep will always have the multiStepBestPredictions.actual field and, depending on the user's settings, additional multiStepBestPredictions.<N> fields.
There is also always the timestamp, but it does not always have to be in a date format; it can appear as an iteration count (1,2,3,4,...).
Other models are not actively used AFAIK.
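The field rules above could drive a simple file-type check on the CSV header. This is only a sketch in Python (the app itself is JavaScript); the function name and the three return labels are made up for illustration, and only the column names come from the comment above.

```python
def classify_header(columns):
    """Guess the file type from the names in the CSV header row.

    Returns "opf-anomaly", "opf" or "generic" (hypothetical labels).
    """
    names = set(columns)
    has_prediction = "multiStepBestPredictions.actual" in names
    if has_prediction and "anomalyScore" in names:
        return "opf-anomaly"  # looks like TemporalAnomaly output
    if has_prediction:
        return "opf"          # looks like TemporalMultiStep output
    return "generic"          # plain CSV: plot whatever is numeric
```

A file that falls through to "generic" would still be plotted, which matches the "plot what I can from the CSV" direction discussed here.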
what fields besides anomalyScore do we imagine a user would like to scale proportionally to the data range?
Do you intend this to be OPF-only, or a generic "I will plot what I can from the CSV" implementation?
If the latter, one of the use-cases I had in mind was comparing several anomaly implementations/settings. This happens in the NAB results file (it could be plotted if we overcome the hard-coded names somehow? *1) ), which includes the raw_score and anomaly_score (=likelihood) fields.
timestamp,value,anomaly_score,raw_score,label,S(t)_reward_low_FP_rate,S(t)_reward_low_FN_rate,S(t)_standard
2014-07-01 00:00:00,10844.0,0.0301029996659,1.0,0,0.0,0.0,0.0
2014-07-01 00:30:00,8127.0,0.0301029996659,1.0,0,0.0,0.0,0.0
2014-07-01 01:00:00,6210.0,0.0301029996659,1.0,0,0.0,0.0,0.0
2014-07-01 01:30:00,4656.0,0.0301029996659,1.0,0,0.0,0.0,0.0
*1) Two ideas for approaching the "fixed names" problem: A) "in code":
NORMALIZE_TO_FIELD="multiStepBestPredictions.actual"
SCALE_FIELDS_ARR=["anomalyScore"] # can extend here
B) "in web GUI":
Btw, I've found a working example for the finer zoom in/out http://dygraphs.com/gallery/#g/interaction ("Custom interaction" block) but I'm still unable to transfer it here :/
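Option A ("in code") from the comment above might be used roughly as follows. This is a Python sketch, not the visualizer's code: `scale_fields` is a hypothetical helper, and it assumes every scaled field already lies in the range 0..1 (as an anomaly score does) and that rows are dicts of numbers.

```python
NORMALIZE_TO_FIELD = "multiStepBestPredictions.actual"
SCALE_FIELDS_ARR = ["anomalyScore"]  # can extend here


def scale_fields(rows, reference=NORMALIZE_TO_FIELD, fields=SCALE_FIELDS_ARR):
    """Stretch each 0..1 field onto the value range of the reference field.

    Returns new row dicts; the input rows are left untouched.
    """
    ref_values = [row[reference] for row in rows]
    ref_min, ref_max = min(ref_values), max(ref_values)
    scaled = []
    for row in rows:
        new_row = dict(row)
        for field in fields:
            # Assumption: row[field] is already within 0..1.
            new_row[field] = ref_min + row[field] * (ref_max - ref_min)
        scaled.append(new_row)
    return scaled
```

With this mapping a score of 0 sits at the bottom of the data range and a score of 1 at the top, so the score stays readable next to large data values.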
Thanks @breznak.
I do want to make the app as flexible as possible, and therefore would like to enable it to consume whatever data is passed its way, with the only constraint being that there must be a timestamp field.
I like your idea for normalizing fields. The "fields" UI might start getting a bit crowded at this point, so we would probably want to allow the user to toggle this portion of the UI in and out of sight.
Regarding the zoom feature, have you taken a look at the range-selector widget? http://dygraphs.com/gallery/#g/range-selector
@breznak do you have any ideas about how to render field names that are very long, such as this?
multiStepBestPredictions:multiStep:errorMetric='aae':steps=5:window=1000:field=consumption
Can you think of a more concise label that still has meaning for the user? If we can come up with some kind of logic, I can convert them.
@breznak @jefffohl I would like to test this out. How do I do it?
@rhyolight
cd nupic/scripts/visualization
python -m SimpleHTTPServer 8080
http://localhost:8080
Oops - forgot this is on the main branch. Here is @breznak's branch: https://github.com/breznak/nupic/tree/plot_results
PS - you will want to test with an OPF output file, like the one found at nupic/scripts/visualization/opf_results/
I do want to make the app as flexible as possible, and therefore would like to enable it to consume whatever data is passed its way, with the only constraint being that there must be a timestamp field.
Great, @jefffohl ! I hoped you'd say that :) Then I hope this project has a lot of potential users - NAB, RiverView, all the Nupic examples with custom plot implementations, ...
I like your idea for normalizing fields. The "fields" UI might start getting a bit crowded at this point, so we would probably want to allow the user to toggle this portion of the UI in and out of sight.
That's true. You'd want to set the scaling up once and then continue browsing the graph. We could also already preselect some values (reference=bestPredictions.actual; scale=anomalyScore; ..if these fields are found. But not sure if this is not an overkill?)
Regarding the zoom feature, have you taken a look at the range-selector widget? http://dygraphs.com/gallery/#g/range-selector
I haven't, and it looks pretty good and combines the functionality of un/zoom with a global outlook! :+1:
...do you have any ideas about how to render field names that are very long, such as this?
multiStepBestPredictions:multiStep:errorMetric='aae':steps=5:window=1000:field=consumption
Can you think of a more concise label that still has meaning for the user? If we can come up with some kind of logic, I can convert them.
I think we can't compress it (much). If we go for some "encoding", it will become overcomplicated.
We could:
A) drop the part multiStepBestPredictions:multiStep, which is the same for all metrics in our OPF model (but does not solve the general problem of long names)
B) can we use a <span> tag on mouseover to get the full text and leave the normal text with overflow: hidden? (imho preferred method)
C) use some JS hackery: get the label's size and, if it is too long, create a placeholder "Label <N>" and inject a paragraph <p>
Graph labels:
* label <N> = "blah blah"
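Option A can be generalised a little: rather than hard-coding the multiStepBestPredictions:multiStep prefix, drop whatever leading parts all labels share. A Python sketch follows; `shorten_labels` is a hypothetical name, and the second label in the usage example is invented for illustration.

```python
def shorten_labels(labels, sep=":"):
    """Drop the leading sep-separated parts that every label shares."""
    if len(labels) < 2:
        return list(labels)  # nothing to compare against
    split = [label.split(sep) for label in labels]
    shared = 0
    for parts in zip(*split):
        if len(set(parts)) != 1:
            break
        shared += 1
    # Always keep at least the last part so no label becomes empty.
    shared = min(shared, min(len(parts) for parts in split) - 1)
    return [sep.join(parts[shared:]) for parts in split]


labels = [
    "multiStepBestPredictions:multiStep:errorMetric='aae':steps=5:window=1000:field=consumption",
    "multiStepBestPredictions:multiStep:errorMetric='aae':steps=1:window=1000:field=consumption",
]
short = shorten_labels(labels)
```

Like option A, this does not help when the labels share no prefix, so the mouseover idea in B) would still be needed for the general case.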
@breznak - a new pull request for review. I made a lot of changes:
A screen shot of what it should look like now:
I will be looking into the issue of DyGraph not working in Firefox.
Maybe it's not the right time for feedback... but I really like having a smaller, synced graph to display the anomaly score and likelihood, if there is one.
Matt Taylor OS Community Flag-Bearer Numenta
@rhyolight - do you mean a smaller graph below the main graph? Can't the same be achieved with what we have now, where the user can turn on and off whichever fields they wish to view? Perhaps you can expand on your idea a bit more?
I should play with it first before you do anything. Will do that now.
@breznak I added a fix for the Firefox issue to the current outstanding PR.
I am looking into how to use the BasicPredictionMetricsLogger with an OPF model outside the experiment_runner framework. I would ideally like to be able to plot the results of any model I create with ModelFactory.createModel().
@rhyolight - Not being an expert with the OPF, can you explain to me what your comment implies? It looks like the BasicPredictionMetricsLogger outputs JSON. Are you saying that you would like the app to be able to consume any data format (and possibly figure out the format type automatically)?
No, I am just trying to figure out how to get the data file you are plotting out of the OPF. I don't use the OPF experiment framework for anything (which is what created that file); I always just create a model and run it.
..I always just create a model and run it.
@rhyolight just write the outputs to a CSV file and you are good (?)
@breznak I don't see how I can do this in the OPF API.
Merged Jeff's latest PR and updated the issue description, the check-list is getting slimmer :wink: What do you think Jeff, are there any issues that should be ironed out, or can we head for a public PR to nupic soon? :fireworks:
I don't see how I can do this in the OPF API.
@rhyolight we're working towards reasonably generic CSV support (I'll have some testcase and new (sub)issues soonish), so basically anything with a timestamp field would work. The "main" focus on OPF files assumes the files produced by scripts/run_opf_experiment, so you could look there to see how to write a similar format (?) Otherwise it might be better to discuss this under some specific code you're working on?
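For the "just write the outputs to a CSV file" route, a minimal Python sketch is shown below. It does not touch the OPF API; `write_results_csv` is a hypothetical helper, and how each row dict gets filled from a model's results is left to the caller.

```python
import csv


def write_results_csv(path, rows, fieldnames):
    """Write one CSV line per model step; rows is an iterable of dicts."""
    if "timestamp" not in fieldnames:
        raise ValueError("the visualizer expects a 'timestamp' column")
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        for row in rows:
            writer.writerow(row)
```

Note that this writes a single header row, while at this point in the thread the script still strips the second and third rows of every file, so the first two data lines of such a file would not be plotted.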
@breznak I think this looks good, in terms of a big improvement over what is there now. We will need to write some tests before we can submit a PR, I believe, and this is something that we will need to work with @rhyolight on. I am not sure if there are any JavaScript testing frameworks used inside of NuPIC, and whether or not we want to introduce them. Also, if we are going to do more work on this, I would like at some point to introduce some workflow tools for building the app, such as gulp and Sass, etc. I am not sure how Matt feels about having all of that stuff in NuPIC.
generic CSV is perfect for me.
@breznak one thing that I mentioned, but which bears repeating, is that currently, the script will strip out the second and third rows of the CSV. So, if the file is not similar to the OPF output, the first two lines of data will be missing.
one thing that I mentioned, but which bears repeating, is that currently, the script will strip out the second and third rows of the CSV. So, if the file is not similar to the OPF output, the first two lines of data will be missing.
@jefffohl I'm just writing a comment about related functionality, and I think dropping the first 2 lines is perfectly OK, as NuPIC is not trained there anyway. I'll put it in a readme, or a small *) notice on the page?
@breznak - sounds fine for now - I just wanted to make sure you were aware.
Merged a small example data cleanup - moved to the ./examples/{OPF,NAB,CSV} path.
generic CSV is perfect for me.
@rhyolight ...right on time! :hourglass:
@jefffohl I'd like to discuss my user-experience on the latest iteration, esp. focusing on the "generic CSV":
I've added a testing/example file examples/CSV/sin_30sec.csv that looks like this:
time,function,anomaly_score
0.0,0.0,1.0
0.05,0.30901699,1.0
0.1,0.58778525,1.0
0.15,0.80901699,1.0
0.2,0.95105652,1.0
1/ missing timestamp field
I know we've stated a timestamp always occurs in an OPF file, but in cases when it does not (generic) we might...
* ... time -> timestamp in the example)
* ... Timestamp with radio-buttons, done the same way Data is chosen for normalization. Probably the least disruptive change(?)
* ... 1..N for the data items.
2/ non-OPF format (3 row header)
After working around 1/ by renaming the column to timestamp, I still couldn't draw the graph. The fix was to add 2 more rows so that the file starts like an OPF header:
timestamp,function,anomaly_score
,,
,,
0.0,0.0,1.0
0.05,0.30901699,1.0
...
I'm not sure why as the code should just skip the 2nd and 3rd rows.
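One way to avoid both the padding workaround and the loss of two data lines would be to skip rows 2 and 3 only when they look like OPF metadata. The Python sketch below assumes those metadata rows never contain a numeric cell (they hold type names and flags, or are empty as in the workaround above); `split_header` is a hypothetical helper, not part of opf_visualizer.js.

```python
def _is_number(cell):
    """True if the CSV cell parses as a number."""
    try:
        float(cell)
        return True
    except ValueError:
        return False


def split_header(rows):
    """Return (column_names, data_rows) for parsed CSV rows.

    Rows 2 and 3 are dropped only if neither holds a numeric cell,
    which is taken here as the sign of an OPF-style 3-row header.
    """
    names, rest = rows[0], rows[1:]
    extra = rest[:2]
    has_metadata = len(extra) == 2 and not any(
        _is_number(cell) for row in extra for cell in row
    )
    return names, (rest[2:] if has_metadata else rest)
```

A generic file such as sin_30sec.csv would then keep all of its data rows, while an OPF file would still lose only its two metadata rows.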
3/ non DateTime timestamp
Another problem was that my time (or X axis) was not in a date format but a sequence starting from 0.0 with a 0.05 delta. The graph "plotted" but the range selector had trouble with the numbers, so the graph looks empty.
It would be nice if we could support this more generic "time" format, especially as it would make the default 1..N possible if no timestamp is given. I'm not sure how this collides with the range selector (is it tied to datetime format?)
What is your opinion about these?
@breznak point by point:
new Date(year, month[, day[, hour[, minutes[, seconds[, milliseconds]]]]]);
So, if we pass in a float, it won't really know what to do with that, as that would indicate a floating point year. This means we will have to write some logic that will handle a variety of submissions. I am not sure how to do this other than to write rules for specific types of timestamps (integers, floats, etc.). Alternatively, we could make it accept only ISO 8601 formatted strings (which would be the sane thing to do, but isn't compatible with the examples you have given me). Right now, I have some logic in there that will accept dates in the following formats:
@breznak - I figured out what was going wrong with your test file. The script is assuming that all timestamps are in string format, and it is choking when it encounters a number. It seems to me that it will be difficult to allow timestamps to be in any format the user wishes. Ideally, it would have to be in the ISO 8601 format, but we could define some limited list of valid formats, and warn the user about this requirement through the UI and the README file.
Hi @jefffohl !
timestamp column - let's just enforce it then.
generateFieldMap() something like:
if timestamp is numeric:
    // keep timestamp column as is
else if timestamp is Date:
    timestamp = ParseDate(timestamp)
else: // no column named timestamp
    timestamp = range(1:1:size(data)) // create dummy default
...It seems it really is possible to do the dates/numbers for x-values, http://dygraphs.com/data.html#csv , now I probably don't understand the use for the convertPapaToDygraph() function, which is dealing with dates and hardcodes the Date conversion? As DyGraph should handle the date format (is this what you meant by expecting the ISO dates?) Can we force ISO/numeric timestamp and let DyGraph decide to handle both seamlessly?
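The pseudocode above can be made concrete. Below is a Python sketch of the same three branches, not the actual generateFieldMap(): `build_x_values` is a hypothetical name, the date format is an assumption based on the NAB sample earlier in the thread, and rows are assumed to be dicts of strings as a CSV parser would produce.

```python
from datetime import datetime


def build_x_values(rows, column="timestamp", date_format="%Y-%m-%d %H:%M:%S"):
    """Return the x-axis values for a list of row dicts.

    Numeric timestamps are kept as numbers, date strings are parsed,
    and 1..N is generated when the timestamp column is missing.
    """
    if not rows or column not in rows[0]:
        return list(range(1, len(rows) + 1))  # dummy default
    x_values = []
    for row in rows:
        raw = row[column]
        try:
            x_values.append(float(raw))  # numeric: keep as is
        except ValueError:
            x_values.append(datetime.strptime(raw, date_format))
    return x_values
```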
added a few "PR ready code" TODOs, working on a simple code "cleanup"
@jefffohl please sync with me now if you started working on something, I'm finishing some changes, so that we don't duplicate..
... and finished what I've meant in https://github.com/breznak/nupic/pull/13
Last thing I'd love to see is the "conditional highlighting" (2 TODOs in UI) as in http://dygraphs.com/gallery/#g/highlighted-weekends
@breznak I was mistaken that DyGraphs requires a Date object. It will accept a number or a Date object when the data is submitted as an array. We want to use an array, because that is easier to work with, from what is output from Papa Parse.
So, if the timestamp is of type Number, we can just pass that along without needing to create a Date object.
Note that the reason we are creating our Date objects the way that we are (as opposed to just passing a string to the Date constructor) is that Firefox is more strict about what formats it will accept. So, some user agents will be more accommodating than others. Therefore, we have to be precise about how we construct the Date object.
Thanks for improving the opf_visualizer.js script. I am making a branch for myself also called plot_results_refactor and I will fix some bugs and make some more improvements.
Thanks @jefffohl ! Btw, have you reviewed breznak#13 and agree with it? If it doesn't collide with any of your unmerged code, I'd like to merge it to avoid future conflicts.
@breznak I haven't written anything since you created plot_results_refactor, so you can merge it if you like. There are some changes I would like to make if you don't mind, which I will make after you merge. For example, it occurs to me that we could make yet another change to how we handle the timestamp field:
* If the timestamp is of type number, we just pass it through directly - no need to convert to Date object.
* If the timestamp is of type string, we try to get the browser to parse the date string. If the result is a valid Date object, we pass that along.
* If the timestamp is of type string and we can't get the browser to create a valid Date object (such as Firefox when dealing with the mm/dd/yy format), then we will attempt to parse the string ourselves.
Thanks, merged. And :+1: on all of your suggestions for timestamp
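The three-step timestamp handling proposed above can be sketched per value. This Python version is only an analogy to the browser code: `datetime.fromisoformat` stands in for the browser's own date parser, the regular expression is a hand-written fallback for mm/dd/yy (optionally followed by hh:mm), and mapping two-digit years to 20xx is an assumption.

```python
import re
from datetime import datetime

# Fallback pattern: mm/dd/yy with an optional hh:mm part.
_MDY = re.compile(r"^(\d{1,2})/(\d{1,2})/(\d{2})(?:\s+(\d{1,2}):(\d{2}))?$")


def parse_timestamp(value):
    """Numbers pass through; strings become datetime objects."""
    if isinstance(value, (int, float)):
        return value  # step 1: numeric x-values need no conversion
    try:
        return datetime.fromisoformat(value)  # step 2: built-in parser
    except ValueError:
        pass
    match = _MDY.match(value.strip())  # step 3: parse it ourselves
    if match is None:
        raise ValueError("unsupported timestamp format: %r" % value)
    month, day, year, hour, minute = match.groups()
    return datetime(2000 + int(year), int(month), int(day),
                    int(hour or 0), int(minute or 0))
```

Raising on anything else keeps the list of accepted formats explicit, which fits the earlier suggestion to warn users about the supported formats in the UI and the README.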
My unsuccessful try on zoom in/out https://github.com/breznak/nupic/pull/14
I'd like to visualize the results .csv from running OPF experiment. It should be relatively easy, but perhaps some of you have already written a nice utility for that? NAB, @rhyolight ?
WORKING branch: https://github.com/breznak/nupic/tree/plot_results
TODO:
NUPIC/scripts/visualization/
plot.ly publishes all data online, I think this is unacceptable for general use for us
opf_results/*.csv
data_file* results_file* data/ results/
DyGraph
https://github.com/numenta/NAB/blob/master/nab_visualizer.html#L4-L7 -- @jefffohl liked D3.js for being more flexible, but decided to stick with DyGraph so far as it fits the current needs of plotting graphs.
The current JS code from NAB spends a lot of work in parsing a structure of files, in order to plot all *.csv files there. For OPF/Nupic we don't need that (although it wouldn't hurt to keep the functionality for NAB).
What we need is the ability to
plot a single file provided as an argument
(python plot.py ./opf_results/DefaultTask.csv)
NAB uses separate folders for paths & results; in an OPF results file, all of these are together
actual
anomalyScore
multistepBestPrediction.1
(anomalyScore, >0.9) AND (annotation, >0.9), green # correctly detected anomaly
(anomalyScore, <=0.9) AND (annotation, >0.9), red # missed
(anomalyScore, >0.9) AND (annotation, <=0.9), yellow # false positive
timestamp field would show up in the "fields div on the left", as interactive values are shown there.) see https://github.com/breznak/nupic/pull/20
If the above succeeds, maybe NAB should switch to using OPF format for the output (data+results) ( @subutai ?)
anomalyScore has a string type in OPF (maybe bcs it's None at the 1st step)
anomalyAnnotations OPF field, to mark human annotations, could be useful in NAB
scalarValue member, can we expose that (add a fieldXX.scalar for each input field to OPF file) somehow?
currently some fields are hard-coded (eg for "scaled anomaly score") and the rendering fails if the fields are not present. But a nice feature is the dynamic menu for plotting numeric fields, it would be nice if only the "scaled" function failed (and its checkbox is greyed-out) instead of the whole plot failing.
would require: a) the model sends the updates to the server; b) a network in/out region; c) refreshing the plot of the (updated) file every second or so..
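The three highlighting rules in the TODO list above (green, red, yellow at a 0.9 threshold) amount to a small decision function. A Python sketch, assuming both anomalyScore and the annotation are numeric; `highlight_color` is a hypothetical name and returning None for "no highlight" is a choice made here, not something the TODO specifies.

```python
THRESHOLD = 0.9


def highlight_color(anomaly_score, annotation, threshold=THRESHOLD):
    """Map one row to a highlight colour, or None for no highlight."""
    detected = anomaly_score > threshold
    annotated = annotation > threshold
    if detected and annotated:
        return "green"   # correctly detected anomaly
    if annotated:
        return "red"     # missed
    if detected:
        return "yellow"  # false positive
    return None
```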
UPDATES: