As a user, I don't want lines stripped out of my data if the CSV is not output from the OPF

htm-community / nupic.visualizations

Web application for interactive graphs, anomaly highlighting and online monitoring.

MIT License


As a user, I don't want lines stripped out of my data if the CSV is not output from the OPF #10

Closed jefffohl closed 8 years ago

jefffohl commented 9 years ago

Because the OPF includes two extra header lines in the CSV output, the app is currently stripping these out automatically. However, we don't want to strip these out when the file is a non-OPF file. So, we either need to auto-detect if the file has more than one header line, or we need to let the user determine how many lines to strip out.
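The second option (letting the user decide how many lines to strip) could look roughly like the sketch below. It is written in Python for brevity rather than in the app's own code, and the function name `split_header_and_data` and the parameter `extra_header_lines` are hypothetical, not part of the project.

```python
import csv
import io


def split_header_and_data(text, extra_header_lines=0):
    """Return (column_names, data_rows) for CSV text.

    `extra_header_lines` is how many rows after the first header row
    should be dropped: 2 for OPF output, 0 for a plain CSV file.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return [], []
    column_names = rows[0]
    data_rows = rows[1 + extra_header_lines:]
    return column_names, data_rows


# Illustrative inputs only; the values are made up for this sketch.
opf_like = "timestamp,value\ndatetime,float\nT,\n2015-01-01 00:00:00,1.5\n"
plain = "timestamp,value\n2015-01-01 00:00:00,1.5\n2015-01-01 00:05:00,2.5\n"

opf_names, opf_data = split_header_and_data(opf_like, extra_header_lines=2)
plain_names, plain_data = split_header_and_data(plain)
```

With a default of 0, a non-OPF file loses no data unless the user explicitly asks for lines to be stripped.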

rcrowder commented 9 years ago

@jefffohl In case you're not aware, the three header lines correspond to one another (column names, column types, etc.), so this could be checked automatically in any CSV file.

jefffohl commented 9 years ago

@rcrowder Yes, I was hoping I could come up with some algorithm for determining if lines 2 and 3 are header lines and not data. I just want to make sure that it doesn't return false positives. If you have ideas about how best to do this, let me know. Thanks!

rcrowder commented 9 years ago

@jefffohl I'll have a further look on the weekend. The first (header) line could be used to find the user's chosen separator (comma, semicolon, tab, etc.), which can then be used to check the 2nd, 3rd and 4th lines. I think the second and third lines only allow a restricted set of type keywords that the OPF can parse. I'm just getting into the Windows port of the OPF, so I hope to find out the type tokens. Columns that all use string types are a potential blocking issue.
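A minimal sketch of the separator idea, assuming the separator is whichever candidate character occurs most often in the first line. The name `guess_separator` and the candidate list are illustrative, not taken from the app or from NuPIC.

```python
# Candidate separators mentioned in the thread: comma, semicolon, tab.
CANDIDATE_SEPARATORS = (",", ";", "\t")


def guess_separator(header_line, candidates=CANDIDATE_SEPARATORS):
    """Return the candidate that occurs most often in the header line.

    Returns None when no candidate occurs at all (e.g. a single column).
    Ties go to the candidate listed first.
    """
    counts = {sep: header_line.count(sep) for sep in candidates}
    best = max(candidates, key=lambda sep: counts[sep])
    return best if counts[best] > 0 else None


separator = guess_separator("timestamp;value;anomalyScore")
```

Python's standard library also offers `csv.Sniffer` for this kind of guess. The counting approach here is only meant to show the idea, and it would be fooled by quoted column names that contain another candidate character.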

rcrowder commented 9 years ago

@jefffohl And typically they get ignored: https://github.com/numenta/nupic/blob/master/tests/integration/nupic/opf/opf_checkpoint_test/opf_checkpoint_test.py#L246 From what I've seen so far, the second and third lines are often just the separators.

rcrowder commented 9 years ago

@jefffohl We may not need to look further. Field metadata seems to be defined here (could a Numenta engineer confirm this is correct?): https://github.com/numenta/nupic/blob/master/src/nupic/data/fieldmeta.py#L118

rcrowder commented 9 years ago

With 'specials' defined further down
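One way the auto-detection could work, sketched in Python: treat rows 2 and 3 as OPF header rows only if every cell is either empty or one of the allowed tokens. The token sets below are an assumption about what fieldmeta.py defines and should be checked against that file; `looks_like_opf_header` is a hypothetical name.

```python
# ASSUMPTION: these token sets are a recollection of nupic/data/fieldmeta.py,
# not a verified copy. Confirm them against the linked file before use.
ASSUMED_TYPE_TOKENS = {"string", "datetime", "int", "float", "bool", "list", "sdr"}
ASSUMED_SPECIAL_TOKENS = {"", "R", "S", "T", "C", "L"}


def looks_like_opf_header(rows):
    """Guess whether rows[1] and rows[2] are OPF type/special header rows.

    `rows` is a list of already-split CSV rows (lists of strings).
    Empty cells are accepted in both rows, so short forms pass.
    """
    if len(rows) < 3:
        return False
    names, types, specials = rows[0], rows[1], rows[2]
    if not (len(names) == len(types) == len(specials)):
        return False
    types_ok = all(
        cell.strip() == "" or cell.strip() in ASSUMED_TYPE_TOKENS for cell in types
    )
    specials_ok = all(cell.strip() in ASSUMED_SPECIAL_TOKENS for cell in specials)
    return types_ok and specials_ok


opf_rows = [
    ["timestamp", "value"],
    ["datetime", "float"],
    ["T", ""],
    ["2015-01-01 00:00:00", "1.5"],
]
plain_rows = [
    ["timestamp", "value"],
    ["2015-01-01 00:00:00", "1.5"],
    ["2015-01-01 00:05:00", "2.5"],
]
is_opf = looks_like_opf_header(opf_rows)
is_plain_opf = looks_like_opf_header(plain_rows)
```

Accepting empty cells keeps the check compatible with header rows that contain only separators. The cost is that a plain file whose second and third rows are entirely empty would also be classified as OPF output.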

breznak commented 9 years ago

@jefffohl Thanks for bringing over the open issues, and thanks to Richard for finding the definitions! Just a note: typically you'll have a short form such as float,,,,float (where a missing type defaults to string, though I'm not sure). Another problem we found is that some fields (anomalyScore at least) are meant to be float but use the string type in the OPF file, because the first few values can be 'None'. This should change to use e.g. -1 instead, but that is a NuPIC problem.
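Until that changes in NuPIC, a reader of the file could work around it by parsing the values instead of trusting the declared column type. The sketch below assumes the literal text 'None' (or an empty cell) marks a missing value; the helper names are hypothetical.

```python
def parse_optional_float(cell):
    """Parse a CSV cell as a float, mapping 'None' or an empty cell to None."""
    text = cell.strip()
    if text in ("", "None"):
        return None
    return float(text)


def parse_column(cells):
    """Parse a whole column, keeping missing values as None."""
    return [parse_optional_float(cell) for cell in cells]


# Illustrative values only: the first few anomaly scores are missing.
anomaly_scores = parse_column(["None", "None", "0.25", "0.5"])
```

Keeping the missing values as `None` rather than substituting -1 leaves it to the plotting code to decide whether to skip those points.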