terkkila / rf-ace

Automatically exported from code.google.com/p/rf-ace

Parsing ARFF files #49

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
Compile the source code from SVN, then run the following command:
bin/rf-ace -F test_5by10_numeric_matrix.arff -i 4 -n 100 -m 5 -A associations.tsv

What is the expected output? What do you see instead?
The feature selection process should run. Instead, RF-ACE reports that the 
target is missing from all samples.
Verbatim output:
-----------------------------------------------------------
|  RF-ACE version:  1.1.0, Dec 5th 2012                   |
|    Compile date:  Feb 18 2013, 01:07:51                 |
|   Report issues:  code.google.com/p/rf-ace/issues/list  |
-----------------------------------------------------------

Random Forest (RF) configuration:
 -n / --nTrees         = 100
 -m / --mTry           = 5
 -s / --nodeSize       = 3
 -a / --nMaxLeaves     = 2147483646
 -q / --quantiles      = NOT SET
 -N / --noNABranching  = NOT SET

Filter options:
 -p / --nPerms         = 20
 -t / --pValueTh       = 0.05

-Reading file 'test_5by10_numeric_matrix.arff' for filtering

Feature 'y' chosen as target with 10 / 0 samples ( -inf % missing ) among 5 features
Not enough samples (0) to perform a single split

What version of the product are you using? On what operating system?
RF-ACE 1.1.0 (Dec 5th 2012), as shown in the verbatim output above.
The operating system is Ubuntu 12.04.2 LTS (Precise).

Please provide any additional information below.
The same behaviour is observed with all ARFF files.

Original issue reported on code.google.com by star...@gmail.com on 17 Feb 2013 at 7:46

GoogleCodeExporter commented 9 years ago
Resizing sampleHeaders_ at the end of the readARFF function fixed the error for me.

Index: treedata.cpp
===================================================================
--- treedata.cpp    (revision 787)
+++ treedata.cpp    (working copy)
@@ -403,6 +403,7 @@
   }

   assert(sampleIdx = nLines);
+  sampleHeaders_.resize(sampleIdx,"NO_SAMPLE_ID");

 }
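
To illustrate why the one-line resize matters, here is a minimal sketch. It is not the actual rf-ace code: ToyData, nSamples, readWithoutResize and readWithResize are hypothetical names, and the sketch assumes that the sample count is derived from the size of the sample header vector, which would explain the "10 / 0 samples" line in the output above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the data container; NOT the real rf-ace class.
struct ToyData {
  std::vector<std::vector<double>> rows;    // one entry per parsed data line
  std::vector<std::string> sampleHeaders_;  // one label per sample

  // Assumption: the number of samples is taken from the header vector.
  std::size_t nSamples() const { return sampleHeaders_.size(); }
};

// Parse step that stores the data but never sizes the headers,
// so nSamples() stays at 0 regardless of how many lines were read.
ToyData readWithoutResize(const std::vector<std::vector<double>>& lines) {
  ToyData d;
  for (const auto& line : lines) {
    d.rows.push_back(line);
  }
  return d;
}

// Same parse step, followed by the resize proposed in the patch:
// every sample gets a placeholder header once all lines are read.
ToyData readWithResize(const std::vector<std::vector<double>>& lines) {
  ToyData d = readWithoutResize(lines);
  d.sampleHeaders_.resize(d.rows.size(), "NO_SAMPLE_ID");
  return d;
}
```

Under that assumption, reading 10 data lines with readWithoutResize leaves the sample count at 0, while readWithResize reports 10 samples, each labelled NO_SAMPLE_ID.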

Original comment by star...@gmail.com on 18 Feb 2013 at 3:41

GoogleCodeExporter commented 9 years ago
Thanks for the report. I wrote some fixes to ARFF parsing, one of which is the 
one you proposed; indeed, I had forgotten to extend the sample headers once the 
data was read.

Original comment by timo.erk...@gmail.com on 10 Mar 2013 at 11:55