Open pgrew opened 7 years ago
From the PSVM wiki:
Step 0: Compile PSVM
On an XC, change the following lines in the Makefile:
CC=CC # changed from mpicxx
# C-Compiler flags
CFLAGS=-O3 -Wall
# linker
LD=CC # changed from mpicxx
LFLAGS=-O3 -Wall
Load the GNU programming environment:
> module load PrgEnv-gnu
... and build it:
> make
Note: This compiles for MPI. I am not sure how to compile for serial execution.
Step 0.5: Download some test data
> wget -P data/ http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/splice.t
> wget -P data/ http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/splice
Step 1: Run PSVM
# Train
> mpiexec -n 4 ./svm_train -rank_ratio 0.1 -kernel_type 2 -hyper_parm 1 -gamma 0.01 \
-model_path $HOME/psvm/model data/splice
# Predict
> mpiexec -n 4 ./svm_predict -model_path $HOME/psvm/model data/splice.t
You can add these into the issue description if you'd like.
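For repeated runs while reproducing the paper's numbers, the train and predict commands from Step 1 could be wrapped in a small helper. This is only a sketch: the function names (`train_cmd`, `predict_cmd`, `run`) are made up for illustration, and the flags and default values are copied from the wiki commands above rather than from any PSVM documentation.

```python
import subprocess


def train_cmd(data, model_path, nprocs=4, rank_ratio=0.1, kernel_type=2,
              hyper_parm=1, gamma=0.01):
    """Build the svm_train command line from Step 1 as an argument list."""
    return [
        "mpiexec", "-n", str(nprocs), "./svm_train",
        "-rank_ratio", str(rank_ratio),
        "-kernel_type", str(kernel_type),
        "-hyper_parm", str(hyper_parm),
        "-gamma", str(gamma),
        "-model_path", model_path,
        data,
    ]


def predict_cmd(data, model_path, nprocs=4):
    """Build the svm_predict command line from Step 1 as an argument list."""
    return ["mpiexec", "-n", str(nprocs), "./svm_predict",
            "-model_path", model_path, data]


def run(cmd):
    """Run one command; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)
```

With the splice data in place, `run(train_cmd("data/splice", model_dir))` followed by `run(predict_cmd("data/splice.t", model_dir))` would mirror the two commands above, where `model_dir` is the expanded form of `$HOME/psvm/model`.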
I am unable to find the Image dataset, and the RCV dataset has more than two classes, so I don't know which two classes the authors used (PSVM only handles binary classification). For now, I will proceed with reproducing the performance on only the CoverType dataset.
We could always contact the author...
I think that is the right idea. I found this link: http://groups.google.com/group/psvm?lnk=srg but I don't have my Google credentials with me at the moment.
On a second look, all of the datasets used in the paper appear to be available here, except for Image.
Q: Should I check these into datasets?
A: I think I'll check in the smaller datasets, and include a script that will wget and unpack the larger datasets.
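A fetch-and-unpack script along those lines might look like the sketch below. It assumes the larger datasets are distributed as `.bz2` archives, which the thread does not confirm, and the helper names (`local_name`, `unpack_bz2`, `fetch`) are hypothetical. Files that already exist in the destination directory are not downloaded again.

```python
import bz2
import shutil
import urllib.request
from pathlib import Path


def local_name(url: str) -> str:
    """Name the dataset will have on disk, with any .bz2 suffix dropped."""
    name = url.rstrip("/").rsplit("/", 1)[-1]
    return name[:-4] if name.endswith(".bz2") else name


def unpack_bz2(archive: Path, target: Path) -> None:
    """Decompress a .bz2 archive into target."""
    with bz2.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)


def fetch(url: str, dest_dir: str = "data") -> Path:
    """Download url into dest_dir unless already present; unpack .bz2 files."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / local_name(url)
    if target.exists():
        return target  # already fetched, nothing to do
    if url.endswith(".bz2"):
        archive = dest / (target.name + ".bz2")
        urllib.request.urlretrieve(url, archive)
        unpack_bz2(archive, target)
        archive.unlink()  # keep only the unpacked file
    else:
        urllib.request.urlretrieve(url, target)
    return target
```

Calling `fetch()` once per dataset URL, for example the two splice URLs from Step 0.5, would populate `data/` in the same way as the `wget -P data/` commands.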
and the RCV dataset has more than two classes,
The source says:
# of classes: 2
What leads you to believe there are more than 2 classes? The ID fields in the data file?
I was working with a different RCV dataset. I will use your link. The numbers of training and testing samples are slightly off from Table 1 in the paper, but it is likely the correct dataset.
For completeness' sake, here is the RCV dataset I was previously looking at: https://archive.ics.uci.edu/ml/datasets/Reuters+RCV1+RCV2+Multilingual,+Multiview+Text+Categorization+Test+collection
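One way to settle both the class-count question and the sample-count mismatch is to count labels directly in the downloaded file. The sketch below assumes the file is in LIBSVM text format, one sample per line with the label as the first whitespace-separated field; `label_counts` is a hypothetical helper name.

```python
from collections import Counter


def label_counts(lines):
    """Count samples per label in LIBSVM-format lines ("<label> <idx>:<val> ...")."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts
```

Passing an open file, e.g. `label_counts(open("data/splice"))`, gives the number of distinct labels as `len(counts)` and the total number of samples as `sum(counts.values())`, which can then be compared with the figures in Table 1.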
Use PSVM to classify an ML data set. PSVM only supports binary classification, so the iris data set should either be modified or replaced with another data set.
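If the iris data set is modified rather than replaced, one option is a one-vs-rest relabeling. The sketch below assumes a LIBSVM-format copy of iris with the class as the first field on each line, and assumes PSVM accepts the same `+1`/`-1` labels that the splice files use; neither assumption is confirmed in this thread. `to_binary` is a hypothetical helper name.

```python
def to_binary(lines, positive_label):
    """Yield LIBSVM lines relabeled +1 (positive_label) or -1 (every other class)."""
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # drop blank lines
        fields[0] = "+1" if fields[0] == positive_label else "-1"
        yield " ".join(fields)
```

For example, `to_binary(open("data/iris"), "1")` would treat class `1` as the positive class and merge the remaining classes into the negative one; the path and the label value here are placeholders.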