The published copy of this package differs from the local version in a few ways.

  1. As discovered in #1, they didn't include the 32-bit Windows files.
  2. They removed all directory structure from the package, presumably because the SSC and Stata Journal archives, as a matter of policy, don't allow arbitrarily deep directories. So instead of bin/{LINUX,LINUX64,MACINTEL64,WIN,WIN64A}/*.plugin there are, respectively (but skipping the missing "WIN" files), *.for{linux,lux64,macintel64,win64a}
  3. They (or possibly @schonlau) updated svm_examples.ihlp and svmachines.sthlp.

I know this last because I compared my SJ-installed copy with the repo.

That the .plugin files differ isn't a surprise: gcc changes constantly. So the only real changes are the help files.
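
For anyone wanting to repeat the comparison, here is a minimal sketch. The helper name `compare_help` is made up for illustration, and the two directories in the example call are the ones that appear in the diff headers below; adjust them to your own checkout and ado path.

```shell
# Hypothetical helper: print unified diffs for the two help files only.
# Usage: compare_help REPO_SRC_DIR INSTALLED_DIR
compare_help() {
    repo_dir=$1
    installed_dir=$2
    for f in svm_examples.ihlp svmachines.sthlp; do
        # diff exits with status 1 when the files differ; that is the
        # expected case here, so don't treat it as a failure.
        diff -u "$repo_dir/$f" "$installed_dir/$f" || true
    done
}

# Example call (paths taken from the diff headers below):
# compare_help statasvm/src /home/kousu/ado/plus/s
```

Restricting the loop to the two help files keeps the binary .plugin files, which are expected to differ, out of the output.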

Those changes themselves can be seen in this diff:

--- statasvm/src/svm_examples.ihlp  2016-11-09 23:06:28.000000000 -0500
+++ /home/kousu/ado/plus/s/svm_examples.ihlp    2018-05-13 08:27:25.542174011 -0400
@@ -1,84 +1,91 @@
 {* This file was generated by scripts/examples2smcl.}{...}
 {* It is included by the svmachines.sthlp file to embed the examples/ folder into the documentation.}{...}
-{title:Examples: binary classification}
+    {title:Binary classification}

 {phang2}{cmd:. sysuse auto}{p_end}

-{pstd}Machine learning methods like SVM are very easy to overfit.{p_end}
-{pstd}To compensate, it is important to split data into training and testing sets, fit on{p_end}
-{pstd}the former and measure performance on the latter, so that performance measurements{p_end}
-{pstd}are not artificially inflated by data they've already seen.{p_end}
-{pstd}But after splitting the proportion of classes can become unbalanced.{p_end}
-{pstd}The reliable way to handle this is a stratified split, a split that{p_end}
-{pstd}fixes the proportions of each class in each partition of each class.{p_end}
-{pstd}The quick and dirty way is a shuffle:{p_end}
+Machine-learning methods like SVM are very easy to overfit.  To compensate,
+you must split data into training and testing sets, fit on the former,
+and measure performance on the latter, so that performance measurements are
+not artificially inflated by data they have already seen.{p_end}
+After splitting, the proportion of classes can become unbalanced.  The
+reliable way to handle this is a stratified split, which fixes the
+proportions of each class in each partition of each class.  The
+quick-and-dirty way is a shuffle,{p_end}
 {phang2}{cmd:. set seed 9876}{p_end}
-{phang2}{cmd:. gen u = uniform()}{p_end}
+{phang2}{cmd:. generate u = runiform()}{p_end}
 {phang2}{cmd:. sort u}{p_end}

-{pstd}before the actual train/test split:{p_end}
+{pstd}before the actual train and test split:{p_end}
 {phang2}{cmd:. local split = floor(_N/2)}{p_end}
 {phang2}{cmd:. local train = "1/`=`split'-1'"}{p_end}
 {phang2}{cmd:. local test = "`split'/`=_N'"}{p_end}

-{pstd}Fit the classification model on the training set, with 'verbose' enabled.{p_end}
-{pstd}Training cannot handle missing data; here we elide it, but usually you should impute.{p_end}
-{phang2}{cmd:. svmachines foreign price-gear_ratio if !missing(rep78) in `train', v}{p_end}
-{pstd}Predict on the test set.{p_end}
-{pstd}Unlike training, predict can handle missing data: it simply predicts missing.{p_end}
+Fit the classification model on the training set, with {cmd:verbose} enabled.
+Training cannot handle missing data; here we omit it, but usually you should
+{phang2}{cmd:. svmachines foreign price-gear_ratio if !missing(rep78) in `train', verbose}{p_end}
+Predict on the test set.  Unlike training, {cmd:predict} can handle missing
+data it simply predicts missing.{p_end}
 {phang2}{cmd:. predict P in `test'}{p_end}

-{pstd}Compute error rate: the percentage of mispredictions is the mean of err.{p_end}
-{phang2}{cmd:. gen err = foreign != P in `test'}{p_end}
-{phang2}{cmd:. sum err in `test'}{p_end}
+Compute error rate: the percentage of mispredictions is the mean of {cmd:err}.{p_end}
+{phang2}{cmd:. generate err = foreign != P in `test'}{p_end}
+{phang2}{cmd:. summarize err in `test'}{p_end}

 {pstd}{it:({stata svmachines_example binary_classification:click to run})}{p_end}

-{title:Examples: multiclass classification}
+    {title:Multiclass classification}

 {phang2}{cmd:. use attitude_indicators}{p_end}

 {phang2}{cmd:. set seed 4532}{p_end}
-{phang2}{cmd:. gen u = uniform()}{p_end}
+{phang2}{cmd:. generate u = runiform()}{p_end}
 {phang2}{cmd:. sort u}{p_end}

-{pstd}Train/test split{p_end}
+{pstd}Train and test split{p_end}
 {phang2}{cmd:. local split = floor(_N*3/4)}{p_end}
 {phang2}{cmd:. local train = "1/`=`split'-1'"}{p_end}
 {phang2}{cmd:. local test = "`split'/`=_N'"}{p_end}

-{pstd}In general, you need to do grid-search to find good tuning parameters.{p_end}
-{pstd}These values of kernel, gamma, and coef0 just happened to be good enough.{p_end}
+In general, you need to do a grid search to find good tuning parameters.  These
+values of {cmd:kernel()}, {cmd:gamma()}, and {cmd:coef0()} just happened to be good enough.{p_end}
 {phang2}{cmd:. svmachines attitude q* in `train', kernel(poly) gamma(0.5) coef0(7)}{p_end}
 {phang2}{cmd:. predict P in `test'}{p_end}

 {pstd}Compute error rate.{p_end}
-{phang2}{cmd:. gen err = attitude != P in `test'}{p_end}
-{phang2}{cmd:. sum err in `test'}{p_end}
+{phang2}{cmd:. generate err = attitude != P in `test'}{p_end}
+{phang2}{cmd:. summarize err in `test'}{p_end}

-{pstd}An overly high percentage of SVs means overfitting{p_end}
-{phang2}{cmd:. di "Percentage that are support vectors: `=round(100*e(N_SV)/e(N),.3)'"}{p_end}
+An overly high percentage of SVs means overfitting{p_end}
+{phang2}{cmd:. display "Percentage that are support vectors: `=round(100*e(N_SV)/e(N),.3)'"}{p_end}

 {pstd}{it:({stata svmachines_example multiclass_classification:click to run})}{p_end}

-{title:Examples: class probability}
+    {title:Class probability}

 {phang2}{cmd:. use attitude_indicators}{p_end}

 {phang2}{cmd:. set seed 12998}{p_end}
-{phang2}{cmd:. gen u = uniform()}{p_end}
+{phang2}{cmd:. generate u = runiform()}{p_end}
 {phang2}{cmd:. sort u}{p_end}

-{pstd}Train/test split{p_end}
+{pstd}Train and test split{p_end}
 {phang2}{cmd:. local split = floor(_N*3/4)}{p_end}
 {phang2}{cmd:. local train = "1/`=`split'-1'"}{p_end}
 {phang2}{cmd:. local test = "`split'/`=_N'"}{p_end}
@@ -87,51 +94,57 @@
 {phang2}{cmd:. svmachines attitude q* in `train', kernel(poly) gamma(0.5) coef0(7) prob}{p_end}
 {phang2}{cmd:. predict P in `test', prob}{p_end}

-{pstd}the value in column P matches the column P_<attitude> with the highest probability{p_end}
+The value in column {cmd:P} matches the column {cmd:P_}{it:<attitude>} with the
+highest probability.{p_end}
 {phang2}{cmd:. list attitude P* in `test'}{p_end}

 {pstd}Compute error rate.{p_end}
-{phang2}{cmd:. gen err = attitude != P in `test'}{p_end}
-{phang2}{cmd:. sum err in `test'}{p_end}
+{phang2}{cmd:. generate err = attitude != P in `test'}{p_end}
+{phang2}{cmd:. summarize err in `test'}{p_end}

-{pstd}predict, prob is a *different algorithm* than predict, and can disagree about predictions.{p_end}
-{pstd}This disagreement will become absurd if combined with poor tuning.{p_end}
+Beware: {cmd:predict, probability} is a different algorithm than
+{cmd:predict} and can disagree about predictions.  This disagreement will
+become absurd if combined with poor tuning.{p_end}
 {phang2}{cmd:. predict P2 in `test'}{p_end}
-{phang2}{cmd:. gen agree = P == P2 in `test'}{p_end}
-{phang2}{cmd:. sum agree in `test'}{p_end}
+{phang2}{cmd:. generate agree = P == P2 in `test'}{p_end}
+{phang2}{cmd:. summarize agree in `test'}{p_end}

 {pstd}{it:({stata svmachines_example class_probability:click to run})}{p_end}

-{title:Examples: regression}
+    {title:Regression}

 {phang2}{cmd:. webuse highschool}{p_end}

 {phang2}{cmd:. set seed 793742}{p_end}
-{phang2}{cmd:. gen u = uniform()}{p_end}
+{phang2}{cmd:. generate u = runiform()}{p_end}
 {phang2}{cmd:. sort u}{p_end}

-{pstd}Train/test split{p_end}
+{pstd}Train and test split{p_end}
 {phang2}{cmd:. local split = floor(_N/2)}{p_end}
 {phang2}{cmd:. local train = "1/`=`split'-1'"}{p_end}
 {phang2}{cmd:. local test = "`split'/`=_N'"}{p_end}

-{pstd}Regression is invoked with type(svr) or type(nu_svr).{p_end}
-{pstd}Notice that you can expand factors (categorical predictors) into sets of{p_end}
-{pstd}indicator (boolean/dummy) columns with standard i. syntax, and you can{p_end}
-{pstd}record which observations were chosen as support vectors with sv().{p_end}
-{phang2}{cmd:. svmachines weight height i.race i.sex in `train', type(svr) sv(Is_SV)}{p_end}
+Regression is invoked with {cmd:type(svr)} or {cmd:type(nu_svr)}.  Notice that
+you can expand factors (categorical predictors) into sets of indicator
+(Boolean and dummy) columns with standard {cmd:i.} syntax, and you can record
+which observations were chosen as support vectors with {cmd:sv()}.{p_end}
+{cmd:. svmachines weight height i.race i.sex in `train', type(svr) sv(Is_SV)}{p_end}

-{pstd}Examine which observations were SVs. Ideally, a small number of SVs are enough.{p_end}
+Examine which observations were SVs. Ideally, a small number of SVs are enough.{p_end}
 {phang2}{cmd:. tab Is_SV in `train'}{p_end}

 {phang2}{cmd:. predict P in `test'}{p_end}

 {pstd}Compute residuals.{p_end}
-{phang2}{cmd:. gen res = (weight - P) in `test'}{p_end}
-{phang2}{cmd:. sum res}{p_end}
+{phang2}{cmd:. generate res = (weight - P) in `test'}{p_end}
+{phang2}{cmd:. summarize res}{p_end}

 {pstd}{it:({stata svmachines_example regression:click to run})}{p_end}

--- statasvm/src/svmachines.sthlp   2016-11-09 23:17:32.000000000 -0500
+++ /home/kousu/ado/plus/s/svmachines.sthlp 2018-05-13 08:27:25.542174011 -0400
@@ -1,364 +1,421 @@
 {* *! version 0.0.1  28may2015}{...}
-{vieweralsosee "[R] regress" "mansection R regress"}{...}
-{viewerjumpto "Syntax" "svmachines##syntax"}{...}
-{viewerjumpto "Description" "svmachines##description"}{...}
-{viewerjumpto "Installation" "svmachines##installation"}{...}
-{viewerjumpto "Options" "svmachines##options"}{...}
-{viewerjumpto "Stored results" "svmachines##results"}{...}
-{viewerjumpto "Remarks" "svmachines##remarks"}{...}
-{viewerjumpto "Examples" "svmachines##examples"}{...}
-{viewerjumpto "Copyright" "svmachines##copyright"}{...}
-{viewerjumpto "Authors" "svmachines##authors"}{...}
-{viewerjumpto "References" "svmachines##references"}{...}
-{...}{* NB: these hide the newlines }
+{cmd:help svmachines}{right: ({browse "http://www.stata-journal.com/article.html?article=st0461":SJ16-4: st0461})}

-{p2colset 5 18 20 2}{...}
-{p2col :{cmd:svmachines} {hline 2}}Support Vector Machines{p_end}
+{p2colset 5 19 21 2}{...}
+{p2col :{cmd:svmachines} {hline 2}}Support vector machines{p_end}

 {marker syntax}{...}

-{p 8 16 2}
-{help svmachines##svmachines:svmachines} {depvar} {indepvars} {ifin} [{cmd:,} {it:options}]
+{p 8 18 2}
+{cmd:svmachines} {depvar} {indepvars} {ifin} [{cmd:,} {it:options}]

 {p 8 16 2}
-{help svmachines##svm:svmachines} {indepvars} {ifin}, type({help svmachines##one_class:one_class}) [{it:options}]
+{cmd:svmachines} {indepvars} {ifin}{cmd:,} {cmdab:t:ype(}{helpb svmachines##one_class:one_class}{cmd:)} [{it:options}]

 {synoptset 20 tabbed}{...}
-{synopt :{opth t:ype(svmachines##type:type)}}Type of model to fit: {opt svc}, {opt nu_svc}, {opt svr}, or {opt nu_svr}, or {opt one_class}. Default: {cmd:type(svc)}{p_end}
-{synopt :{opth k:ernel(svmachines##kernel:kernel)}}SVM kernel function to use: {opt linear}, {opt poly}, {opt rbf}, {opt sigmoid}, or {opt precomputed}. Default: {cmd:kernel(rbf)}{p_end}
+{synopt :{opt t:ype(type)}}type of model to fit: {opt svc}, {opt nu_svc}, {opt svr}, or {opt nu_svr}, or {opt one_class}; default is {cmd:type(svc)}{p_end}
+{synopt :{opt k:ernel(kernel)}}SVM kernel function to use: {opt linear}, {opt poly}, {opt rbf}, {opt sigmoid}, or {opt precomputed}, default is {cmd:kernel(rbf)}{p_end}
+{* XXX the division between 'tuning' and 'model' parameters is hazy; for example, you could in theory cross-validate to choose degree (and people do this with neural networks), or even to choose the kernel . hmmmmm}{...}

-{* XXX the division between 'tuning' and 'model' parameters is hazy; e.g. you could in theory cross-validate to choose degree (and people do this with neural networks), or even to choose the kernel . hmmmmm}{...}
-{synopt :{opth c:(svmachines##c:#)}}For {opt svc}, {opt svr} and {opt nu_svr} SVMs, the weight on the margin of error. Should be > 0. Default: {cmd:c(1)}{p_end}
-{synopt :{opth eps:ilon(svmachines##epsilon:#)}}For {opt svr} SVMs, the margin of error allowed within which observations will be support vectors. Default: {cmd:eps(0.1)}{p_end}
-{synopt :{opth nu:(svmachines##nu:#)}}For {opt nu_svc}, {opt one_class}, and {opt nu_svr} SVMs, tunes the proportion of expected support vectors. Should be in (0, 1]. Default: {cmd:nu(0.5)}{p_end}
-{synopt :{opth g:amma(svmachines##gamma:#)}}For {opt poly}, {opt rbf} and {opt sigmoid} kernels, a scaling factor for the linear part of the kernel. Default: {cmd:gamma(1/[# {indepvars}])}{p_end}
-{synopt :{opth coef0:(svmachines##coef0:#)}}For {opt poly} and {opt sigmoid} kernels, a bias ("intercept") term for the linear part of the kernel. Default: {cmd:coef0(0)}{p_end}
-{synopt :{opth deg:ree(svmachines##degree:#)}}For {opt poly} kernels, the degree of the polynomial to use. Default: cubic ({cmd:degree(3)}){p_end}
-{synopt :{opt shrink:ing}}Whether to use {help svmachines##shrinking:shrinkage} heuristics to improve the fit. Default: disabled{p_end}
+{synopt :{opt c(#)}}for {cmd:type(svc)}, {cmd:type(svr)}, and {cmd:type(nu_svr)} SVMs, the weight on the margin of error; should be > 0; default is {cmd:c(1)}{p_end}
+{synopt :{opt eps:ilon(#)}}for {cmd:type(svr)} SVMs, the margin of error that determines which observations will be support vectors; default is {cmd:eps(0.1)}{p_end}
+{synopt :{opt nu(#)}}for {cmd:type(nu_svc)}, {cmd:type(one_class)}, and {cmd:type(nu_svr)} SVMs; tunes the proportion of expected support vectors; should be in (0, 1]; default is {cmd:nu(0.5)}{p_end}
+{synopt :{opt g:amma(#)}}for {cmd:kernel(poly)}, {cmd:kernel(rbf)}, and {cmd:kernel(sigmoid)}, a scaling factor for the linear part of the kernel; default is {cmd:gamma(}1/[{it:#} {indepvars}]{cmd:)}{p_end}
+{synopt :{opt coef0(#)}}for {cmd:kernel(poly)} and {cmd:kernel(sigmoid)}, a bias ("intercept") term for the linear part of the kernel; default is {cmd:coef0(0)}{p_end}
+{synopt :{opt deg:ree(#)}}for {cmd:kernel(poly)}, the degree of the polynomial to use; default is {cmd:degree(3)}{p_end}
+{synopt :{opt shrink:ing}}whether to use {help svmachines##shrinking:shrinkage} heuristics to improve the fit{p_end}

-{* {synopt :{opt norm:alize}}Whether to {help svmachines##normalize:center and scale} the data. NOT IMPLEMENTED. Default: disabled{p_end} }
-{synopt :{opt prob:ability}}Whether to {help svmachines##probability:precompute} for "predict, prob" during estimation. Only applicable to classification problems. Default: disabled{p_end}
-{synopt :{opth sv:(svmachines##sv:newvarname)}}If given, an indicator variable to generate to mark each row as a support vector or not. Default: disabled{p_end}
+{* {synopt :{opt norm:alize}}whether to {help svmachines##normalize:center and scale} the data. NOT IMPLEMENTED{p_end}}{...}
+{synopt :{opt prob:ability}}whether to {help svmachines##probability:precompute} for {cmd:predict, probability} during estimation; only applicable to classification problems{p_end}
+{synopt :{opt sv:(newvar)}}an indicator variable to generate to mark each row as a support vector or not{p_end}

-{synopt :{opth tol:erance(svmachines##tolerance:#)}}The stopping tolerance used to decide convergence. Default: {cmd:epsilon(0.001)}{p_end}
-{synopt :{opt v:erbose}}Turns on {help svmachines##verbose:verbose mode}. Default: disabled{p_end}
-{synopt :{opth cache:_size(svmachines##cache_size:#)}}The amount of RAM used to cache kernel values during fitting, in megabytes. Default: 100MB ({cmd:cache_size(100)}){p_end}
+{synopt :{opt tol:erance(#)}}stopping tolerance used to decide convergence; default is {cmd:epsilon(0.001)}{p_end}
+{synopt :{opt v:erbose}}turn on {help svmachines##verbose:verbose mode}{p_end}
+{synopt :{opt cache:_size(#)}}amount of RAM used to cache kernel values during fitting, in megabytes; default is {cmd:cache_size(100)}{p_end}
 {pstd}All variables must be numeric, including categorical variables.
-If you have categories stored in strings use {help encode} before {cmd:svmachines}.
+If you have categories stored in strings, use {helpb encode} before {cmd:svmachines}.
 INCLUDE help fvvarlist

+{title:Syntax for predict after svmachines}

 {p 8 16 2}
-{help svmachines##predict:predict} {newvar} {ifin} [{cmd:,} {it:options}]
+{cmd:predict} {newvar} {ifin} [{cmd:,} {it:options}]

-{synoptset 20 tabbed}{...}
+{synoptset 15}{...}
-{synopt :{opt prob:ability}}If specified, estimate class probabilities for each observation. The fit must have been previously made with {opt probability}.{p_end}
-{synopt :{opt scores}}If specified, output the scores, sometimes called decision values, that measure each observation's distance to its hyperplane. Incompatible with {opt probability}.{p_end}
-{synopt :{opt v:erbose}}Turns on {help svmachines##verbose:verbose mode}. Default: disabled{p_end}
+{synopt :{opt prob:ability}}estimate class probabilities for each observation; the fit must have been previously made with {opt probability}{p_end}
+{synopt :{opt scores}}output the scores, sometimes called decision values, that measure each observation's distance to its hyperplane; incompatible with {opt probability}{p_end}
+{synopt :{opt v:erbose}}turn on {help svmachines##verbose:verbose mode}{p_end}

 {marker description}{...}

-{cmd:svmachines} fits a support vector machine (SVM) model.
-SVM is not one, but several, variant models each based upon the principles of
-splitting hyperplanes and the culling of unimportant observations.
+{cmd:svmachines} fits a support vector machine (SVM) model.  SVM is not one
+but several variant models each based upon the principles of splitting
+hyperplanes and the culling of unimportant observations.

-The basic SVM idea is to find a linear boundary---a hyperplane---in high-dimensional space:
-for classification, this is a boundary between two classes;
-for regression it is a line {help svmachines##epsilon:near} which points should be--much like in {help regess:OLS},
-while simultaneously minimizing the number of observations required to distinguish
-this hyperplane.
-The unimportant observations are ignored after fitting is done, which makes SVM very memory efficient.
+The basic SVM idea is to find a linear boundary -- a hyperplane -- in
+high-dimensional space.  For classification, this is a boundary between two
+classes; for regression, it is a line {help svmachines##epsilon:near} which
+points should be -- much like in {help regess:ordinary least squares}, while
+simultaneously minimizing the number of observations required to distinguish
+this hyperplane.  The unimportant observations are ignored after fitting,
+which makes SVM very memory efficient.

-Each observation can be thought of as a vector,
-so the {it:support vectors} are those observations which the algorithm deems critical to the fit.
+Each observation can be thought of as a vector, so the support vectors are
+those observations which the algorithm deems critical to the fit.

-This package is a thin wrapper for the widely deployed {help svmachines##libsvm:libsvm}.
-The thinness of this wrapper is an intentional feature:
-it means work done under Stata-SVM should be replicable with other libsvm wrappers such as
-{browse "http://weka.wikispaces.com/LibSVM":Weka} or
-{browse "http://scikit-learn.org/stable/modules/svm.html":sklearn}.
-As a side-effect, some of the options are unfortunately terse.
+This package is a thin wrapper for the widely deployed {cmd:libsvm} 
+({help svmachines##libsvm:Chang and Lin 2011}).  The thinness of this wrapper
+is intentional.  It means work done under Stata SVM should be replicable with
+other {cmd:libsvm} wrappers such as 
+{browse "http://weka.wikispaces.com/LibSVM":{cmd:Weka}} or 
+{browse "http://scikit-learn.org/stable/modules/svm.html":{cmd:sklearn}}.  As
+a side effect, some of the options are unfortunately terse.

-See the {help svmachines##svmtutorial:libsvm SVM tutorial} for a gentle introduction to the method.
-If you find this manual confusing, refer to the authoritative
-the libsvm {browse "http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html":FAQ},
-{browse "https://github.com/cjlin1/libsvm/blob/master/README":README},
-and {help svmachines##libsvm:implementation paper}.
-Then please write us with your suggestions for clarification.
+See the {cmd:libsvm} SVM tutorial 
+({help svmachines##svmtutorial:Bennett and Campbell 2000}) for a gentle
+introduction to the method.  If you find this manual confusing, refer to the
+authoritative {cmd:libsvm} 
+{browse "http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html":FAQ}, 
+{browse "https://github.com/cjlin1/libsvm/blob/master/README":README}, and
+implementation article ({help svmachines##libsvm:Chang and Lin 2011}).  Then,
+please write us with your suggestions for clarification.

-Please also feel free to {help svmachines##authors:send us} any other feature requests.
+Please also feel free to {help svmachines##authors:send us} any other feature

 {marker installation}{...}

-Since this is just a wrapper, {bf:libsvm must be installed} to use this package.
-On Windows, libsvm.dll is bundled with the package,
-and you can find it in your {help adopath} (try {cmd:findfile libsvm.dll} to verify this).
-On OS X, libsvm is available in both {browse "https://brew.sh":brew} and {browse "https://www.macports.org":macports}.
-On Linux, search for libsvm in your distribution's package manager.
-You can also compile and install libsvm from source, 
-if you cannot find it in your package manager or if you want the latest libsvm.
-If you are having plugin load errors, please {help svmachines##authors:contact the authors},
-as we want to make the experience as smooth as possible for our users across as many platforms as possible.
+Because this is just a wrapper, {cmd:libsvm} must be installed to use this
+package.  On Windows, {cmd:libsvm.dll} is bundled with the package, and you
+can find it in your {helpb adopath} (try {cmd:findfile libsvm.dll} to verify
+this).  On OS X, {cmd:libsvm} is available in both 
+{browse "https://brew.sh":brew} and 
+{browse "https://www.macports.org":macports}.  On Linux, search for
+{cmd:libsvm} in your distribution's package manager.  You can also compile and
+install {cmd:libsvm} from source, if you cannot find it in your package
+manager or if you want the latest {cmd:libsvm}.  If you are having plugin load
+errors, please {help svmachines##authors:contact the authors}: we want to make
+the experience as smooth as possible for our users across as many platforms as

 {marker options}{...}
-{marker svmachines}{...}
-{dlgtab:svmachines}{* this is a misuse of dlgtab because I have no corresponding dialog, but it drastically helps readability }
-{cmd:svmachines} fits an SVM model, fitting {depvar} to {indepvars} except under {opt type(one_class)} which only uses {indepvars}.
+{title:Options for svmachines}

-libsvm has several algorithms with a single entry point. Since this is a thin wrapper, so do we,
-which means {it:not all combinations of options are valid}.
-Usually libsvm will give an error if you specify an invalid combination,
-but sometimes it just ignores parameters {it:without telling you}.
-Further, amongst valid combinations, not all options and datasets give good results.
-Our goal is to have sane defaults,
-so that the only choice you usually need to make is what {help svmachines##type:type} and {help svmachines##kernel:kernel} to use,
-but there is no way to give universal default parameters.
+{cmd:svmachines} fits an SVM model, fitting {depvar} to {indepvars} -- except
+under {cmd:type(one_class)}, which only uses {indepvars}.

-Rather than guessing at the {help svmachines##tuning_params:tuning parameters},
-you should almost always use cross-validated grid-search to find them.
-Which parameters you need to tune depend on which model you pick; for example,
-for ({opt type(svc)}, opt{kernel(rbf)}) you only need to find ({opt c()}, {opt gamma()}).
-You can grid-search on a subset of your full data, so long as it is a
-representative sample, to quickly find approximations for the optimal parameters.
-The {help svmachines##libsvmguide:libsvm guide} explains this in-depth.
+{cmd:libsvm} has several algorithms with a single entry point.  Because this
+package is a thin wrapper, {cmd:svmachines} also has several algorithms with a
+single entry point.  This means not all combinations of options are valid.
+Usually, {cmd:libsvm} will give an error if you specify an invalid
+combination, but sometimes, it ignores parameters without informing the user.
+Further, among valid combinations, not all options and datasets give good
+results.  Our goal is to have sane defaults, so the only choice you usually
+need to make is what {helpb svmachines##type:type()} and 
+{helpb svmachines##kernel:kernel()} to use.  However, there is no way to give
+universal default parameters.
+Rather than guessing at the 
+{help svmachines##tuning_params:tuning parameters}, you should almost always
+use cross-validated grid search to find them.  Which parameters you need to
+tune depends on which model you pick; for example, for ({cmd:type(svc)},
+{cmd:kernel(rbf)}), you only need to find ({cmd:c()}, {cmd:gamma()}).  You can
+grid search on a subset of your full data -- so long as it is a representative
+sample -- to quickly find approximations for the optimal parameters.  The
+{cmd:libsvm} guide ({help svmachines##libsvmguide:Hsu, Chang, and Lin 2003})
+explains this in-depth.

+{* MODEL PARAMS: }{...}
 {marker type}{...}
-{opt t:ype(type)} specifies which SVM model to run.{p_end}
-{pmore}{opt svc} and {opt nu_svc} perform classification.{p_end}
-{pmore2}{depvar} should be a variable containing categories.
-Multiclass classification is automatically handled if necessary using
-the {browse "http://en.wikipedia.org/wiki/Multiclass_classification":class-against-class} method.
+{opt type(type)} specifies which SVM model to run.  {it:type} is case
+{cmd:svc} and {cmd:nu_svc} perform classification.  {depvar} should be a
+variable containing categories.  Multiclass classification is automatically
+handled if necessary, using the
+{browse "http://en.wikipedia.org/wiki/Multiclass_classification":class-against-class}

-If you try to use floating point values with classification you
-will find that they are truncated mercilessly to their integer parts,
-so you may need to recode your categories before giving them to {cmd: svmachines}.
-If you end up with almost as many classes as observations,
-you have probably used a continuous {depvar} and
-should use regression instead.{p_end}
-{pmore}{opt svr} and {opt nu_svr} perform regression.{p_end}
-{pmore2}{depvar} should be a variable containing continuous values.{p_end}
-{pmore2}Rather than try to find a hyperplane which separates data as far as possible,
-this tries to find a hyperplane to which most data is as near as possible.
-See {help svmachines##svr_tutorial:the SVR tutorial} for more details.{p_end}
+If you try to use floating-point values with classification, you will find
+that they are truncated mercilessly to their integer parts, so you may need to
+recode your categories before giving them to {cmd:svmachines}.  If you end up
+with almost as many classes as observations, you have probably used a
+continuous {depvar} and should use regression instead.{p_end}
+{cmd:svr} and {cmd:nu_svr} perform regression.  {depvar} should be a variable
+containing continuous values.  Rather than trying to find a hyperplane that
+separates data as far as possible, these try to find a hyperplane to which
+most data are as near as possible.  See the support vector regression tutorial
+({help svmachines##svr_tutorial:Smola and Sch{c o:}lkopf 2004}) for more

 {marker one_class}{...}
-{pmore}{opt one_class} separates outliers from the bulk of the data.{p_end}
-{pmore2}{opt one_class} is a form of unsupervised learning.
-It estimates the support of a distribution by distinguishing "class" from "outlier",
-based only on the features given to it. Therefore, it does not take a {depvar}.
-Its predictions give 1 for "class" and -1 for "outlier".{p_end}
-{pmore3}{bf:Tip:} You may use the same {varlist} as with the other types.
- {opt one_class} just then interprets your {depvar} as one of its {indepvars},
- giving it more information to work with.{p_end}
-{pmore}To learn about the {help svmachines##nu:nu} variants, see {help svmachines##nusvm:Chen and Lin's ν-SVM tutorial}.{p_end}
+{cmd:one_class} separates outliers from the bulk of the data.  {cmd:one_class}
+is a form of unsupervised learning.  It estimates the support of a
+distribution by distinguishing "class" from "outlier", based only on the
+features given to it.  Therefore, it does not take a {depvar}.  Its predictions
+give 1 for "class" and -1 for "outlier".{p_end}

-{pmore}{it:type} is case insensitive.{p_end}
+Tip: You may use the same {varlist} as with the other types.  {cmd:one_class}
+just then interprets your {depvar} as one of its {indepvars}, giving it more
+information to work with.{p_end}

+To learn about the {help svmachines##nu:nu} variants, see 
+{help svmachines##nusvm:Chen, Lin, and Sch{c o:}lkopf's (2005)} nu SVM

 {marker kernel}{...}
-{opt k:ernel(kernel)} gives a kernel function to use.
+{opt kernel(kernel)} gives a kernel function to use.  {it:kernel} is case

-Much like {help glm:GLMs}, the {browse "https://en.wikipedia.org/wiki/Kernel_Method":kernel trick}
-extends the linear SVM algorithm to be capable of fitting nonlinear data.
-Kernels bend a non-linear space into a linear one by applying a high-dimensional mapping.
-Under high enough dimensions, any set of data looks close to linear.
-See {browse "https://www.youtube.com/watch?v=3liCbRZPrZA"} for a visualization of this process.
-The "trick"---and the reason why the set of kernels is hardcoded---is that
-for certain kernels the fit can be done efficiently without
-actually constructing the high-dimensional points, as the estimation only cares
-scoring coefficients u using the output of the kernel, not the output of the values the kernel
-is, in theory, operating upon.{p_end}
-{pmore}Kernels available in this implementation are:{p_end}
-{pmore2}{opt linear}: the dot-product you are probably familiar with from {help regress:OLS}: u'*v{p_end}
-{pmore2}{opt poly}: (gamma*u'*v + coef0)^degree. This extends the linear kernel with wiggliness.{p_end}
-{pmore2}{opt rbf}: stands for Radial Basis Functions, and treats the coefficients
-      as a mean to smoothly approach in a ball, with the form exp(-gamma*|u-v|^2);
-     this kernel tends to be a good generalist option for non-linear data.{p_end}
-{pmore2}{opt sigmoid}: a kernel which bends the linear kernel to fit in -1 to 1
-   with tanh(gamma*u'*v + coef0), similar to the {help logistic} non-linearity.{p_end}
-{pmore2}{opt precomputed}: assumes that {depvar} is actually a list of precomputed kernel values.
- With effort, you can use this to use custom kernels with your data.{p_end}
+Much like {helpb glm}, the 
+{browse "https://en.wikipedia.org/wiki/Kernel_Method":kernel trick} extends
+the linear SVM algorithm to be capable of fitting nonlinear data.  Kernels
+bend a nonlinear space into a linear one by applying high-dimensional mapping.
+Under high enough dimensions, any set of data looks close to linear.  See
+{browse "https://www.youtube.com/watch?v=3liCbRZPrZA"} to visualize this
+process.  The "trick" -- and the reason why the set of kernels is hardcoded --
+is that for certain kernels, the fit can be done efficiently without actually
+constructing the high-dimensional points, because the estimation only scores
+coefficients u using the output of the kernel, not the output of the values
+the kernel is, in theory, operating upon.{p_end}
+Kernels available in this implementation are the following:{p_end}
+{opt linear} is the dot product you are probably familiar with from 
+{help regress:ordinary least squares}, u'*v.{p_end}
+{opt poly} (gamma*u'*v + coef0)^degree extends the linear kernel with
+wiggliness.{p_end}
+{opt rbf} stands for radial basis functions and treats the coefficients as a
+mean to smoothly approach in a ball, with the form exp(-gamma*|u-v|^2); this
+kernel tends to be a good generalist option for nonlinear data.{p_end}
+{opt sigmoid} is a kernel that bends the linear kernel to fit in -1 to 1 with
+tanh(gamma*u'*v + coef0), similar to {help logistic} nonlinearity.{p_end}
+{opt precomputed} assumes that {depvar} is actually a list of precomputed
+kernel values.  With effort, you can use this to use custom kernels with your
+data.{p_end}
 {*  TODO: give a complete working example of using a custom kernel }{...}

-{pmore}{it:kernel} is case insensitive.{p_end}
+{* TUNING PARAMS: }{...}
 {marker tuning_params}{...}
 {marker c}{...}
-{opt c(#)} weights (regularizes) the error term used in {opt svc}, {opt svr} and {opt nu_svr}.
-Larger allows less error, but too large will lead to underfitting.
+{opt c(#)} weights (regularizes) the error term used in {cmd:type(svc)},
+{cmd:type(svr)}, and {cmd:type(nu_svr)}.  Larger numbers allow less error, but
+overlarge numbers will lead to underfitting.

 {marker epsilon}{...}
-{opt eps:ilon(#)} is the margin of error allowed by {opt svr}.
-Larger makes your fit more able to incorporate more observations, but can lead to underfitting.
-Smaller can lead to overfitting.
+{opt epsilon(#)} is the margin of error allowed by {cmd:type(svr)}.  Larger
+numbers make your fit more able to incorporate more observations but can lead
+to underfitting.  Smaller numbers can lead to overfitting.

 {marker nu}{...}
-{opt nu(#)} is used in the nu variants.
-The nu variants are a reparamaterization of regular SVM which lets you directly tune,
-using {opt nu}, the size of the svm margin, letting you control over- vs under-fitting.
-{opt nu} is simultaneously a bound on the fraction of training errors and the
-fraction of support vectors. Smaller {opt nu} means a smaller margin of error allowed -- so, a tigheter fit -- but more SVs required, and larger {opt nu} means a larger margin of error allowed and less SVs required.
-See {help svmachines##nusvm:the ν-SVM tutorial} for details.
+{opt nu(#)} is used in the nu variants.  The nu variants are a
+reparamaterization of regular SVM, which lets you directly tune the size of
+the {cmd:svmachines} margin using {cmd:nu()}.  This lets you control
+overfitting versus underfitting.  {cmd:nu()} is simultaneously bound on the
+fraction of training errors and the fraction of support vectors.  Smaller
+{cmd:nu()} means a smaller margin of error allowed -- so, a tighter fit -- but
+more SVs required, and larger {cmd:nu()} means a larger margin of error
+allowed and fewer SVs required.  See the nu SVM tutorial 
+({help svmachines##nusvm:Chen, Lin, and Sch{c o:}lkopf's 2005}) for details.
 {* ..wait... this doesn't make any sense. nu == 0.1 means there are at most 10% (training) errors and at least 10% are support vectors.}{...}
 {*      nu = 0.9 means there are at most 90% errors and at least 90% SVs.   you should always choose 0, then, to get perfect prediction and zero memory usage}{...}

 {marker gamma}{...}
-{opt g:amma(#)} is used in the non-linear {opt poly}, {opt rbf} and {opt sigmoid}
-kernels as a scaling factor for the linear part. Larger weights the data more.
+{opt gamma(#)} is used in the nonlinear {cmd:kernel(poly)}, {cmd:kernel(rbf)},
+and {cmd:kernel(sigmoid)} kernels as a scaling factor for the linear part.
+Larger numbers weigh the data more.

 {marker coef0}{...}
-{opt coef0(#)} similarly is used in the non-linear {opt poly} and {opt sigmoid}
-kernels as a pseudo-intercept term.
+{opt coef0(#)} similarly is used in the nonlinear {cmd:kernel(poly)} and
+{cmd:kernel(sigmoid)} kernels as a pseudointercept term.

 {marker degree}{...}
-{opt deg:ree(#)} selects the degree of the polynomial used by the {opt poly} kernel.
-This literally controls the degree of freedom in the {opt poly} fit:
-setting this too low results in underfitting and sometimes even non-convergence (notice that at {opt degree(1)}, this is just the {opt linear} kernel);
-setting this too high will result in overfitting.
+{opt degree(#)} selects the degree of the polynomial used by
+{cmd:kernel(poly)}.  This literally controls the degree of freedom in the
+{cmd:kernel(poly)} fit; setting this too low results in underfitting and
+sometimes even nonconvergence (notice that at {cmd:degree(1)}, this is just
+the {cmd:kernel(linear)} kernel).  Setting this too high will result in
+overfitting.

 {marker shrinking}{...}
-{opt shrink:ing} invokes the shrinkage heuristics,
-which can sometimes improve the fit by trading bias for variance.
+{opt shrinking} invokes the shrinkage heuristics, which can sometimes improve
+the fit by trading bias for variance.

 {* FEATURE PARAMS: }{...}
 {* {marker normalize} }{...}
 {* {phang} }{...}
 {* {opt normalize} instructs the estimation to first center and scale the data }{...}
 {* as SVM tends to be very sensitive to scaling issues. }{...}
-{* This normalizes all data to [0,1] using min-max normalization, as suggested in the {help svmachines##libsvmguide:libsvm guide}. }{...}
-{* Normalization creates temporary variables, so you may prefer to preprocess the data yourself---destructively and in-place---to save time on re-estimations and memory for variables, }{...}
-{* especially if you are bumping up against your Stata system limits. You may find {cmd:ssc install center} helpful }{...}
+{* This normalizes all data to [0,1] using min-max normalization, as suggested in the {help svmachines##libsvmguide:{cmd:libsvm} guide}. }{...}
+{* Normalization creates temporary variables, so you may prefer to preprocess the data yourself -- destructively and in-place -- to save time on re-estimations and memory for variables, }{...}
+{* especially if you are bumping up against your Stata system limits.  You may find {cmd:ssc install center} helpful }{...}
 {marker probability}{...}
-{opt prob:ability} enables the use of "{help svmachines##predict_prob:predict, prob}".
-That does {browse "https://en.wikipedia.org/wiki/Platt_scaling":Platt scaling},
-so for each class-against-class this precomputes a logistic regression 
-which is tuned with 5-fold cross-validation.
-Internally, libsvm shuffles the data before cross-validation using the OS random number generator,
-which is unrelated to {help set seed:Stata's RNG}, so {it:different runs will give different results}.
-Enabling this demands a great deal of additional CPU and RAM.
+{opt probability} enables the use of 
+{helpb svmachines##predict_prob:predict, probability}.
+This does {browse "https://en.wikipedia.org/wiki/Platt_scaling":Platt scaling},
+which for each class against class precomputes a logistic regression tuned
+with fivefold cross-validation.  Internally, {cmd:libsvm} shuffles the data
+before cross-validation using the operating system random-number generator,
+which is unrelated to {help set seed:Stata's random-number generator}.
+Different runs will give different results.  Enabling this demands a great
+deal of additional CPU and RAM.

 {marker sv}{...}
-{opt sv(newvarname)} records in the given variable a boolean indicating whether each observation was determined to be a support vector.
+{opt sv(newvar)} records in the given variable a Boolean indicating whether
+each observation was determined to be a support vector.  On systems with an
+older {cmd:libsvm}, notably Ubuntu up through 16.04, this feature is not

 {marker tolerance}{...}
-{opt tol:erance(#)} is the stopping tolerance used by the numerical optimizer. You could widen this if you are finding convergence is slow,
- but be aware that this usually non-convergence is a deeper problem.
- You could also tighten this if you have a powerful enough machine and want to get slightly more accurate estimates.
+{opt tolerance(#)} is the stopping tolerance used by the numerical optimizer.
+You could widen this if you are finding convergence is slow, but be aware that
+nonconvergence is usually a deeper problem.  You could also tighten this
+if you have a powerful enough machine and want to get slightly more accurate
+estimates.
 {marker verbose}{...}
-{opt v:erbose} enables output from the low level libsvm code for the duration of the operation.
+{opt verbose} enables output from the low-level {cmd:libsvm} code for the
+duration of the operation.

 {marker cache_size}{...}
-{opt cache:_size(#)} controls a time-memory tradeoff during estimation.
-Value is how many megabytes (MB) of RAM to set aside for caching kernel values
-Generally, more is faster, at least until you run out of RAM or cause your machine to start swapping.
-On modern machines, a reasonable choice is {opt cache_size(1024)}.
+{opt cache_size(#)} controls a time-memory tradeoff during estimation.  Value
+is how many megabytes of RAM to set aside for caching kernel values.
+Generally, more is faster, at least until you run out of RAM or cause your
+machine to start swapping.  On modern machines, a reasonable choice is
+{cmd:cache_size(1024)}.


 {marker predict}{...}
+{title:Options for predict}

-{pstd}After training you can ask svm to {cmd:predict} what the category (classification) or outcome value (regression)
-      should be for each given observation. Results are placed into {newvar}.{p_end}
-{pstd}{newvar} must not exist, so if you want to repredict your choices are {cmd:drop {newvar}} or to pick a new name, e.g. {cmd:predict {newvar}2}.{p_end}
+After training, you can ask {cmd:svmachines} to {cmd:predict} what the
+category (classification) or outcome value (regression) should be for each
+given observation.  Results are placed into {newvar}.  {it:newvar} must not
+exist, so if you want to repredict, your choices are {cmd:drop} {it:newvar} or
+pick a new name, for example, {cmd:predict} {it:newvar2}.{p_end}

 {marker predict_prob}{...}
-{phang}For classification ({opt svc}, {opt nu_svc}) problems, {opt probability} requests, for each observation, the probability of it being each class.
-{newvar} is used as a stem for the new columns.
-Both probabilities are computed with Platt Scaling.  When enabled, so are predictions, and this algorithm is not guaranteed to give the same results
-as otherwise. The results should be sensible either way, so if you are getting inconsistent results between the two algorithms,
-investigate the {help svmachines##tuning_params:tuning} parameters.
-This option is not valid for other SVM types.{p_end}
+{cmd:probability} requests (for classification ({cmd:type(svc)},
+{cmd:type(nu_svc)}) problems), for each observation, the probability of it
+being each class.  {newvar} is used as a stem for the new columns.  Both
+probabilities are computed with Platt scaling.  When enabled, so are
+predictions; this algorithm is not guaranteed to give the same results as
+otherwise.  The results should be sensible either way, so if you are getting
+inconsistent results between the two algorithms, investigate the 
+{help svmachines##tuning_params:tuning} parameters.  This option is not valid
+for other SVM types.{p_end}

 {marker scores}{...}
-{opt scores} outputs the values that {cmd:svmachines} uses to decide which side of the hyperplane a particular observation falls.
-{newvar} is used as a stem for the new columns.
-For {opt type(one_class)} and regressions, there is only one score.
-For classifications, there is one score for every pair of classes (this is expensive: k classes means k(k-1)/2 new columns!),
- because libsvm aggregates the basic binary-only svm algorithm into a multiclass algorithm with the one-against-one technique.
-This option is incompatible with {opt probability} because, once trained, the Platt Scaling algorithm does not directly compute scores.{p_end}
+{opt scores} outputs the values that {cmd:svmachines} uses to decide on which
+side of the hyperplane a particular observation falls.  {newvar} is used as a
+stem for the new columns.  For {cmd:type(one_class)} and regressions, there is
+only one score.  For classifications, there is one score for every pair of
+classes (this is expensive: k classes means k(k-1)/2 new columns!), because
+{cmd:libsvm} aggregates the basic binary-only {cmd:svmachines} algorithm into
+a multiclass algorithm with the one-against-one technique.  This option is
+incompatible with {opt probability} because, once trained, the Platt scaling
+algorithm does not directly compute scores.

-{opt verbose}: see {help svmachines##verbose:svmachines, verbose}.
+{opt verbose}; see {helpb svmachines##verbose:svmachines, verbose}.
+Prediction implicitly uses the same {indepvars} as during estimation, so be
+careful about renaming or dropping variables.

-Prediction implicitly uses the same {indepvars} as during estimation, so be careful about renaming or dropping variables.
+Memory limits: The cheaper versions of Stata allow fewer variables and smaller
+matrices to be used.  As machine-learning problems typically are on very large
+datasets, it is easy to inadvertently instruct this package to construct more
+columns or larger matrices than you can afford.  If you overflow 
+{helpb maxvar}, you will receive an error, the operation will fail, and the
+dataset will be left untouched.  If you overflow {helpb matsize}, the matrix
+that overflowed will be missing, but operation will otherwise succeed.
+If Stata's memory limits are an impossible hurdle, your best option is to give
+up on Stata and switch to {cmd:libsvm}'s companion {cmd:svm-train} program.
+This will have been installed with the {cmd:libsvm} package if you used a
+package manager, or you can get it {browse "http://www.csie.ntu.edu.tw/~cjlin/cgi-bin/libsvm.cgi?+http://www.csie.ntu.edu.tw/~cjlin/libsvm+zip":from its authors}; you can use {helpb svmlight:export_svmlight} to extract your
+dataset for use with {cmd:svm-train}.
+{marker examples}{...}
+INCLUDE help svm_examples

 {marker results}{...}
@@ -370,59 +427,37 @@
 {synoptset 20 tabbed}{...}
 {* Note: svm_model components left unexposed: }{...}
 {*      - probA, probB, the coefficients used for predict, prob; these are not, by themselves, interesting }{...}
-{*      - label, the "labels" of the classes (which are the integers libsvm casts out of the initial dataset; exported with strLabels to be used for labelling rho and sv_coef, but otherwise not directly interesting and  }{...}
+{*      - label, the "labels" of the classes (which are the integers {cmd:libsvm} casts out of the initial dataset; exported with strLabels to be used for labeling rho and sv_coef, but otherwise not directly interesting and}{...}
 {*      - nSV, the number of SVs per class; this is only interesting for classifications, and it duplicates what you can get out of "tab `e(depvar)' SV" }{...}
-{*      - free_sv, internal libsvm flag which is a hack to stretch svm_model to handle creation from both svm_train() and svm_import() }{...}
+{*      - free_sv, internal {cmd:libsvm} flag which is a hack to stretch svm_model to handle creation from both svm_train() and svm_import() }{...}
 {*      }{...}
 {*      - SV[] and sv_indices[] are exposed indirectly with the sv() option }{...}
 {p2col 5 20 24 2: Scalars}{p_end}
 {synopt:{cmd:e(N)}}number of observations{p_end}
-{synopt:{cmd:e(N_class)}}number of classes, in a classification problem. {opt 2} in a regression problem.{p_end}
-{synopt:{cmd:e(N_SV)}}number of support vectors.
-If {opt e(N_SV)}/{opt e(N)} is close to 100% your fit is inefficient; perhaps you need to adjust your {help svmachines##kernel:kernel}.
+{synopt:{cmd:e(N_class)}}number of classes, in a classification problem; {cmd:2} in a regression problem{p_end}
+{synopt:{cmd:e(N_SV)}}number of support vectors; if {cmd:e(N_SV)}/{cmd:e(N)} is close to 100%, your fit is inefficient; perhaps
+you need to adjust your {helpb svmachines##kernel:kernel()}{p_end}

 {synoptset 20 tabbed}{...}
 {p2col 5 20 24 2: Macros}{p_end}
 {synopt:{cmd:e(cmdline)}}command as typed{p_end}
 {synopt:{cmd:e(depvar)}}name of dependent variable{p_end}
 {synopt:{cmd:e(title)}}title in estimation output{p_end}
-{synopt:{cmd:e(svm_type)}}SVM type string, as above{p_end}
+{synopt:{cmd:e(svm_type)}}SVM-type string, as above{p_end}
 {synopt:{cmd:e(svm_kernel)}}kernel string, as above{p_end}
 {synopt:{cmd:e(predict)}}program used to implement {cmd:predict}{p_end}
-{synopt:{cmd:e(levels)}}list of the classes detected, in the order they were detected. Only defined for {opt type(svc)} and {opt type(nu_svc)}.{p_end}
+{synopt:{cmd:e(levels)}}list of the classes detected, in the order they were
+detected; only defined for {cmd:type(svc)} and {cmd:type(nu_svc)}{p_end}
 {* {synopt:{cmd:e(estat_cmd)}}program used to implement {cmd:estat}{p_end} }{...}

 {synoptset 20 tabbed}{...}
-{p2col 5 20 24 2: Matrices}({help svmachines##remarks:may be missing}){p_end}
-{synopt:{cmd:e(sv_coef)}}The coefficients of the support vectors for each fitted hyperplane in the {bf:dual} quadratic programming problem.{p_end}
+{p2col 5 20 24 2: Matrices}{p_end}
+{synopt:{cmd:e(sv_coef)}}coefficients of the support vectors for each fit hyperplane in the {bf:dual} quadratic programming problem{p_end}
 {* TODO: is there a clearer explanation of sv_coef? Is it worth including? }{...}
-{synopt:{cmd:e(rho)}}The intercept term for each fitted hyperplane. It is lower-triangular and {cmd:e(N_class)}^2 large, with each entry [i,j] representing the hyperplane between class i and class j.{p_end}
-{marker remarks}{...}
-{bf:Memory Limits}: The cheaper versions of Stata allow only allow less variables and smaller matrices to be used.
-As machine learning problems typically are on very large datasets,
-it is easy to inadvertently instruct this package to construct more columns or larger matrices than you can afford.
-If you overflow {help maxvar}, you will receive an error, the operation will fail, and the dataset will be left untouched.
-If you overflow {help matsize}, the matrix that overflowed will be missing, but operation will otherwise succeed.
-If Stata's memory limits are an impossible hurdle,
-your best option is to give up on Stata and switching to libsvm's companion {cmd:svm-train} program.
-This will have been installed with the libsvm package if you used a package manager, or
-you can get it {browse "http://www.csie.ntu.edu.tw/~cjlin/cgi-bin/libsvm.cgi?+http://www.csie.ntu.edu.tw/~cjlin/libsvm+zip":from its authors};
-You can use {help svmlight:export_svmlight} to extract your dataset for use with {cmd:svm-train}.
-{marker examples}{...}
+{synopt:{cmd:e(rho)}}intercept term for each fit hyperplane; lower triangular and {cmd:e(N_class)}^2 large, with each entry [i,j] representing the hyperplane between class i and class j{p_end}

-INCLUDE help svm_examples

 {marker copyright}{...}
@@ -452,7 +487,7 @@
@@ -460,108 +495,106 @@

-libsvm is licensed:
+{cmd:libsvm} is licensed:

-Copyright (c) 2000-2014 Chih-Chung Chang and Chih-Jen Lin
+Copyright (c) 2000-2014 Chih-Chung Chang and Chih-Jen Lin{break}
 All rights reserved.

 Redistribution and use in source and binary forms, with or without
-modification, are permitted provided that the following conditions
-are met:
+modification, are permitted, provided that the following conditions are met:

-1. Redistributions of source code must retain the above copyright
-notice, this list of conditions and the following disclaimer.
+1. Redistributions of source code must retain the above copyright notice, this
+list of conditions and the following disclaimer.

-2. Redistributions in binary form must reproduce the above copyright
-notice, this list of conditions and the following disclaimer in the
-documentation and/or other materials provided with the distribution.
+2. Redistributions in binary form must reproduce the above copyright notice,
+this list of conditions and the following disclaimer in the documentation
+and/or other materials provided with the distribution.

-3. Neither name of copyright holders nor the names of its contributors
-may be used to endorse or promote products derived from this software
-without specific prior written permission.
+3. Neither the names of copyright holders nor the names of its contributors
+may be used to endorse or promote products derived from this software without
+specific prior written permission.


-{marker authors}{...}
-Though the license does not obligate you in any way to do so, if you find
-this software useful we would be curious and appreciative to hear about your
-adventures in machine learning with Stata.{p_end}
-{pmore}Thank you.
-{pstd}You can contact us at{p_end}
-{pmore}* Nick Guenther <nguenthe@uwaterloo.ca>{p_end}
-{pmore}* Matthias Schonlau <schonlau@uwaterloo.ca>{p_end}
 {marker references}{...}

-{marker sourcecode}{...}
-Guenther, Nick and Schonlau, Matthias. 2015.
-{browse "https://git.uwaterloo.ca/schonlau/statasvm"}.
 {marker svmtutorial}{...}
-Bennett, Kristin P., and Colin Campbell. 2000.
-{it:Support Vector Machines: Hype or Hallelujah?}
-SIGKDD Explor. Newsl. 2.2: 1–13.
-{browse "http://www.svms.org/tutorials/BennettCampbell2000.pdf"}.
+Bennett, K. P., and C. Campbell. 2000.  Support vector machines: Hype or
+hallelujah?  {it:SIGKDD Explorations} 2(2): 1-13.

 {marker libsvm}{...}
-Chang, Chih-Chung and Lin, Chih-Jen. 2011.
-{it:LIBSVM: a library for support vector machines.}
-ACM Transactions on Intelligent Systems and Technology, 2:27:1--27:27.
-{browse "http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf"}.
-Software available at {browse "http://www.csie.ntu.edu.tw/~cjlin/libsvm"}
+Chang, C.-C., and C.-J. Lin. 2011.
+LIBSVM: A library for support vector machines.
+{it:ACM Transactions on Intelligent Systems and Technology} 2(3): Article 27.
+{marker nusvm}{...}
+Chen, P.-H., C.-J. Lin, and B. Sch{c o:}lkopf. 2005.
+A tutorial on nu-support vector machines.
+{it:Applied Stochastic Models in Business and Industry} 21: 111-136.
+{marker sourcecode}{...}
+Guenther, N., and M. Schonlau. 2015.
+{browse "https://git.uwaterloo.ca/nguenthe/statasvm"}.

 {marker libsvmguide}{...}
-Hsu, Chih-Wei, Chang, Chih-Chung, and Lin, Chih-Jen. April 15, 2010.
-{it:A Practical Guide to Support Vector Classification}.
-{browse "http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf"}.
+Hsu, C.-W., C.-C. Chang, and C.-J. Lin. 2003. A practical guide to support
+vector classification. {browse "http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf"}.

 {marker svr_tutorial}{...}
-Smola, Alex J., and Schölkopf, Bernhard. 2004.
-{it:A tutorial on support vector regression}.
-Statistics and Computing 14.3: 199–222.
-{* This one is behind a paywall, so the best we can do is a give a DOI link }{...}
-{browse "http://dx.doi.org/10.1023/b:stco.0000035301.49549.88"}.
+Smola, A. J., and B. Sch{c o:}lkopf. 2004.
+A tutorial on support vector regression.
+{it:Statistics and Computing} 14: 199-222.

-{marker nusvm}{...}
-Chen, Pai-Hsuen, Lin Chih-Jen, and Schölkopf, Bernhard. 2005.
-{it:A Tutorial on ν-Support Vector Machines}.
-Applied Stochastic Models in Business and Industry 21.2: 111–136.
-{browse "http://www.csie.ntu.edu.tw/~cjlin/papers/nusvmtutorial.pdf"}.

+{marker authors}{...}
+Though the license does not obligate you in any way to do so, if you find this
+software useful, we would be curious and appreciative to hear about your
+adventures in machine learning with Stata.  Thank you.
+You can contact us at
+{pstd}Nick Guenther{break}
+University of Waterloo{break}
+Waterloo, Canada{break}
+{pstd}Matthias Schonlau{break}
+University of Waterloo{break}
+Waterloo, Canada{break}
+{title:Also see}
+{p 4 14 2}Article:  {it:Stata Journal}, volume 16, number 4: {browse "http://www.stata-journal.com/article.html?article=st0461":st0461}{p_end}

+{p 7 14 2}
+Help:  {manhelp regress R}{p_end}
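
For anyone checking that the reworded kernel descriptions in the diff above still match the formulas, here is a minimal Python sketch of the four kernel expressions exactly as the help text writes them (u'*v, (gamma*u'*v + coef0)^degree, exp(-gamma*|u-v|^2), tanh(gamma*u'*v + coef0)), plus the k(k-1)/2 count of score columns. The function names and default parameter values are hypothetical, chosen for illustration only; they are not the package's or libsvm's API or defaults.

```python
import math


def dot(u, v):
    """Plain dot product u'*v of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def linear(u, v):
    # linear kernel: u'*v
    return dot(u, v)


def poly(u, v, gamma=1.0, coef0=0.0, degree=3):
    # polynomial kernel: (gamma*u'*v + coef0)^degree
    # (defaults here are illustrative assumptions, not libsvm's)
    return (gamma * dot(u, v) + coef0) ** degree


def rbf(u, v, gamma=1.0):
    # radial basis function kernel: exp(-gamma*|u-v|^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def sigmoid(u, v, gamma=1.0, coef0=0.0):
    # sigmoid kernel: tanh(gamma*u'*v + coef0), bounded in (-1, 1)
    return math.tanh(gamma * dot(u, v) + coef0)


def n_score_columns(k):
    # one-against-one: one score per pair of classes, k(k-1)/2 in total
    return k * (k - 1) // 2
```

With gamma = 1 and coef0 = 0, `poly(u, v, degree=1)` returns the same value as `linear(u, v)`, which is the sense in which the help text says `degree(1)` is just the linear kernel.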
kousu commented 6 years ago

