I vote using `c(2:20)` and going up to 4. It's called "deepnet" after all!
@zachmayer I'd like to see that too, although it's a slightly more disruptive change. We should probably run some performance tests to see how much longer these networks take to fit.
@zachmayer After putting together a simple benchmark, `sae.dnn.train` is throwing errors when the network depth == 4.
```r
library(microbenchmark)
library(deepnet)

# Assumes `dat` holds the spam data, e.g. data(spam, package = "kernlab"); dat <- spam
xdat <- as.matrix(dat[names(dat) != "type"])
ydat <- ifelse(dat$type == "spam", 1L, 0L)
epochs <- 10
set.seed(3)

# Time three- vs. four-layer networks at several widths
microbenchmark(
  sae.dnn.train(x = xdat, y = ydat, hidden = c(2, 2, 2), numepochs = epochs),
  sae.dnn.train(x = xdat, y = ydat, hidden = c(2, 2, 2, 2), numepochs = epochs),
  sae.dnn.train(x = xdat, y = ydat, hidden = c(5, 5, 5), numepochs = epochs),
  sae.dnn.train(x = xdat, y = ydat, hidden = c(5, 5, 5, 5), numepochs = epochs),
  sae.dnn.train(x = xdat, y = ydat, hidden = c(20, 20, 20), numepochs = epochs),
  sae.dnn.train(x = xdat, y = ydat, hidden = c(20, 20, 20, 20), numepochs = epochs),
  times = 5
)
```
I've merged in the PR.
@dashaub You should contact the package maintainer about that
The `dnn` models from the "deepnet" package will not produce sensible models when any of the layers are of size 1. This results in many wasted training rounds spent on models that can't produce predictions. Here is a reproducible example (code below): it produces `NaN` accuracy metrics when at least one layer has 0 or 1 neurons. Obligatory `sessionInfo()`.
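A minimal sketch of the failure mode might look like the following, assuming caret's `dnn` method and the spam data from kernlab (stand-ins, not the original example):

```r
library(caret)
library(kernlab)
data(spam)

set.seed(3)
# A tuning grid that forces a size-1 first layer; per the issue, fits like
# this yield NaN accuracy metrics during resampling.
grid <- expand.grid(layer1 = 1, layer2 = 10, layer3 = 10,
                    hidden_dropout = 0, visible_dropout = 0)
fit <- train(type ~ ., data = spam, method = "dnn", tuneGrid = grid,
             trControl = trainControl(method = "cv", number = 3))
fit$results  # Accuracy/Kappa come back NaN for the degenerate architecture
```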
This can be fixed by changing the allowable neuron sizes for each layer to be >= 2, as in PR #458. While a layer size of 0 is acceptable so long as at least one other layer is >= 2, avoiding that possibility entirely would require a more complicated sampling implementation, so I'm not sure supporting networks shallower than three layers is worthwhile here. The point of this package is deep learning after all, and `nnet` can be used for single-hidden-layer models.

Alternatively, sampling could be done from `c(0, 2:20)`. The probability of all three layers being zero is very low, so this would allow exploration of shallower architectures while drastically reducing the number of models fit with an inappropriate number of neurons. Thoughts on these two options?

As a final (and actually separate) issue, we could expand the network to a depth of four. When combined with sampling from `c(0, 2:20)`, inappropriate network architectures should almost never be sampled.
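As a rough illustration of that claim, here is a quick simulation, assuming the three layer sizes are drawn uniformly and independently from `c(0, 2:20)` (a simplification of how the tuning grid would actually be sampled):

```r
set.seed(42)
sizes <- c(0, 2:20)  # 20 candidate sizes; 0 means the layer is absent
draws <- matrix(sample(sizes, 3 * 1e5, replace = TRUE), ncol = 3)

# All three layers zero (no hidden layers at all): (1/20)^3 = 0.000125
mean(rowSums(draws) == 0)

# At least one absent layer, i.e. a shallower architecture gets explored:
# 1 - (19/20)^3, roughly 14% of draws
mean(apply(draws, 1, function(h) any(h == 0)))
```

So the fully degenerate case is vanishingly rare, while shallower architectures still get sampled at a useful rate.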