cprschmid closed this issue 8 years ago
Hi,
You can find an example of a regression task in the Wiki; all your questions are addressed there :)
Let us know if you need more assistance. Hope it helps, Morgan
I didn't mean logistic regression as the model algorithm, but rather a regression task where the goal is to predict a continuous value (e.g., housing prices, stock prices, etc.) rather than classifying the input image or learning sequences.
The example you pointed me to says "... Because we are performing binary classification, we could set this up either as a multi-class classification problem ..."
You just need to change the output layer to a linear one:
p = w * features + b
Change the objective function in order to minimize SquareError
criterionNodes = (err)
and remove the unnecessary line:
lr = Logistic (labels, p)
It'll output real values, and you'll train your network to minimize the error between the network output and the real one :)
Morgan
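For reference, here is the same recipe sketched in plain numpy rather than NDL (an illustration of the idea, not CNTK code; the toy data and learning rate are assumptions): a purely linear output p = w * x + b, trained by gradient descent to minimize the mean squared error, with no Logistic/Sigmoid on the output.

```python
import numpy as np

# Toy data with a strictly linear relationship: L = 10 * F
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([[10.0], [20.0], [30.0], [40.0]])

w = np.zeros((1, 1))
b = np.zeros((1, 1))
lr = 0.05  # illustrative learning rate

for _ in range(2000):
    p = X @ w + b                   # linear output node: p = w * features + b
    g = 2.0 * (p - y) / len(X)      # gradient of mean squared error w.r.t. p
    w -= lr * (X.T @ g)             # backprop through the Times node
    b -= lr * g.sum(axis=0, keepdims=True)

# After training, w is close to 10 and b close to 0,
# so the network outputs real values near (10, 20, 30, 40).
```

The key point is the last layer: because the output is linear and the criterion is the squared error, the network can emit arbitrary real values instead of class probabilities.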
I am trying to get a simple example working. Simple because the training and the test data are exactly the same. Furthermore, the relationship between the inputs and the labels is strictly linear so it should be learned correctly:
|F 1.0 1 |L 10
|F 2.0 1 |L 20
|F 3.0 1 |L 30
|F 4.0 1 |L 40
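As a sanity check outside CNTK (a numpy sketch, not part of the original config), ordinary least squares on these four rows recovers the exact linear relationship with zero residual, confirming the data itself is learnable:

```python
import numpy as np

# The four rows of the reader file, as a design matrix and target vector
F = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [3.0, 1.0],
              [4.0, 1.0]])
L = np.array([10.0, 20.0, 30.0, 40.0])

# Ordinary least squares: the exact minimizer of the squared error
w, residuals, rank, sv = np.linalg.lstsq(F, L, rcond=None)
# w comes out as [10, 0], i.e. L = 10 * F1 + 0 * F2 -- an exact fit
```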
I am using the following network definition (it does mean and variance normalization explicitly, though I read somewhere that CNTK will do that automatically?):
# macros to include
load = ndlDLTMacros
# the actual NDL that defines the network
run = DNN
ndlDLTMacros = [
featDim = 2
labelDim = 1
features = Input(featDim)
labels = Input(labelDim)
# input precompute
featMean = Mean(features)
featInvStd = InvStdDev(features)
featInput = PerDimMeanVarNormalization(features, featMean, featInvStd)
]
DNN = [
# Variables
hiddenDim = 3
# Layer Operations
# DNNSigmoidLayer and DNNLayer are defined in Macros.ndl
h1 = DNNSigmoidLayer(featDim, hiddenDim, featInput, 1)
ol = DNNLayer(hiddenDim, labelDim, h1, 1)
# Criterion
sqerr = SquareError(labels, ol)
# Eval
ep = ErrorPrediction(labels, ol)
# Special Nodes
FeatureNodes = (features)
LabelNodes = (labels)
CriterionNodes = (sqerr)
EvalNodes = (ep)
OutputNodes = (ol)
]
The macros are defined as such:
DNNSigmoidLayer(inDim, outDim, x, parmScale) = [
# Parameters
W = Parameter(outDim, inDim, init="uniform", initValueScale=parmScale)
b = Parameter(outDim, 1, init="uniform", initValueScale=parmScale)
# Functions
t = Times(W, x)
z = Plus(t, b)
y = Sigmoid(z)
]
DNNReLULayer(inDim, outDim, x, parmScale) = [
# Parameters
W = Parameter(outDim, inDim, init="uniform", initValueScale=parmScale)
b = Parameter(outDim, 1, init="uniform", initValueScale=parmScale)
# Functions
t = Times(W, x)
z = Plus(t, b)
y = RectifiedLinear(z)
]
DNNLayer(inDim, outDim, x, parmScale) = [
# Parameters
W = Parameter(outDim, inDim, init="uniform", initValueScale=parmScale)
b = Parameter(outDim, 1, init="uniform", initValueScale=parmScale)
# Functions
t = Times(W, x)
z = Plus(t, b)
]
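For readers less familiar with NDL, these three macros correspond to the following forward passes (a numpy sketch; the shape convention, W of shape (outDim, inDim) applied to a column vector x, matches the NDL above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dnn_sigmoid_layer(W, b, x):
    # DNNSigmoidLayer: y = Sigmoid(Times(W, x) + b)
    return sigmoid(W @ x + b)

def dnn_relu_layer(W, b, x):
    # DNNReLULayer: y = RectifiedLinear(Times(W, x) + b)
    return np.maximum(W @ x + b, 0.0)

def dnn_layer(W, b, x):
    # DNNLayer: purely linear, z = Times(W, x) + b (no nonlinearity)
    return W @ x + b
```

Note that DNNLayer has no activation function, which is exactly why it is the right choice for the output layer of a regression network.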
And finally the configuration file is as follows:
# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license. See LICENSE file in the project root for full license information.
# currentDirectory=$(SolutionDir)/<path to corresponding data folder>
RootDir = ".."
ConfigDir = "$RootDir$/Config"
DataDir = "$RootDir$/Data"
OutputDir = "$RootDir$/Output"
ModelDir = "$OutputDir$/Models"
# which commands to run
command=Train:Test:Output:dumpNodeInfo
#required...
precision = "float"
modelPath="$OutputDir$/Models/simple.dnn" # where to write the model to
ndlMacros="$ConfigDir$/Macros.ndl"
# uncomment the following line to write logs to a file
stderr = "$OutputDir$/simple_out"
traceLevel=1
deviceId=-1 # CPU < 0
inputDimension=2 # input data dimensions
labelDimension=1 # label dimensions
#######################################
# TRAINING CONFIG #
#######################################
Train=[
action="train"
NDLNetworkBuilder=[
networkDescription = "$ConfigDir$/simple.ndl"
]
SGD = [
epochSize=0 # =0 means size of the training set
minibatchSize=100
learningRatesPerMB=0.1 # learning rates per MB
momentumPerMB = 0
maxEpochs=10
]
# parameter values for the reader
reader = [
readerType = "CNTKTextFormatReader"
file = "Train-Simple.txt"
randomize = "none"
maxErrors = 100
traceLevel = 2
input = [
features = [
alias = "F"
dim = 2
format = "dense"
]
labels = [
alias = "L"
dim = 1
format = "dense"
]
]
]
]
#######################################
# TEST CONFIG #
#######################################
Test=[
action="test"
reader = [
readerType = "CNTKTextFormatReader"
file = "Test-Simple.txt"
#skipSequenceIds = "true"
randomize = "none"
maxErrors = 100
traceLevel = 2
input = [
features = [
alias = "F"
dim = 2
format = "dense"
]
labels = [
alias = "L"
dim = 1
format = "dense"
]
]
]
]
# output the results
Output=[
action="write"
reader = [
readerType = "CNTKTextFormatReader"
file = "Test-Simple.txt"
randomize = "none"
maxErrors = 100
traceLevel = 2
input = [
features = [
alias = "F"
dim = 2
format = "dense"
]
labels = [
alias = "L"
dim = 1
format = "dense"
]
]
]
outputPath="$OutputDir$/simple.output.txt"
]
dumpNodeInfo=[
action = dumpnode
printValues = true
printMetadata = true
]
The first thing I note when I do the training is that the sqerr is still 600 after 10 epochs. Furthermore the predictions (in the ol.z file) are as follows:
3.338846
3.413505
3.487528
3.560431
which is not even close to the expected sequence (10, 20, 30, 40).
There might be a mismatch between the pre-processing of the data for the train vs. test commands, but I am not sure.
Is that explicit normalization necessary? Should it be done as part of the training and test set generation?
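One way to see the train/test consistency issue (a plain-numpy sketch of what CNTK's Mean/InvStdDev precompute accomplishes, not CNTK itself): the normalization statistics must be computed on the training features once and then reused unchanged for the test features, never recomputed per set.

```python
import numpy as np

train = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]])
test = train.copy()  # in this toy setup, train == test

# Statistics from the *training* set only
mean = train.mean(axis=0)
inv_std = 1.0 / (train.std(axis=0) + 1e-8)  # epsilon guards constant dims

train_norm = (train - mean) * inv_std
test_norm = (test - mean) * inv_std  # same statistics applied, not recomputed
```

Doing it as part of data-set generation also works, as long as the same statistics are baked into both files.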
It seems to be doing the right thing, but your learning rate is too small. Try with a higher learning rate, or use learningRatePerSample.
Indeed - changing the learning rate and increasing the number of epochs eventually produced a model that successfully learned the linear relationship.
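The learning-rate effect is easy to reproduce outside CNTK (an illustrative numpy experiment on the toy data; the step sizes are assumptions): with the same number of updates, a tiny step size leaves the squared error high, while a moderately larger one drives it near zero.

```python
import numpy as np

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([[10.0], [20.0], [30.0], [40.0]])

def train(lr, steps=500):
    # Gradient descent on mean squared error with a linear model p = w*x + b
    w = b = 0.0
    for _ in range(steps):
        p = X * w + b
        g = 2.0 * (p - y) / len(X)
        w -= lr * float((X * g).sum())
        b -= lr * float(g.sum())
    return float((((X * w + b) - y) ** 2).mean())

loss_small = train(lr=0.001)   # crawls: loss still well above zero
loss_bigger = train(lr=0.05)   # converges: loss near zero
# loss_bigger ends up orders of magnitude below loss_small
```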
I moved on to a more realistic regression task: learning to predict housing prices using the well-known UCI Housing Data Set.
I am using the same model architecture (as above), adjusted for a larger input layer (13 features) and a larger hidden layer (200 nodes). However, after only 3 epochs the training aborts with sqerr = 1.#QNAN000 * 253 (i.e., NaN), having steadily increased with each epoch.
What's causing the sqerr to be so big to start with and to increase steadily? The feature and the label values are all < 100. Even a network that always predicts 0 will have a training set sqerr less than a million.
Your training is not converging. Common causes are a learning rate that is too large, initialization values for the learnable parameters that are too large or too small, and a minibatch size that is too large.
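The first of these failure modes can be sketched in a few lines of numpy (an illustration, not CNTK; the step size is an assumption): once the step size exceeds the stability threshold of the problem, the squared error grows with every update until it overflows to inf/NaN, which is the analogue of CNTK printing 1.#QNAN000.

```python
import numpy as np

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([[10.0], [20.0], [30.0], [40.0]])

w, b = 0.0, 0.0
lr = 0.5  # far above the stable rate for this problem
losses = []
with np.errstate(over='ignore', invalid='ignore'):
    for _ in range(200):
        p = X * w + b
        losses.append(float(((p - y) ** 2).mean()))
        g = 2.0 * (p - y) / len(X)
        w -= lr * float((X * g).sum())
        b -= lr * float(g.sum())

# losses grows from the very first update and ends as inf or nan
```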
There are options in the SGD configuration block that allow you to see partial objectives as they progress:
numMBsToShowResult = 100 # show intermediate objective values every 100 minibatches
firstMBsToShowResult = 10 # and for the first 10 minibatches
This way you should be able to see whether at the very start you are already starting out with something off.
I recommend starting with a small minibatch size, maybe 128, and a small learningRatePerSample, maybe 0.001. Then I would try different orders of magnitude for the initialization of the weight matrices. Normally one would think they should be close to zero with very small perturbations to break ties, but I have found that larger init values sometimes lead to better results, or at least to getting off the ground earlier.
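The "perturbations to break ties" point can be demonstrated directly (a numpy sketch with assumed toy data, learning rate, and a tanh hidden layer standing in for the sigmoid one): if every hidden weight starts at exactly the same value, all hidden units compute the same function and receive the same gradient, so they remain clones forever and the layer behaves like a single unit.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([[10.0], [20.0], [30.0], [40.0]])

def train_hidden(W1):
    W1 = W1.copy()                  # (3, 1) hidden-layer weights
    W2 = np.full((1, 3), 0.1)       # output-layer weights
    for _ in range(100):
        h = np.tanh(X @ W1.T)       # (4, 3) hidden activations
        p = h @ W2.T                # (4, 1) prediction
        g = 2.0 * (p - y) / len(X)  # dMSE/dp
        gW2 = g.T @ h
        gW1 = ((g @ W2) * (1.0 - h ** 2)).T @ X
        W1 -= 0.01 * gW1
        W2 -= 0.01 * gW2
    return W1

tied = train_hidden(np.full((3, 1), 0.1))          # identical start values
broken = train_hidden(rng.normal(0, 0.1, (3, 1)))  # small random start

# tied: all three hidden rows are still identical after training
# broken: the rows have diverged and learn different features
```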
@frankseide Thank you for your feedback. I am still learning how to use the different knobs (parameters) to control the learning behavior. I used some of your suggestions already and was able to get the system to learn to some extent the housing data.
Trust me, we are all still learning about these knobs! It still has way too many elements of a black art, but sadly that's the state of play. So please do not hesitate to ask further questions (and maybe share your experience as well if you like).
I had a lot of trouble with this topic too. I've created a notebook with notes that I think will help people trying to learn function approximation. You can see the pull request here: https://github.com/Microsoft/CNTK/pull/1767/files
I am doing exactly the same thing, but with nonlinear input data.
I am using a 2-layer model: a hidden layer with a sigmoid activation function, and an output layer with a leaky ReLU.
But I am struggling with the results: the output values are all the same number, which is definitely wrong.
Any help please?
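One common cause of "every output is the same number" (an assumption — it depends on your data): unnormalized inputs saturate the sigmoid, so the hidden activations are essentially 1 for every sample and the output layer sees identical inputs. A small numpy illustration with made-up feature values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X_raw = np.array([[120.0], [250.0], [380.0], [510.0]])  # large raw features
W = np.full((3, 1), 0.5)

h_raw = sigmoid(X_raw @ W.T)        # every entry saturates to ~1.0
spread_raw = h_raw.max() - h_raw.min()

X_norm = (X_raw - X_raw.mean()) / X_raw.std()
h_norm = sigmoid(X_norm @ W.T)      # activations now vary across samples
spread_norm = h_norm.max() - h_norm.min()
# spread_raw is ~0 while spread_norm is clearly nonzero
```

If this matches your situation, normalizing the inputs (or using smaller initial weights) should let the outputs differentiate again.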
I have read the documentation looking for an example on how to setup (reader, network configuration) a regression problem (e.g., predict housing prices from various inputs). However, the (reader) examples describe only classification or sequence learning tasks.
Is there any information anywhere that I could use to get started with a regression task?
My two main questions are: