microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

Training basic recurrent net "from scratch" (C#) #3782

Open reseeker opened 4 years ago

reseeker commented 4 years ago

I'm trying to implement a recurrent net "from scratch". For this purpose I prepared the simplest possible net:

// Input variable
Variable inputVar = Variable.InputVariable(new int[] { 1 }, DataType.Double, "InputVariable");
// Placeholder for the previous state
Variable stateholder = Variable.PlaceholderVariable(new int[] { 1 }, inputVar.DynamicAxes);
// Net
Function net = CNTKLib.Plus(inputVar, stateholder);
// Replace the placeholder with the previous value of the net's own output
net = net.ReplacePlaceholders(new Dictionary<Variable, Variable>() { { stateholder, CNTKLib.PastValue(net.Output) } });
// Output layer: last element of the sequence plus a learnable bias
net = CNTKLib.Plus(CNTKLib.SequenceLast(net.Output), new Parameter(new int[] { 1 }, DataType.Double, CNTKLib.NormalInitializer(1d)));
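
If I'm reading the graph right, per sequence this computes s_t = x_t + s_(t-1), with s_0 = 0 (PastValue's default initial state), and finally y = s_T + b, where b is the single learnable parameter.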

Then I'm trying to approximate the following relation (Output = (State = Input + PrevState) + 2):

Input   State   Output
0       0       2
1       1       3
2       3       5
3       6       8
4       10      12
5       15      17
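
Purely for illustration, a tiny standalone loop (plain C#, not CNTK) that reproduces the table above from the stated relation:

// Reproduces the target table:
// State_t = Input_t + State_(t-1), Output_t = State_t + 2
double state = 0;
foreach (double input in new double[] { 0, 1, 2, 3, 4, 5 })
{
    state += input;   // running sum of the inputs
    Console.WriteLine($"{input}\t{state}\t{state + 2}");
}
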
var labelsVar = Variable.InputVariable(net.Output.Shape, DataType.Double, "labelsVariable", new Axis[] { Axis.DefaultBatchAxis() });
var trainer = Trainer.CreateTrainer(
    net,
    CNTKLib.SquaredError(net, labelsVar),   // loss function
    CNTKLib.SquaredError(net, labelsVar),   // evaluation metric
    new Learner[] { CNTKLib.AdamLearner(new ParameterVector((System.Collections.ICollection)net.Parameters()), new TrainingParameterScheduleDouble(1, Learner.IgnoredMinibatchSize), new TrainingParameterScheduleDouble(0.971), true) }
);
double minLoss = double.MaxValue;
for (int i = 1000; i-- > 0;)
{
#pragma warning disable 618
    // Feed all six samples in one minibatch
    trainer.TrainMinibatch(new Dictionary<Variable, Value>() {
        {
            inputVar, Value.CreateBatch(new int[] { 1 }, new double[] { 0,1,2,3,4,5 }, DeviceDescriptor.CPUDevice)
        },
        {
            labelsVar, Value.CreateBatch(new int[] { 1 }, new double[] { 2,3,5,8,12,17 }, DeviceDescriptor.CPUDevice)
        }
    }, DeviceDescriptor.CPUDevice);
#pragma warning restore 618
    var loss = trainer.PreviousMinibatchLossAverage();
    minLoss = Math.Min(minLoss, loss);
}

I chose Value.CreateBatch() based on the few LSTM training examples I could find online. But it seems the recurrence doesn't kick in: each of the input values 0,1,2,3,4,5 is treated as a new sequence with its own hidden/previous state (which is 0 for all of them). At the inference stage, to make the recurrence work we have to use Value.CreateSequence for the input values...

// A single sequence of seven samples, so PastValue carries state across them
Value inputValue = Value.CreateSequence(new int[] { 1 }, new double[] { 0, 1, 2, 3, 4, 5, 6 }, DeviceDescriptor.CPUDevice);
Dictionary<Variable, Value> outputDict = new Dictionary<Variable, Value>() { { net.Output, null } };

net.Evaluate(new Dictionary<Variable, Value>() { { inputVar, inputValue } }, outputDict, DeviceDescriptor.CPUDevice);

var outputValue = outputDict[net.Output].GetDenseData<double>(net.Output).Select(x => x[0]).ToArray();
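
If I understand GetDenseData correctly, with SequenceLast on top this yields a single value for the whole sequence (the running sum of the seven inputs, 21, plus the learned bias), so the Select above produces a one-element array.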

... because using Value.CreateBatch leads to the input vector being treated as several separate sequences of length 1. But using Value.CreateSequence at the train stage causes an exception (presumably because a single sequence produces only one SequenceLast output, while six labels are supplied).

What is the correct way to train a recurrent net from scratch (without a MinibatchSource), so that the input values are treated as one sequence?
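
One workaround I'm considering (untested sketch, assuming Value.CreateBatchOfSequences accepts variable-length sequences): feed a batch of prefix sequences {0}, {0,1}, ..., {0..5}, so every sequence yields exactly one SequenceLast output matched by exactly one label:

// Untested sketch: one training batch of six prefix sequences,
// each paired with a single label for its SequenceLast output.
// Requires: using System.Collections.Generic; using System.Linq;
var inputs = new double[] { 0, 1, 2, 3, 4, 5 };
var sequences = new List<List<double>>();
for (int t = 0; t < inputs.Length; t++)
    sequences.Add(inputs.Take(t + 1).ToList());   // {0}, {0,1}, ..., {0,1,2,3,4,5}

Value inputValue = Value.CreateBatchOfSequences<double>(new int[] { 1 }, sequences, DeviceDescriptor.CPUDevice);
Value labelValue = Value.CreateBatch(new int[] { 1 }, new double[] { 2, 3, 5, 8, 12, 17 }, DeviceDescriptor.CPUDevice);

trainer.TrainMinibatch(new Dictionary<Variable, Value>() {
    { inputVar, inputValue },
    { labelsVar, labelValue }
}, DeviceDescriptor.CPUDevice);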

YanYas commented 4 years ago

Hi reseeker, Try going through an example here