zhongkaifu / RNNSharp

RNNSharp is a toolkit for deep recurrent neural networks, which are widely used for many kinds of tasks, such as sequence labeling and sequence-to-sequence modeling. It is written in C# and is based on .NET Framework 4.6 or later. RNNSharp supports many types of networks, such as forward and bi-directional networks and sequence-to-sequence networks, and many types of layers, such as LSTM, Softmax, sampled Softmax, and others.
BSD 3-Clause "New" or "Revised" License

IndexOutOfRangeException when using RNN #9

Closed: bratao closed this issue 8 years ago

bratao commented 8 years ago

Hello again! With today's version (4fad1b6), training no longer converges with LSTM, unlike yesterday's version, and it crashes when using the RNN.

The invoked command line was:

-mode train -trainfile wnnsharp-data-rsysyi.txt -modelfile rnnsharp.model -validfile wnnsharp-data-rsysyi.txt -ftrfile wnnsharp-config-qxnigh.txt -tagfile avaliable-tags.txt -modeltype 0 -layersize 50 -alpha 0.1 -crf 0 -maxiter 30 -savestep 200K -dir 0 -dropout 0

The error is:

Unhandled Exception: System.AggregateException: One or more errors occurred. ---> System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at RNNSharp.RNN.<>c__DisplayClass107_0.<matrixXvectorADD>b__0(Int32 i) in C:\sbuild\mine\backend_broka\Candidatos\RNNSharp-master\RNNSharp\RNN.cs:line 660
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object )
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at System.Threading.Tasks.Parallel.ForWorker[TLocal](Int32 fromInclusive, Int32 toExclusive, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Func`4 bodyWithLocal, Func`1 localInit, Action`1 localFinally)
   at System.Threading.Tasks.Parallel.For(Int32 fromInclusive, Int32 toExclusive, ParallelOptions parallelOptions, Action`1 body)
   at RNNSharp.RNN.matrixXvectorADD(SimpleLayer dest, SimpleLayer srcvec, Matrix`1 srcmatrix, Int32 DestSize, Int32 SrcSize, Int32 type) in C:\sbuild\mine\backend_broka\Candidatos\RNNSharp-master\RNNSharp\RNN.cs:line 685
   at RNNSharp.SimpleRNN.computeHiddenLayer(State state, Boolean isTrain) in C:\sbuild\mine\backend_broka\Candidatos\RNNSharp-master\RNNSharp\SimpleRNN.cs:line 157
   at RNNSharp.RNN.PredictSentence(Sequence pSequence, RunningMode runningMode) in C:\sbuild\mine\backend_broka\Candidatos\RNNSharp-master\RNNSharp\RNN.cs:line 267
   at RNNSharp.RNN.TrainNet(DataSet trainingSet, Int32 iter) in C:\sbuild\mine\backend_broka\Candidatos\RNNSharp-master\RNNSharp\RNN.cs:line 536
   at RNNSharp.RNNEncoder.Train() in C:\sbuild\mine\backend_broka\Candidatos\RNNSharp-master\RNNSharp\RNNEncoder.cs:line 133
   at RNNSharpConsole.Program.Main(String[] args) in C:\sbuild\mine\backend_broka\Candidatos\RNNSharp-master\RNNSharpConsole\Program.cs:line 284
bratao commented 8 years ago

With a debug build, the error is more descriptive:


Unhandled Exception: System.AggregateException: One or more errors occurred. ---> System.NotSupportedException: Vector<T>.Count cannot be called via reflection when intrinsics are enabled.
   at System.Numerics.Vector`1.get_Count()
   at RNNSharp.RNN.<>c__DisplayClass107_0.<matrixXvectorADD>b__0(Int32 i) in C:\sbuild\mine\backend_broka\Candidatos\RNNSharp-master\RNNSharp\RNN.cs:line 662
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object )
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at System.Threading.Tasks.Parallel.ForWorker[TLocal](Int32 fromInclusive, Int32 toExclusive, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Func`4 bodyWithLocal, Func`1 localInit, Action`1 localFinally)
   at System.Threading.Tasks.Parallel.For(Int32 fromInclusive, Int32 toExclusive, ParallelOptions parallelOptions, Action`1 body)
   at RNNSharp.RNN.matrixXvectorADD(SimpleLayer dest, SimpleLayer srcvec, Matrix`1 srcmatrix, Int32 DestSize, Int32 SrcSize, Int32 type) in C:\sbuild\mine\backend_broka\Candidatos\RNNSharp-master\RNNSharp\RNN.cs:line 658
   at RNNSharp.SimpleRNN.computeHiddenLayer(State state, Boolean isTrain) in C:\sbuild\mine\backend_broka\Candidatos\RNNSharp-master\RNNSharp\SimpleRNN.cs:line 157
   at RNNSharp.RNN.PredictSentence(Sequence pSequence, RunningMode runningMode) in C:\sbuild\mine\backend_broka\Candidatos\RNNSharp-master\RNNSharp\RNN.cs:line 265
   at RNNSharp.RNN.TrainNet(DataSet trainingSet, Int32 iter) in C:\sbuild\mine\backend_broka\Candidatos\RNNSharp-master\RNNSharp\RNN.cs:line 536
   at RNNSharp.RNNEncoder.Train() in C:\sbuild\mine\backend_broka\Candidatos\RNNSharp-master\RNNSharp\RNNEncoder.cs:line 101
   at RNNSharpConsole.Program.Train() in C:\sbuild\mine\backend_broka\Candidatos\RNNSharp-master\RNNSharpConsole\Program.cs:line 492
   at RNNSharpConsole.Program.Main(String[] args) in C:\sbuild\mine\backend_broka\Candidatos\RNNSharp-master\RNNSharpConsole\Program.cs:line 272
zhongkaifu commented 8 years ago

Thanks, @bratao. This is a known bug: the data size (the hidden layer size, in your case) is not aligned with the width of the CPU's SIMD registers. I will fix it by aligning the data.
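
For readers following along, here is a likely reason a hidden layer size of 50 can trigger this. Vector<float>.Count is commonly 4 or 8 on x86-64 hardware, and 50 is a multiple of neither, so a loop that always reads a full vector would step past the end of the array on its final block. The sketch below is not RNNSharp's code: `SimdSketch.Dot` is a hypothetical helper, and it handles the leftover elements with a scalar tail loop, which is a different technique from the padding/alignment fix mentioned above.

```csharp
using System;
using System.Numerics;

// Hypothetical helper for illustration only; not part of RNNSharp.
static class SimdSketch
{
    // Dot product that is safe for lengths that are not a multiple
    // of the SIMD width (for example, a hidden layer of size 50).
    public static float Dot(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Arrays must have the same length.");

        int width = Vector<float>.Count;                  // floats per SIMD register
        int lastBlock = a.Length - (a.Length % width);    // end of the last full block
        float sum = 0f;
        int i = 0;

        // Vectorized part: only full blocks, so the Vector<float>
        // constructor never reads past the end of the arrays.
        for (; i < lastBlock; i += width)
        {
            var va = new Vector<float>(a, i);
            var vb = new Vector<float>(b, i);
            sum += Vector.Dot(va, vb);
        }

        // Scalar tail: the remaining (a.Length % width) elements.
        for (; i < a.Length; i++)
            sum += a[i] * b[i];

        return sum;
    }
}
```

Padding each layer up to the next multiple of Vector<float>.Count, as proposed above, removes the need for the tail loop at the cost of a few unused elements per array.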

bratao commented 8 years ago

Hi @zhongkaifu !!

This bug is showing up again after the latest set of commits =(

zhongkaifu commented 8 years ago

Could you please show me the call stack or any other information that would help with debugging? Is it the same call stack as before?

bratao commented 8 years ago

Sure. One detail: training works now. The error occurs during tagging.

In debug I just get this:

Unhandled Exception: System.NotSupportedException: Vector<T>.Count cannot be called via reflection when intrinsics are enabled.
   at System.Numerics.Vector`1.get_Count()
   at RNNSharp.ModelSetting.DumpSetting() in C:\sbuild\mine\backend_broka\Candidatos\RNNSharp-master\RNNSharp\ModelSetting.cs:line 51
   at RNNSharpConsole.Program.Train() in C:\sbuild\mine\backend_broka\Candidatos\RNNSharp-master\RNNSharpConsole\Program.cs:line 444
   at RNNSharpConsole.Program.Main(String[] args) in C:\sbuild\mine\backend_broka\Candidatos\RNNSharp-master\RNNSharpConsole\Program.cs:line 277

In Release I get this error:

info,3/9/2016 5:21:05 PM Loading template feature set...
info,3/9/2016 5:21:05 PM Template feature size: 11173
info,3/9/2016 5:21:05 PM Template feature context size: 11173
info,3/9/2016 5:21:05 PM Get model type LSTM and direction FORWARD
info,3/9/2016 5:21:05 PM Model Structure: LSTM-RNN
info,3/9/2016 5:21:05 PM Loading LSTM-RNN model: rnnsharp.model
info,3/9/2016 5:21:05 PM Loading input2hidden weights...
info,3/9/2016 5:21:05 PM Loading LSTM-Weight: width:60, height:5432, vqSize:0...
info,3/9/2016 5:21:05 PM Loading hidden2output weights...
info,3/9/2016 5:21:05 PM Loading matrix. width: 60, height: 22, vqSize: 0
info,3/9/2016 5:21:05 PM CRF Model: False

Unhandled Exception: System.AggregateException: One or more errors occurred. ---> System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at RNNSharp.LSTMRNN.<>c__DisplayClass36_0.<computeHiddenLayer>b__0(Int32 j) in C:\sbuild\mine\backend_broka\Candidatos\RNNSharp-master\RNNSharp\LSTMRNN.cs:line 736
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object )
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at System.Threading.Tasks.Parallel.ForWorker[TLocal](Int32 fromInclusive, Int32 toExclusive, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Func`4 bodyWithLocal, Func`1 localInit, Action`1 localFinally)
   at System.Threading.Tasks.Parallel.For(Int32 fromInclusive, Int32 toExclusive, ParallelOptions parallelOptions, Action`1 body)
   at RNNSharp.LSTMRNN.computeHiddenLayer(State state, Boolean isTrain) in C:\sbuild\mine\backend_broka\Candidatos\RNNSharp-master\RNNSharp\LSTMRNN.cs:line 803
   at RNNSharp.RNN.PredictSentence(Sequence pSequence, RunningMode runningMode) in C:\sbuild\mine\backend_broka\Candidatos\RNNSharp-master\RNNSharp\RNN.cs:line 277
   at RNNSharp.RNNDecoder.Process(Sentence sent) in C:\sbuild\mine\backend_broka\Candidatos\RNNSharp-master\RNNSharp\RNNDecoder.cs:line 77
   at RNNSharpConsole.Program.Test() in C:\sbuild\mine\backend_broka\Candidatos\RNNSharp-master\RNNSharpConsole\Program.cs:line 372
   at RNNSharpConsole.Program.Main(String[] args) in C:\sbuild\mine\backend_broka\Candidatos\RNNSharp-master\RNNSharpConsole\Program.cs:line 289
zhongkaifu commented 8 years ago

Thanks.

The second one should be the real exception; however, I cannot reproduce it on my machine. According to your call stack, the exception comes from "LSTMCell cell_j = neuHidden[j]", where j was outside the bounds of neuHidden. Could you please set a breakpoint there and see what happens?

        Parallel.For(0, L1 - 1, parallelOption, j =>
        {
            LSTMCell cell_j = neuHidden[j];

            //hidden(t-1) -> hidden(t)
            cell_j.previousCellState = cell_j.cellState;
bratao commented 8 years ago

@zhongkaifu, sorry, my mistake! The recent build has a regression: it no longer saves the model if you're not using a validation corpus. So I was using an older model, generated by an older version.

Fixing that solved it!

zhongkaifu commented 8 years ago

Thanks, @bratao. If a validation corpus isn't provided, the model should be saved whenever we get a better result on the training corpus. I will fix this problem.
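
The rule described above can be sketched as follows. This is a minimal illustration, not the actual fix: `BestModelTracker` and its members are hypothetical names, and it assumes the result is measured as an error value where lower is better.

```csharp
using System;

// Hypothetical helper for illustration only; not part of RNNSharp.
sealed class BestModelTracker
{
    private double bestError = double.MaxValue;

    // Returns true when the model should be saved after this iteration.
    // validError is null when no validation corpus was supplied; in that
    // case the training error decides whether the model has improved.
    public bool ShouldSave(double trainError, double? validError)
    {
        double current = validError ?? trainError;
        if (current < bestError)
        {
            bestError = current;
            return true;
        }
        return false;
    }
}
```

With no validation corpus, calling `ShouldSave(trainError, null)` after each iteration saves only when the training error improves on the best value seen so far.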