zhongkaifu / Seq2SeqSharp

Seq2SeqSharp is a tensor-based, fast and flexible deep neural network framework written in .NET (C#). Its features include automatic differentiation, several network types (Transformer, LSTM, BiLSTM and so on), multi-GPU support, cross-platform support (Windows, Linux, x86, x64, ARM), and multimodal models for text and images.

writing embedding matrix #53

Closed piedralaves closed 1 year ago

piedralaves commented 1 year ago

Hi again:

As you know, we are researching the effect of a pretrained embedding matrix. For this purpose, we write [SeqClassification].m_srcEmbedding (the embedding matrix) to a file before and after training, with "IsEmbeddingTrainable" set to "true".

The fact is that the embedding matrix looks the same before and after. Maybe I misunderstand something. Should m_srcEmbedding change when the parameter "IsEmbeddingTrainable" is set to "true"? We are still thinking about the case, but maybe you can advise us.

Thanks a lot

G


zhongkaifu commented 1 year ago

Hi @piedralaves

For SeqClassification, you can check the following code in the Seq2SeqSharp\Applications\SeqClassification.cs file:

    m_srcEmbedding = new MultiProcessorNetworkWrapper<IWeightTensor>(new WeightTensor(new long[2] { model.SrcVocab.Count, model.EncoderEmbeddingDim }, raDeviceIds.GetNextItem(), normType: NormType.Uniform, fanOut: true, name: "SrcEmbeddings", isTrainable: m_options.IsEmbeddingTrainable), DeviceIds);

The config item "IsEmbeddingTrainable" indicates whether the embedding is trainable. You could set it to true and retry.

piedralaves commented 1 year ago

My question is whether the values of m_srcEmbedding must change if IsEmbeddingTrainable is set to true. They are expected to change, aren't they?

In other words: if you load a pretrained embedding and set the "IsEmbeddingTrainable" parameter to true, the values of m_srcEmbedding are expected to change during training, is that right?

We are writing m_srcEmbedding before and after training, and the values look the same.

zhongkaifu commented 1 year ago

Based on the code above, if you set IsEmbeddingTrainable to true, the weights of m_srcEmbedding will be updated during training (this is what "trainable" means).

Thanks,
Zhongkai Fu

piedralaves commented 1 year ago

Thanks a lot. All right now.

piedralaves commented 1 year ago

Just a remark:

We are writing m_srcEmbedding before and after training

    // writing m_srcEmbedding before training (our function to write the matrix)
    Program.writeEmbeddingMatrix(ss, "PRE");

    // training (as usual)
    ss.Train(maxTrainingEpoch: opts.MaxEpochNum, trainCorpus: trainCorpus, validCorpusList: validCorpusList.ToArray(), learningRate: learningRate, optimizer: optimizer, taskId2metrics: taskId2metrics, decodingOptions: decodingOptions);

    // write m_srcEmbedding after training (again our function to write the matrix)
    Program.writeEmbeddingMatrix(ss, "POS");

We get two similar matrices in both files when training is finished. Nonetheless, if we start an incremental training, the matrix written by Program.writeEmbeddingMatrix(ss, "PRE") is new and different. It looks like the model (.model extension) that is loaded for the incremental training contains the matrix with the updated weights.

This is what puzzles us.

zhongkaifu commented 1 year ago

Sorry that I didn't understand this part: "Nonetheless, if we start an incremental training, the matrix written by Program.writeEmbeddingMatrix(ss, "PRE") is new and different."

Do you mean after you load the existing model, and immediately call "Program.writeEmbeddingMatrix(ss, "PRE")", then you have different output matrix ?

How did you implement writeEmbeddingMatrix function? Where did you call it? Can you please show more context about it? Can you also please share your config file ? Especially settings for "DeviceIds" and "ProcessorType".

piedralaves commented 1 year ago

"Do you mean after you load the existing model, and immediately call "Program.writeEmbeddingMatrix(ss, "PRE")", then you have different output matrix ?"

No, I mean that when we call Program.writeEmbeddingMatrix(ss, false) and Program.writeEmbeddingMatrix(ss, true) in the code below, we obtain the same weights.

The points where we call writeEmbeddingMatrix are in the Program.cs of SeqClassificationConsole:

                //Writing before training
                Program.writeEmbeddingMatrix(ss, false);

                // Add event handler for monitoring
                ss.StatusUpdateWatcher += Misc.Ss_StatusUpdateWatcher;
                ss.EvaluationWatcher += Ss_EvaluationWatcher;

                // Kick off training
                ss.Train(maxTrainingEpoch: opts.MaxEpochNum, trainCorpus: trainCorpus, validCorpusList: validCorpusList.ToArray(), learningRate: learningRate, optimizer: optimizer, taskId2metrics: taskId2metrics, decodingOptions: decodingOptions);

                //Writing after training
                Program.writeEmbeddingMatrix(ss, true);

This is the function to write the Embedding matrix

        private static void writeEmbeddingMatrix(SeqClassification sqc, bool isPos)
        {

            //for writing the matrix-------------------------------------
            long[] auxLong = new long[2]; 
            string namefragment = "PRE";

            if (isPos)
            {
                namefragment = "POS";
            }

            for (int k = 0; k < sqc.DeviceIds.Length; k++)
            {

                using (StreamWriter writer = new StreamWriter(opts.TrainCorpusPath + "/" + namefragment + "embedding.txt"))
                {
                    string auxString = "";
                    for (int i = 0; i < sqc.m_srcEmbedding.GetNetworkOnDevice(k).Rows - 1; i++)
                    {
                        auxString = "";
                        for (int j = 0; j < sqc.m_srcEmbedding.GetNetworkOnDevice(k).Columns - 1; j++)
                        {
                            auxLong[0] = i;
                            auxLong[1] = j;
                            auxString = auxString + sqc.m_srcEmbedding.GetNetworkOnDevice(k).GetWeightAt(auxLong) + " ";

                        }

                        //Logger.WriteLine($"writing first matrix. " + auxString + "\n");

                        writer.WriteLine(auxString);

                    }
                }

            }
            //-------------------------------------------------------
        }
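
A side note on the function above, separate from the question in this thread: the loop bounds `Rows - 1` and `Columns - 1` appear to skip the last row and last column, and every device iteration writes to the same file path, so with more than one device only the last one would survive. The following is a minimal, self-contained sketch of a writer that covers the full matrix. It does not use Seq2SeqSharp; the class name `MatrixWriter`, the row-major `float[]` layout, and the per-device file name are all assumptions made for illustration.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

// Hypothetical helper, not part of Seq2SeqSharp.
public static class MatrixWriter
{
    // Formats a row-major matrix as text: one row per line, values separated by spaces.
    public static string Format(float[] weights, int rows, int columns)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < rows; i++)            // i < rows, so the last row is included
        {
            for (int j = 0; j < columns; j++)     // j < columns, so the last column is included
            {
                if (j > 0)
                {
                    sb.Append(' ');
                }
                sb.Append(weights[i * columns + j].ToString(CultureInfo.InvariantCulture));
            }
            sb.AppendLine();
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Stand-in data for a 2 x 3 matrix; real values would come from the embedding tensor.
        float[] weights = { 1f, 2f, 3f, 4f, 5f, 6f };
        int deviceIndex = 0;

        // Including the device index in the (hypothetical) file name avoids overwriting.
        File.WriteAllText($"PRE_embedding_device{deviceIndex}.txt", Format(weights, 2, 3));
    }
}
```

Building the text with a `StringBuilder` also avoids the repeated string concatenation in the inner loop, which becomes slow for large vocabularies.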
piedralaves commented 1 year ago

The config:

{
"Task":"Train",
"HiddenSize":300,
"EmbeddingDim": 300,
"SrcVocabSize": 3145,
"TgtVocabSize": 313,
"IsEmbeddingTrainable": true,
"IsEncoderTrainable": true,
"StartLearningRate":0.0006,
"WeightsUpdateCount":0,
"EncoderLayerDepth":2,
"DecoderLayerDepth":2,
"SharedEmbeddings":false,
"EnableTagEmbeddings":false,
"TgtVocab":"C:/CUSTOM/Seq2SeqSharp/SeqClassificationTelco/clasVocabulary.txt",
"SrcVocab":"C:/CUSTOM/Seq2SeqSharp/SeqClassificationTelco/vocabulary.txt",
"SrcEmbeddingFilePath":"C:/CUSTOM/Seq2SeqSharp/SeqClassificationTelco/model.bin",
"SrcEmbeddingModelFilePath":null,
"TgtEmbeddingModelFilePath":null,
"ModelFilePath":"C:/CUSTOM/Seq2SeqSharp/SeqClassificationTelco/telcoModel6.model",
"TrainCorpusPath":"C:/CUSTOM/Seq2SeqSharp/SeqClassificationTelco/train",
"ValidCorpusPaths":"C:/CUSTOM/Seq2SeqSharp/SeqClassificationTelco/test",
"SrcLang":"SAM",
"TgtLang":"CLA",
"InputTestFile":null,
"OutputTestFile":null,
"ShuffleType":"NoPadding",
"ShuffleBlockSize":-1,
"GradClip":5.0,
"BatchSize":8,
"ValBatchSize":5,
"DropoutRatio":0.0,
"ProcessorType":"CPU",
"EncoderType":"Transformer",
"DecoderType":"Transformer",
"MultiHeadNum":10,
"DeviceIds":"0",
"BeamSearchSize":8,
"MaxEpochNum":1,
"MaxTrainSentLength":110,
"MaxTestSentLength":110,
"WarmUpSteps":8000,
"VisualizeNNFilePath":null,
"Beta1":0.9,
"Beta2":0.98,
"ValidIntervalHours":1.0,
"EnableCoverageModel":false,
"CompilerOptions":"--use_fast_math --gpu-architecture=compute_60",
"Optimizer":"Adam"

}
zhongkaifu commented 1 year ago

Here are some things you can check.

  1. Check whether the IsTrainable property is true after you load the matrix from the model. You can find its definition in WeightTensor.cs: https://github.com/zhongkaifu/Seq2SeqSharp/blob/master/Seq2SeqSharp/Tools/WeightTensor.cs If it is true, its weights will be updated during training. In your example, you can check "sqc.m_srcEmbedding.GetNetworkOnDevice(k).IsTrainable".

  2. You can directly dump weights by calling WeightTensor.ToWeightArray() (in your example: sqc.m_srcEmbedding.GetNetworkOnDevice(k).ToWeightArray()).

  3. Did you use Seq2SeqSharp to train your model from scratch, and then run another incremental training by loading the existing model? Can you also share your logs?
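
Once the weights are dumped as arrays (point 2), comparing the two snapshots numerically is more reliable than comparing text files by eye. Below is a minimal sketch of such a check. It assumes the dump can be held as a `float[]` copy taken before training; the class `WeightDiff` and its method are hypothetical helpers, not part of Seq2SeqSharp, and the arrays in `Main` are stand-in data.

```csharp
using System;

// Hypothetical helper, not part of Seq2SeqSharp.
public static class WeightDiff
{
    // Returns the largest absolute element-wise difference between two snapshots.
    public static float MaxAbsDiff(float[] before, float[] after)
    {
        if (before.Length != after.Length)
        {
            throw new ArgumentException("Snapshots must have the same length.");
        }

        float max = 0f;
        for (int i = 0; i < before.Length; i++)
        {
            max = Math.Max(max, Math.Abs(after[i] - before[i]));
        }
        return max;
    }

    public static void Main()
    {
        // Stand-in data; in practice each array would be a weight dump
        // taken before and after the call to Train.
        float[] before = { 0.10f, -0.20f, 0.30f };
        float[] after = { 0.10f, -0.25f, 0.30f };

        Console.WriteLine(MaxAbsDiff(before, after) > 0f ? "weights changed" : "weights unchanged");
    }
}
```

A result of exactly zero after a full training run would suggest the tensor never reached the optimizer.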

piedralaves commented 1 year ago

I will check what you suggest and let you know. Thanks a lot. We appreciate your advice very much.

The logs are attached in this message: SeqClassificationConsole_Train_2022_07_21_12h_39m_10s.log SeqClassificationConsole_Train_2022_07_24_12h_42m_08s.log

zhongkaifu commented 1 year ago

I just looked into your log files, but didn't find any occurrence of the keyword "SrcEmbeddings". In SeqClassification.cs, "SrcEmbeddings" is created by this line:

    m_srcEmbedding = new MultiProcessorNetworkWrapper<IWeightTensor>(new WeightTensor(new long[2] { model.SrcVocab.Count, model.EncoderEmbeddingDim }, raDeviceIds.GetNextItem(), normType: NormType.Uniform, fanOut: true, name: "SrcEmbeddings", isTrainable: m_options.IsEmbeddingTrainable), DeviceIds);

You should keep it and load the external embedding after it.

For any trainable weights, you will find lines like "Added weight '{weights name}' to optimizer. Learing rate factor = '...'". In your log files, I didn't find such lines for "SrcEmbeddings".

piedralaves commented 1 year ago

The line you mean is in CreateTrainableParameters, isn't it?

private bool CreateTrainableParameters(IModel model)
        {
            Logger.WriteLine($"Creating encoders...");
            var raDeviceIds = new RoundArray<int>(DeviceIds);

            int contextDim;
            (m_encoder, contextDim) = Encoder.CreateEncoders(model, m_options, raDeviceIds);

            m_encoderFFLayer = new MultiProcessorNetworkWrapper<IFeedForwardLayer>[model.ClsVocabs.Count];
            for (int i = 0; i < model.ClsVocabs.Count; i++)
            {
                m_encoderFFLayer[i] = new MultiProcessorNetworkWrapper<IFeedForwardLayer>(new FeedForwardLayer($"FeedForward_Encoder_{i}", contextDim, model.ClsVocabs[i].Count, dropoutRatio: 0.0f, deviceId: raDeviceIds.GetNextItem(), isTrainable: true), DeviceIds);
            }

            (m_posEmbedding, m_segmentEmbedding) = Misc.CreateAuxEmbeddings(raDeviceIds, contextDim, Math.Max(m_options.MaxTrainSentLength, m_options.MaxTestSentLength), model);

            Logger.WriteLine($"Creating embeddings. Shape = '({model.SrcVocab.Count} ,{model.EncoderEmbeddingDim})'");
            m_srcEmbedding = new MultiProcessorNetworkWrapper<IWeightTensor>(new WeightTensor(new long[2] { model.SrcVocab.Count, model.EncoderEmbeddingDim }, raDeviceIds.GetNextItem(), normType: NormType.Uniform, fanOut: true, name: "SrcEmbeddings", isTrainable: m_options.IsEmbeddingTrainable), DeviceIds);

            return true;
        }

And it seems that it is called properly when debugging.

public SeqClassification(string srcEmbeddingFilePath, SeqClassificationOptions options, Vocab srcVocab = null, List<Vocab> clsVocabs = null)
           : base(options.DeviceIds, options.ProcessorType, options.ModelFilePath, options.MemoryUsageRatio, options.CompilerOptions, options.ValidIntervalHours, updateFreq: options.UpdateFreq)
        {

            m_shuffleType = options.ShuffleType;
            m_options = options;

            m_modelMetaData =new SeqClassificationModel(options.HiddenSize, options.EmbeddingDim, options.EncoderLayerDepth, options.MultiHeadNum,
                    options.EncoderType, srcVocab, clsVocabs, options.EnableSegmentEmbeddings, options.EnableTagEmbeddings, options.MaxSegmentNum);

            m_dropoutRatio = options.DropoutRatio;

            //Initializing weights in encoders and decoders
            CreateTrainableParameters(m_modelMetaData);

            // Load external embedding from files
            for (int i = 0; i < DeviceIds.Length; i++)
            {
                //If pre-trained embedding weights are specified, load them from files
                //For classification only the source-side embeddings are loaded. See the original function for the others, although it is advisable that it be the same
                if (!String.IsNullOrEmpty(srcEmbeddingFilePath))
                {
                    Logger.WriteLine($"Loading ExtEmbedding model from '{srcEmbeddingFilePath}' for source side.");
                    LoadWordEmbedding(srcEmbeddingFilePath, m_srcEmbedding.GetNetworkOnDevice(i), m_modelMetaData.SrcVocab.WordToIndex);
                }

            }
        }

Could you see my code in Azure DevOps?

zhongkaifu commented 1 year ago

Your code looks good to me. In addition, the log file doesn't output "Register network 'SrcEmbeddings'". Can you please set a breakpoint on this line:

    m_srcEmbedding = new MultiProcessorNetworkWrapper<IWeightTensor>(new WeightTensor(new long[2] { model.SrcVocab.Count, model.EncoderEmbeddingDim }, raDeviceIds.GetNextItem(), normType: NormType.Uniform, fanOut: true, name: "SrcEmbeddings", isTrainable: m_options.IsEmbeddingTrainable), DeviceIds);

And check the value of m_options.IsEmbeddingTrainable?

piedralaves commented 1 year ago

Did you mean this?

[image]

zhongkaifu commented 1 year ago

The value is true and it looks good. Is m_srcEmbedding's definition like the line below?

    private MultiProcessorNetworkWrapper<IWeightTensor> m_srcEmbedding; //The embeddings over devices for target

Can you also set a breakpoint inside LoadParameters to check which networks are in the dictionary m_name2network? It should have all networks, including the source embedding, the encoder and others.

    protected virtual void LoadParameters(IModel model)
    {
        RegisterTrainableParameters(this);
        foreach (KeyValuePair<string, IMultiProcessorNetworkWrapper> p in m_name2network)
        {
            var name = p.Key;
            var mpnw = p.Value;

            Logger.WriteLine($"Loading parameter '{name}'");
            mpnw.Load(model);
        }
    }
piedralaves commented 1 year ago

The definition

public MultiProcessorNetworkWrapper<IWeightTensor> m_srcEmbedding; //The embeddings over devices for target

piedralaves commented 1 year ago

Which function calls LoadParameters? It does not seem to be fired. G

zhongkaifu commented 1 year ago

It gets called when loading an existing model from file. For training from scratch, it won't be called.

Anyway, you can set breakpoints inside the Register method and check whether the source embedding is in networks and gets registered.

    private void Register(object childValue, string name)
    {
        if (childValue is IMultiProcessorNetworkWrapper networks)
        {
            m_name2network.Add(name, networks);
            Logger.WriteLine($"Register network '{name}'");
        }

        if (childValue is IMultiProcessorNetworkWrapper[] networksArray)
        {
            int idx = 0;
            foreach (var network in networksArray)
            {
                string name2 = $"{name}_{idx}";
                m_name2network.Add(name2, network);
                Logger.WriteLine($"Register network '{name2}'");

                idx++;
            }
        }
    }
piedralaves commented 1 year ago

No, it seems it is not. Not in names, not in networks. Why? Are you sure that it works as you expect in the original code?

zhongkaifu commented 1 year ago

Yes, it's working. I noticed that your m_srcEmbedding is public; you need to change it to private and try again.

The RegisterTrainableParameters method only iterates over non-public fields and properties, and registers each one if it is a network or weight tensor.

    internal void RegisterTrainableParameters(object obj)
    {
        if (m_name2network != null)
        {
            return;
        }
        Logger.WriteLine($"Registering trainable parameters.");
        m_name2network = new SortedList<string, IMultiProcessorNetworkWrapper>();

        foreach (FieldInfo childFieldInfo in obj.GetType().GetFields(BindingFlags.NonPublic | BindingFlags.Instance))
        {
            object childValue = childFieldInfo.GetValue(obj);
            string name = childFieldInfo.Name;
            Register(childValue, name);
        }
        foreach (PropertyInfo childPropertyInfo in obj.GetType().GetProperties(BindingFlags.NonPublic | BindingFlags.Instance))
        {
            object childValue = childPropertyInfo.GetValue(obj);
            string name = childPropertyInfo.Name;
            Register(childValue, name);
        }
    }
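
The effect of the `BindingFlags.NonPublic | BindingFlags.Instance` filter used above can be reproduced in isolation. The sketch below is self-contained and does not use Seq2SeqSharp; `DemoModel` and `RegistrationDemo` are hypothetical names introduced only for illustration. It shows that a reflection scan with these flags sees a private field but not a public one, which matches the behavior described in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical stand-in for a model class; not part of Seq2SeqSharp.
public class DemoModel
{
    private float[] m_privateWeights = new float[4];  // non-public: found by the scan
    public float[] m_publicWeights = new float[4];    // public: skipped by the scan
}

public static class RegistrationDemo
{
    // Returns the names of the instance fields that a NonPublic | Instance scan can see.
    public static List<string> FindNonPublicFields(object obj)
    {
        var names = new List<string>();
        foreach (FieldInfo field in obj.GetType().GetFields(BindingFlags.NonPublic | BindingFlags.Instance))
        {
            names.Add(field.Name);
        }
        return names;
    }

    public static void Main()
    {
        // Lists m_privateWeights only; m_publicWeights would need BindingFlags.Public.
        foreach (string name in FindNonPublicFields(new DemoModel()))
        {
            Console.WriteLine(name);
        }
    }
}
```

Because the scan is driven purely by access modifiers, changing a field from private to public silently removes it from registration without any compile-time or run-time error.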
piedralaves commented 1 year ago

OK. That could be the cause. I will set m_srcEmbedding to private again and try to write the matrix in another way. Then I will let you know. Thanks a lot. G

piedralaves commented 1 year ago

It seems that was the cause. I am now starting the training and will write the matrix before and after it. I will let you know. The log looks fine now, doesn't it? SeqClassificationConsole_Train_2022_07_27_01h_29m_49s.log

Thanks a lot

zhongkaifu commented 1 year ago

Yes, it looks good now.