Closed piedralaves closed 1 year ago
Hi @piedralaves
For SeqClassification, you can check the following code in Seq2SeqSharp\Applications\SeqClassification.cs file
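m_srcEmbedding = new MultiProcessorNetworkWrapper<IWeightTensor>(new WeightTensor(new long[2] { model.SrcVocab.Count, model.EncoderEmbeddingDim }, raDeviceIds.GetNextItem(), normType: NormType.Uniform, fanOut: true, name: "SrcEmbeddings", isTrainable: m_options.IsEmbeddingTrainable), DeviceIds);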
The config item "IsEmbeddingTrainable" indicates if the embedding is trainable or not. You could set it to true and retry it.
My question is whether the values of m_srcEmbedding must change if IsEmbeddingTrainable is set to true. They are expected to change, aren't they?
In other words: if you load a pretrained embedding and set the "IsEmbeddingTrainable" parameter to true, the values of m_srcEmbedding are expected to change during training. Is that right?
We are writing m_srcEmbedding before and after training, and the values look the same.
Based on the above code, if you set IsEmbeddingTrainable to true, the weights of m_srcEmbedding will be updated during training (this is what "trainable" means).
Thanks Zhongkai Fu
Thanks a lot All right now
Just a remark:
We are writing m_srcEmbedding before and after training
//writing m_srcEmbedding before training (our function to write the matrix)
Program.writeEmbeddingMatrix(ss, "PRE");
//training (as usual)
ss.Train(maxTrainingEpoch: opts.MaxEpochNum, trainCorpus: trainCorpus, validCorpusList: validCorpusList.ToArray(), learningRate: learningRate, optimizer: optimizer, taskId2metrics: taskId2metrics, decodingOptions: decodingOptions);
//write m_srcEmbedding after training (again our function to write the matrix)
Program.writeEmbeddingMatrix(ss, "POS");
We get two identical-looking matrices in both files when training is finished. Nonetheless, if we start an incremental training, the matrix written by Program.writeEmbeddingMatrix(ss, "PRE") is new and different. It looks like the model (extension .model) that is loaded in the incremental training contains the matrix with the updated weights.
This is what puzzles us.
Sorry, I didn't understand this part: "Nonetheless, if we start an incremental training, the matrix written by Program.writeEmbeddingMatrix(ss, "PRE") is new and different."
Do you mean that after you load the existing model and immediately call "Program.writeEmbeddingMatrix(ss, "PRE")", you get a different output matrix?
How did you implement the writeEmbeddingMatrix function? Where did you call it? Can you please show more context about it? Can you also please share your config file, especially the settings for "DeviceIds" and "ProcessorType"?
"Do you mean that after you load the existing model and immediately call "Program.writeEmbeddingMatrix(ss, "PRE")", you get a different output matrix?"
No, I mean that when we call Program.writeEmbeddingMatrix(ss, false) and Program.writeEmbeddingMatrix(ss, true) in the code below, we obtain the same weights.
The points where we call writeEmbeddingMatrix are in the Program.cs of SeqClassificationConsole:
//Writing before training
Program.writeEmbeddingMatrix(ss, false);
// Add event handler for monitoring
ss.StatusUpdateWatcher += Misc.Ss_StatusUpdateWatcher;
ss.EvaluationWatcher += Ss_EvaluationWatcher;
// Kick off training
ss.Train(maxTrainingEpoch: opts.MaxEpochNum, trainCorpus: trainCorpus, validCorpusList: validCorpusList.ToArray(), learningRate: learningRate, optimizer: optimizer, taskId2metrics: taskId2metrics, decodingOptions: decodingOptions);
//Writing after training
Program.writeEmbeddingMatrix(ss, true);
This is the function to write the Embedding matrix
`private static void writeEmbeddingMatrix(SeqClassification sqc, bool isPos)
{
    // Dump the source embedding matrix as text, one matrix row per line.
    long[] auxLong = new long[2];
    string namefragment = isPos ? "POS" : "PRE";
    for (int k = 0; k < sqc.DeviceIds.Length; k++)
    {
        var embedding = sqc.m_srcEmbedding.GetNetworkOnDevice(k);
        using (StreamWriter writer = new StreamWriter(opts.TrainCorpusPath + "/" + namefragment + "embedding.txt"))
        {
            // Loop up to Rows and Columns (not Rows - 1 / Columns - 1) so the last row and column are written too.
            for (int i = 0; i < embedding.Rows; i++)
            {
                string auxString = "";
                for (int j = 0; j < embedding.Columns; j++)
                {
                    auxLong[0] = i;
                    auxLong[1] = j;
                    auxString = auxString + embedding.GetWeightAt(auxLong) + " ";
                }
                //Logger.WriteLine($"writing first matrix. " + auxString + "\n");
                writer.WriteLine(auxString);
            }
        }
    }
}`
The config:
{
"Task":"Train",
"HiddenSize":300,
"EmbeddingDim": 300,
"SrcVocabSize": 3145,
"TgtVocabSize": 313,
"IsEmbeddingTrainable": true,
"IsEncoderTrainable": true,
"StartLearningRate":0.0006,
"WeightsUpdateCount":0,
"EncoderLayerDepth":2,
"DecoderLayerDepth":2,
"SharedEmbeddings":false,
"EnableTagEmbeddings":false,
"TgtVocab":"C:/CUSTOM/Seq2SeqSharp/SeqClassificationTelco/clasVocabulary.txt",
"SrcVocab":"C:/CUSTOM/Seq2SeqSharp/SeqClassificationTelco/vocabulary.txt",
"SrcEmbeddingFilePath":"C:/CUSTOM/Seq2SeqSharp/SeqClassificationTelco/model.bin",
"SrcEmbeddingModelFilePath":null,
"TgtEmbeddingModelFilePath":null,
"ModelFilePath":"C:/CUSTOM/Seq2SeqSharp/SeqClassificationTelco/telcoModel6.model",
"TrainCorpusPath":"C:/CUSTOM/Seq2SeqSharp/SeqClassificationTelco/train",
"ValidCorpusPaths":"C:/CUSTOM/Seq2SeqSharp/SeqClassificationTelco/test",
"SrcLang":"SAM",
"TgtLang":"CLA",
"InputTestFile":null,
"OutputTestFile":null,
"ShuffleType":"NoPadding",
"ShuffleBlockSize":-1,
"GradClip":5.0,
"BatchSize":8,
"ValBatchSize":5,
"DropoutRatio":0.0,
"ProcessorType":"CPU",
"EncoderType":"Transformer",
"DecoderType":"Transformer",
"MultiHeadNum":10,
"DeviceIds":"0",
"BeamSearchSize":8,
"MaxEpochNum":1,
"MaxTrainSentLength":110,
"MaxTestSentLength":110,
"WarmUpSteps":8000,
"VisualizeNNFilePath":null,
"Beta1":0.9,
"Beta2":0.98,
"ValidIntervalHours":1.0,
"EnableCoverageModel":false,
"CompilerOptions":"--use_fast_math --gpu-architecture=compute_60",
"Optimizer":"Adam"
}
Here are some things you can check.
Check whether the IsTrainable property is true after you load the matrix from the model. You can find its definition in WeightTensor.cs: https://github.com/zhongkaifu/Seq2SeqSharp/blob/master/Seq2SeqSharp/Tools/WeightTensor.cs If it's true, its weights will be updated during training. In your example, you can check "sqc.m_srcEmbedding.GetNetworkOnDevice(k).IsTrainable".
You can directly dump weights by calling WeightTensor.ToWeightArray() (in your example: sqc.m_srcEmbedding.GetNetworkOnDevice(k).ToWeightArray()).
Did you use Seq2SeqSharp to train your model from scratch, and then run another incremental training by loading the existing model? Can you also share your logs?
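One way to compare two weight dumps without reading text files by eye is to compute the largest element-wise difference between them. The following is only a minimal sketch: it assumes the weights have already been copied into plain float arrays (for example from ToWeightArray() before and after training), and the WeightDiff class and MaxAbsDiff helper are hypothetical names, not part of Seq2SeqSharp.

```csharp
using System;

public static class WeightDiff
{
    // Largest absolute element-wise difference between two weight snapshots.
    // A result of 0 means the two snapshots are identical.
    public static float MaxAbsDiff(float[] before, float[] after)
    {
        if (before.Length != after.Length)
        {
            throw new ArgumentException("Snapshots must have the same length.");
        }

        float max = 0f;
        for (int i = 0; i < before.Length; i++)
        {
            max = Math.Max(max, Math.Abs(after[i] - before[i]));
        }
        return max;
    }

    public static void Main()
    {
        // Toy data standing in for snapshots taken before and after training.
        float[] pre = { 0.10f, -0.20f, 0.30f };
        float[] post = { 0.10f, -0.25f, 0.30f };

        Console.WriteLine(MaxAbsDiff(pre, post) > 0f ? "weights changed" : "weights unchanged");
    }
}
```

The "before" snapshot has to be an independent copy taken before training starts; if both variables point at the same underlying buffer, the difference will always be zero.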
I will check what you say and let you know. Thanks a lot. We appreciate your advices very much.
The logs are attached in this message: SeqClassificationConsole_Train_2022_07_21_12h_39m_10s.log SeqClassificationConsole_Train_2022_07_24_12h_42m_08s.log
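I just looked into your log files, but didn't find the keyword "SrcEmbeddings" anywhere. In SeqClassification.cs, "SrcEmbeddings" is created by this line:
m_srcEmbedding = new MultiProcessorNetworkWrapper<IWeightTensor>(new WeightTensor(new long[2] { model.SrcVocab.Count, model.EncoderEmbeddingDim }, raDeviceIds.GetNextItem(), normType: NormType.Uniform, fanOut: true, name: "SrcEmbeddings", isTrainable: m_options.IsEmbeddingTrainable), DeviceIds);
You should keep it and load the external embedding after it.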
For any trainable weights, you will find lines like "Added weight '{weights name}' to optimizer. Learing rate factor = '...'". In your log files, I didn't find such lines for "SrcEmbeddings".
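The log check described above can be automated with a simple keyword search. This is a rough sketch under assumptions: the LogCheck class and Mentions helper are hypothetical, the sample lines only imitate the messages quoted in this thread, and in practice the lines would come from something like File.ReadLines on your own log path.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LogCheck
{
    // True if any log line contains the given keyword (case-sensitive).
    public static bool Mentions(IEnumerable<string> logLines, string keyword)
    {
        return logLines.Any(line => line.Contains(keyword));
    }

    public static void Main()
    {
        // Illustrative lines only; a real log will contain many more entries.
        var sampleLog = new List<string>
        {
            "Creating encoders...",
            "Creating embeddings. Shape = '(3145 ,300)'",
        };

        Console.WriteLine(Mentions(sampleLog, "SrcEmbeddings")
            ? "SrcEmbeddings appears in the log"
            : "SrcEmbeddings never appears in the log");
    }
}
```

If the keyword never shows up, the embedding was most likely never handed to the optimizer, which matches the symptom of identical matrices before and after training.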
The line you mean is in CreateTrainableParameters, isn't it?
private bool CreateTrainableParameters(IModel model)
{
    Logger.WriteLine($"Creating encoders...");
    var raDeviceIds = new RoundArray<int>(DeviceIds);
    int contextDim;
    (m_encoder, contextDim) = Encoder.CreateEncoders(model, m_options, raDeviceIds);
    m_encoderFFLayer = new MultiProcessorNetworkWrapper<IFeedForwardLayer>[model.ClsVocabs.Count];
    for (int i = 0; i < model.ClsVocabs.Count; i++)
    {
        m_encoderFFLayer[i] = new MultiProcessorNetworkWrapper<IFeedForwardLayer>(new FeedForwardLayer($"FeedForward_Encoder_{i}", contextDim, model.ClsVocabs[i].Count, dropoutRatio: 0.0f, deviceId: raDeviceIds.GetNextItem(), isTrainable: true), DeviceIds);
    }
    (m_posEmbedding, m_segmentEmbedding) = Misc.CreateAuxEmbeddings(raDeviceIds, contextDim, Math.Max(m_options.MaxTrainSentLength, m_options.MaxTestSentLength), model);
    Logger.WriteLine($"Creating embeddings. Shape = '({model.SrcVocab.Count} ,{model.EncoderEmbeddingDim})'");
    m_srcEmbedding = new MultiProcessorNetworkWrapper<IWeightTensor>(new WeightTensor(new long[2] { model.SrcVocab.Count, model.EncoderEmbeddingDim }, raDeviceIds.GetNextItem(), normType: NormType.Uniform, fanOut: true, name: "SrcEmbeddings", isTrainable: m_options.IsEmbeddingTrainable), DeviceIds);
    return true;
}
And it seems that it is called properly when debugging.
public SeqClassification(string srcEmbeddingFilePath, SeqClassificationOptions options, Vocab srcVocab = null, List<Vocab> clsVocabs = null)
    : base(options.DeviceIds, options.ProcessorType, options.ModelFilePath, options.MemoryUsageRatio, options.CompilerOptions, options.ValidIntervalHours, updateFreq: options.UpdateFreq)
{
    m_shuffleType = options.ShuffleType;
    m_options = options;
    m_modelMetaData = new SeqClassificationModel(options.HiddenSize, options.EmbeddingDim, options.EncoderLayerDepth, options.MultiHeadNum,
        options.EncoderType, srcVocab, clsVocabs, options.EnableSegmentEmbeddings, options.EnableTagEmbeddings, options.MaxSegmentNum);
    m_dropoutRatio = options.DropoutRatio;
    //Initializing weights in encoders and decoders
    CreateTrainableParameters(m_modelMetaData);
    // Load external embedding from files
    for (int i = 0; i < DeviceIds.Length; i++)
    {
        //If pre-trained embedding weights are specified, load them from files
        //For classification, only the source-side embeddings are loaded. See the original function for the others, although it is advisable that it be the same
        if (!String.IsNullOrEmpty(srcEmbeddingFilePath))
        {
            Logger.WriteLine($"Loading ExtEmbedding model from '{srcEmbeddingFilePath}' for source side.");
            LoadWordEmbedding(srcEmbeddingFilePath, m_srcEmbedding.GetNetworkOnDevice(i), m_modelMetaData.SrcVocab.WordToIndex);
        }
    }
}
Could you see my code in azure devops?
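Your code looks good to me. In addition, the log file doesn't output "Register network 'SrcEmbeddings'". Can you please set a breakpoint on this line:
m_srcEmbedding = new MultiProcessorNetworkWrapper<IWeightTensor>(new WeightTensor(new long[2] { model.SrcVocab.Count, model.EncoderEmbeddingDim }, raDeviceIds.GetNextItem(), normType: NormType.Uniform, fanOut: true, name: "SrcEmbeddings", isTrainable: m_options.IsEmbeddingTrainable), DeviceIds);
And check the value of m_options.IsEmbeddingTrainable?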
did you mean this?
The value is true and it looks good. Is the definition of m_srcEmbedding like the line below?
private MultiProcessorNetworkWrapper<IWeightTensor> m_srcEmbedding;
Can you also set a breakpoint inside LoadParameters to check which networks are in the dictionary m_name2network? It should contain all networks, including the source embedding, the encoder and others.
protected virtual void LoadParameters(IModel model)
{
    RegisterTrainableParameters(this);
    foreach (KeyValuePair<string, IMultiProcessorNetworkWrapper> p in m_name2network)
    {
        var name = p.Key;
        var mpnw = p.Value;
        Logger.WriteLine($"Loading parameter '{name}'");
        mpnw.Load(model);
    }
}
The definition
public MultiProcessorNetworkWrapper<IWeightTensor> m_srcEmbedding; //The embeddings over devices for target
Which function calls LoadParameters? It does not seem to be fired. G
It gets called when loading an existing model from a file. For training from scratch, it won't be called.
Anyway, you can set breakpoints inside the Register method and check whether the source embedding is among the networks and gets registered.
private void Register(object childValue, string name)
{
    if (childValue is IMultiProcessorNetworkWrapper networks)
    {
        m_name2network.Add(name, networks);
        Logger.WriteLine($"Register network '{name}'");
    }
    if (childValue is IMultiProcessorNetworkWrapper[] networksArray)
    {
        int idx = 0;
        foreach (var network in networksArray)
        {
            string name2 = $"{name}_{idx}";
            m_name2network.Add(name2, network);
            Logger.WriteLine($"Register network '{name2}'");
            idx++;
        }
    }
}
No, it seems it is not. It is neither in the names nor in the networks. Why? Are you sure that the original code works as you expect?
Yes, it's working. I noticed that your m_srcEmbedding is public. You need to change it to private and try again.
The RegisterTrainableParameters method only iterates over non-public fields and properties, and registers each one if it is a network or weight tensor.
internal void RegisterTrainableParameters(object obj)
{
    if (m_name2network != null)
    {
        return;
    }
    Logger.WriteLine($"Registering trainable parameters.");
    m_name2network = new SortedList<string, IMultiProcessorNetworkWrapper>();
    foreach (FieldInfo childFieldInfo in obj.GetType().GetFields(BindingFlags.NonPublic | BindingFlags.Instance))
    {
        object childValue = childFieldInfo.GetValue(obj);
        string name = childFieldInfo.Name;
        Register(childValue, name);
    }
    foreach (PropertyInfo childPropertyInfo in obj.GetType().GetProperties(BindingFlags.NonPublic | BindingFlags.Instance))
    {
        object childValue = childPropertyInfo.GetValue(obj);
        string name = childPropertyInfo.Name;
        Register(childValue, name);
    }
}
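The effect of those binding flags can be reproduced in isolation. The sketch below is not Seq2SeqSharp code: DemoModel, RegistrationDemo and the field names are hypothetical stand-ins, and the only thing it demonstrates is that a lookup with BindingFlags.NonPublic | BindingFlags.Instance returns private fields and skips public ones.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical stand-in for a model class; the field names are illustrative only.
public class DemoModel
{
    public float[] m_publicEmbedding = new float[4];   // public: skipped by a NonPublic lookup
    private float[] m_privateEmbedding = new float[4]; // private: returned by a NonPublic lookup
}

public static class RegistrationDemo
{
    // Collects the names of instance fields found with the same binding flags
    // that RegisterTrainableParameters uses in the code above.
    public static List<string> NonPublicFieldNames(object obj)
    {
        var names = new List<string>();
        foreach (FieldInfo field in obj.GetType().GetFields(BindingFlags.NonPublic | BindingFlags.Instance))
        {
            names.Add(field.Name);
        }
        return names;
    }

    public static void Main()
    {
        List<string> names = NonPublicFieldNames(new DemoModel());
        Console.WriteLine(string.Join(", ", names));
    }
}
```

Only the private field should be listed here. By the same logic, a public m_srcEmbedding would never reach Register, so it would not be added to m_name2network.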
OK. That could be the cause. I will set m_srcEmbedding to private again and try to write the matrix in another way. Then I will let you know. Thanks a lot G
It seems that this was the cause. I am now starting the training and will write the matrix before and after it. I will let you know. The log looks fine now, doesn't it? SeqClassificationConsole_Train_2022_07_27_01h_29m_49s.log
Thanks a lot
Yes, it looks good now.
Hi again:
As you know, we are researching the effect of a pretrained embedding matrix. For this purpose, we are writing [SeqClassification].m_srcEmbedding (the embedding matrix) before and after training, while "IsEmbeddingTrainable" is set to "true".
The fact is that the embedding matrix looks the same before and after. Maybe I misunderstand something. Should m_srcEmbedding change given that the parameter "IsEmbeddingTrainable" is set to "true"? We are still thinking about this case, but maybe you can advise us.
Thanks a lot
G
Thanks a lot