zhongkaifu / Seq2SeqSharp

Seq2SeqSharp is a tensor-based, fast and flexible deep neural network framework written in .NET (C#). It has many highlighted features, such as automatic differentiation, different network types (Transformer, LSTM, BiLSTM and so on), multi-GPU support, cross-platform operation (Windows, Linux, x86, x64, ARM), multimodal models for text and images, and so on.

What are the right parameters in the configuration file if you want to use pretrained embeddings? #50

piedralaves closed this issue 1 year ago

piedralaves commented 2 years ago

I am starting to use SeqClassification. I have sentences in the source file and a category in the target file. My question is: what are the right parameters in the configuration file if I want to use pretrained embeddings? I have a txt2vec pretrained model and I want to test its performance.

I have "SrcEmbedding":"model.bin" but I guest some more parameters are needed to indicate to the exe it has to use them.

Thanks a lot

zhongkaifu commented 2 years ago

Hi @piedralaves,

I already removed the "loading embeddings generated from Txt2Vec" feature from Seq2SeqSharp, because you can use Seq2SeqSharp itself for both pretraining and fine-tuning.

For example: if you already have a large unlabeled data set, such as Wikipedia, you can build (sentence, category) pairs from it and send them to SeqClassification for pretraining. Once that is done, you can use the data set for your specific task together with the pretrained model to do fine-tuning with SeqClassification as well.
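
For illustration, a minimal sketch of such a paired corpus, assuming the parallel-file layout used later in this thread (source and target files share a name and differ by the SrcLang/TgtLang suffixes; the file names, sentences, and labels here are made up):

    train.SAM  (one tokenized sentence per line):
        the cat sat on the mat
        stocks fell sharply today

    train.CLA  (one category label per line, aligned with train.SAM):
        ANIMALS
        FINANCE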

Thanks,
Zhongkai Fu

piedralaves commented 2 years ago

Hi Zhongkai,

I don't know if I understand you properly. Do you mean that there is no way to load a pretrained embedding matrix in Seq2SeqSharp?

If I put "SrcEmbedding":"model.bin", with a text2vec model, is it ignored?

Among other things, I am interested in loading pretrained embedding matrices. Is that not possible?

Thanks a lot

I just want to know how I can start with your impressive stack.

zhongkaifu commented 2 years ago

You can modify Seq2SeqSharp to add this feature back if you want to use it. Check the file AttentionSeq2Seq.cs in the older version of Seq2SeqSharp, and look at how the LoadWordEmbedding function works and which code calls it:

    private void LoadWordEmbedding(string extEmbeddingFilePath, IWeightTensor embeddingMatrix, IEnumerable<KeyValuePair<string, int>> wordToIndex)
    {
        // Load the external Txt2Vec model from disk.
        Txt2Vec.Model extEmbeddingModel = new Txt2Vec.Model();
        extEmbeddingModel.LoadBinaryModel(extEmbeddingFilePath);

        // The external vectors must match the embedding matrix's dimensionality.
        if (extEmbeddingModel.VectorSize != embeddingMatrix.Columns)
        {
            throw new ArgumentException($"Inconsistent embedding size. ExtEmbeddingModel size = '{extEmbeddingModel.VectorSize}', EmbeddingMatrix column size = '{embeddingMatrix.Columns}'");
        }

        // Overwrite the row of every vocabulary word found in the external model;
        // words without a pretrained vector keep their existing initialization.
        foreach (KeyValuePair<string, int> pair in wordToIndex)
        {
            float[] vector = extEmbeddingModel.GetVector(pair.Key);
            if (vector != null)
            {
                embeddingMatrix.SetWeightAtRow(pair.Value, vector);
            }
        }
    }
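
To see the copy pattern in isolation, here is a minimal, self-contained C# sketch of the same logic, using plain arrays and a dictionary in place of IWeightTensor and Txt2Vec.Model (all names and data here are illustrative, not Seq2SeqSharp API):

    using System;
    using System.Collections.Generic;

    class EmbeddingCopyDemo
    {
        static void Main()
        {
            // Stand-in for the external Txt2Vec model: word -> pretrained vector.
            var external = new Dictionary<string, float[]>
            {
                ["cat"] = new[] { 0.1f, 0.2f, 0.3f },
                ["dog"] = new[] { 0.4f, 0.5f, 0.6f },
            };

            // Stand-in for the embedding matrix: one (initially zero) row per vocab entry.
            var wordToIndex = new Dictionary<string, int> { ["cat"] = 0, ["unk"] = 1, ["dog"] = 2 };
            var embeddingMatrix = new float[3][] { new float[3], new float[3], new float[3] };

            // Same logic as LoadWordEmbedding: copy a row only when the external
            // model knows the word; "unk" keeps its initial values.
            foreach (var pair in wordToIndex)
            {
                if (external.TryGetValue(pair.Key, out float[] vector))
                {
                    embeddingMatrix[pair.Value] = vector;
                }
            }

            Console.WriteLine(string.Join(", ", embeddingMatrix[1])); // 0, 0, 0 ("unk" untouched)
            Console.WriteLine(string.Join(", ", embeddingMatrix[2])); // 0.4, 0.5, 0.6 (copied)
        }
    }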

The embeddings from a pre-trained model usually have better quality than those from Txt2Vec (which is similar to Word2Vec), so I would suggest you use the pre-training/fine-tuning pattern for your model.

Thanks,
Zhongkai Fu

piedralaves commented 2 years ago

Thanks a lot,

I am trying to enable LoadWordEmbedding(). The problem is that inside this procedure there is a call to Txt2Vec.Model, which lives in your Txt2Vec project. I have added the Txt2Vec project to the main solution with .NET 6. Txt2Vec calls the constructor "VectorQuantization vq = new VectorQuantization();", which should be in AdvUtils, but there is no VectorQuantization class in the current AdvUtils (the one in Seq2SeqSharp). Could you send me that class so I can include it in the current AdvUtils?

Another question: when you suggest using the pre-training pattern, that is what Seq2SeqSharp already does, isn't it? I mean, with trainable embeddings and automatic vocabulary and embedding-matrix generation.

I appreciate your responses a lot

Thanks

zhongkaifu commented 2 years ago

You can try this AdvUtils repo: https://github.com/zhongkaifu/AdvUtils

For your task, you can use the same model (and vocabulary) for both pre-training and fine-tuning, with different hyper-parameters such as learning rate, max update steps, and data set.

For example: you can use a Transformer base model (6 layers and 512 dims), the Wikipedia data set, and a larger starting learning rate, such as 0.0006, for pre-training. It may take a relatively large number of update steps (maybe 80K, but it depends on your task and data set). Once that is done, you can use this pre-trained model, your own data set, and a smaller starting learning rate, such as 0.00001, for fine-tuning. Fine-tuning usually takes fewer update steps, such as 2K~5K, which also depends on your task and data set.
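
To make the two phases concrete, here is a sketch of how the relevant settings might differ between the two runs. The parameter names follow the config file posted further down; the values are only the examples above, the paths are placeholders, and the assumption that training resumes from an existing file at "ModelFilePath" should be verified against the Seq2SeqSharp documentation:

    Pre-training run (large unlabeled corpus, higher learning rate, ~80K steps):
        "StartLearningRate": 0.0006,
        "TrainCorpusPath": ".../pretrain_corpus",
        "ModelFilePath": ".../pretrained.model"

    Fine-tuning run (task corpus, lower learning rate, 2K~5K steps):
        "StartLearningRate": 0.00001,
        "TrainCorpusPath": ".../task_corpus",
        "ModelFilePath": ".../pretrained.model"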

piedralaves commented 2 years ago

Thanks a lot for the second suggestion.

I am, in any case, working on integrating pretrained word embeddings. The fact is that we are researchers at a university and we want to experiment with pretrained word embeddings as well. It is good to have both possibilities.

At this point, we have integrated the function that loads the embeddings and the vocabularies, and it works: it loads them properly and no errors arise. But in the parallel-for of the training loop an error arises, and we don't know whether it is a parametrization issue in the JSON config:

    err,22/06/2022 21:04:18 Exception: 'Output tensor must have the same number of elements as the input. Size = 56 450, New Size = 8 7 3 4 37'
    err,22/06/2022 21:04:18 Call stack:
       at TensorSharp.Tensor.View(Int64[] sizes) in C:\Users\jorge\source\repos\Seq2SeqSharp-RELEASE_2_5_0\TensorSharp\Tensor.cs:line 298
       at Seq2SeqSharp.Tools.ComputeGraphTensor.View(IWeightTensor w, Int64[] dims) in C:\Users\jorge\source\repos\Seq2SeqSharp-RELEASE_2_5_0\Seq2SeqSharp\Tools\ComputeGraphTensor.cs:line 1610
       at Seq2SeqSharp.MultiHeadAttention.Perform(IWeightTensor inputQ, IWeightTensor keyMask, Int32 batchSize, IComputeGraph graph, Boolean outputAttenWeights) in C:\Users\jorge\source\repos\Seq2SeqSharp-RELEASE_2_5_0\Seq2SeqSharp\Layers\MultiHeadAttention.cs:line 91
       at Seq2SeqSharp.TransformerEncoder.Encode(IWeightTensor inputs, Int32 batchSize, IComputeGraph g, IWeightTensor srcSelfMask) in C:\Users\jorge\source\repos\Seq2SeqSharp-RELEASE_2_5_0\Seq2SeqSharp\Networks\TransformerEncoder.cs:line 89
       at Seq2SeqSharp.Applications.Encoder.RunEncoder(IComputeGraph g, List`1 seqs, IEncoder encoder, IModel modelMetaData, IWeightTensor embeddings, IWeightTensor selfMask, IWeightTensor posEmbeddings, IWeightTensor segmentEmbeddings) in C:\Users\jorge\source\repos\Seq2SeqSharp-RELEASE_2_5_0\Seq2SeqSharp\Applications\Encoder.cs:line 130
       at Seq2SeqSharp.Applications.Encoder.InnerRunner(IComputeGraph computeGraph, List`1 srcTokensList, Single[] originalSrcLengths, ShuffleEnums shuffleType, IEncoder encoder, IModel modelMetaData, IWeightTensor srcEmbedding, IWeightTensor posEmbedding, IWeightTensor segmentEmbedding) in C:\Users\jorge\source\repos\Seq2SeqSharp-RELEASE_2_5_0\Seq2SeqSharp\Applications\Encoder.cs:line 100
       at Seq2SeqSharp.Applications.Encoder.Run(IComputeGraph computeGraph, ISntPairBatch sntPairBatch, IEncoder encoder, IModel modelMetaData, ShuffleEnums shuffleType, IWeightTensor srcEmbedding, IWeightTensor posEmbedding, IWeightTensor segmentEmbedding, List`1 srcSntsIds, Single[] originalSrcLengths) in C:\Users\jorge\source\repos\Seq2SeqSharp-RELEASE_2_5_0\Seq2SeqSharp\Applications\Encoder.cs:line 69
       at Seq2SeqSharp.Applications.SeqClassification.RunForwardOnSingleDevice(IComputeGraph computeGraph, ISntPairBatch sntPairBatch, Int32 deviceIdIdx, Boolean isTraining, DecodingOptions decodingOptions) in C:\Users\jorge\source\repos\Seq2SeqSharp-RELEASE_2_5_0\Seq2SeqSharp\Applications\SeqClassification.cs:line 214
       at Seq2SeqSharp.Tools.BaseSeq2SeqFramework`1.<>c__DisplayClass39_0.b__0(Int32 i) in C:\Users\jorge\source\repos\Seq2SeqSharp-RELEASE_2_5_0\Seq2SeqSharp\Tools\BaseSeq2SeqFramework.cs:line 562

Could I send you a link to the project by email? Would it be a good thing for you to restore the pretrained txt2vec option? Maybe some other people want to use pretrained embeddings too.

Thanks a lot

zhongkaifu commented 2 years ago

The exception is from this line:

var weightedQKV = g.View(g.Affine(inputQNorm, QKV, QKVb), dims: new long[] { batchSize, seqLenQ, 3, m_multiHeadNum, m_d });

It seems m_multiHeadNum * m_d is not equal to your weight dimension. Can you please share your config file here?
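
For reference, the View only succeeds when both sides have the same number of elements (symbol names taken from the line above, numbers from the trace):

    input elements:  batchSize * seqLenQ * (3 * hiddenDim)           = 8 * 7 * 450
    viewed elements: batchSize * seqLenQ * 3 * m_multiHeadNum * m_d  = 8 * 7 * 3 * 4 * 37

So hiddenDim must equal m_multiHeadNum * m_d. Here hiddenDim = 150 but m_multiHeadNum * m_d = 4 * 37 = 148 (37 presumably being 150 / 4 rounded down), which produces exactly the reported "56 450" vs "8 7 3 4 37" mismatch.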

piedralaves commented 2 years ago

`{ "Task":"Train", "HiddenSize":150, "EmbeddingDim": 150, "SrcVocabSize": 17883, "TgtVocabSize": 2, "IsEmbeddingTrainable": false, "IsEncoderTrainable": false, "StartLearningRate":0.0006, "WeightsUpdateCount":0, "EncoderLayerDepth":2, "DecoderLayerDepth":2, "SharedEmbeddings":false, "EnableTagEmbeddings":false, "TgtVocab":"C:/Users/jorge/source/repos/Seq2SeqSharp-RELEASE_2_5_0/ReleasePackage/data/custom/SeqClassification/clasVocabulary.txt", "SrcVocab":"C:/Users/jorge/source/repos/Seq2SeqSharp-RELEASE_2_5_0/ReleasePackage/data/custom/SeqClassification/vocabulary.txt", "SrcEmbeddingFilePath":"C:/Users/jorge/source/repos/Seq2SeqSharp-RELEASE_2_5_0/ReleasePackage/data/custom/SeqClassification/model.bin", "SrcEmbeddingModelFilePath":null, "TgtEmbeddingModelFilePath":null, "ModelFilePath":"C:/Users/jorge/source/repos/Seq2SeqSharp-RELEASE_2_5_0/ReleasePackage/data/custom/SeqClassification/embmodel.model", "TrainCorpusPath":"C:/Users/jorge/source/repos/Seq2SeqSharp-RELEASE_2_5_0/ReleasePackage/data/custom/SeqClassification/train", "ValidCorpusPaths":"C:/Users/jorge/source/repos/Seq2SeqSharp-RELEASE_2_5_0/ReleasePackage/data/custom/SeqClassification/test", "SrcLang":"SAM", "TgtLang":"CLA", "InputTestFile":null, "OutputTestFile":null, "ShuffleType":"NoPadding", "ShuffleBlockSize":-1, "GradClip":5.0, "BatchSize":100, "ValBatchSize":1, "DropoutRatio":0.0, "ProcessorType":"CPU", "EncoderType":"Transformer", "DecoderType":"Transformer", "MultiHeadNum":4, "DeviceIds":"0", "BeamSearchSize":1, "MaxEpochNum":100, "MaxTrainSentLength":110, "MaxTestSentLength":110, "WarmUpSteps":8000, "VisualizeNNFilePath":null, "Beta1":0.9, "Beta2":0.98, "ValidIntervalHours":1.0, "EnableCoverageModel":false, "CompilerOptions":"--use_fast_math --gpu-architecture=compute_60", "Optimizer":"Adam"

}`

zhongkaifu commented 2 years ago

Try modifying your HiddenSize and EmbeddingDim to 148, or changing MultiHeadNum to 5, because HiddenSize and EmbeddingDim must be divisible by MultiHeadNum with no remainder.
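
A quick arithmetic check of that constraint (plain C# top-level statements; the numbers come from the config above, and nothing here is Seq2SeqSharp API):

    using System;

    // 150-dim embeddings cannot be split evenly across 4 heads:
    Console.WriteLine(150 % 4); // 2 -> invalid, triggers the size-mismatch exception
    // Either suggested fix makes the remainder zero:
    Console.WriteLine(148 % 4); // 0 -> valid: 148 / 4 = 37 dims per head
    Console.WriteLine(150 % 5); // 0 -> valid: 150 / 5 = 30 dims per head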

piedralaves commented 2 years ago

My pretrained txt2vec model has 150 dimensions, so it is better for me to change MultiHeadNum to 5. It seems to work now.

In any case, I will evaluate it and let you know.

Thanks a lot

piedralaves commented 2 years ago

If a pretrained txt2vec model is used, should I set the parameter "IsEmbeddingTrainable" to false, and "IsEncoderTrainable" also to false?

zhongkaifu commented 2 years ago

I would suggest you set them to true, so they can be updated during fine-tuning.

piedralaves commented 2 years ago

Both? even "IsEmbeddingTrainable"?

zhongkaifu commented 2 years ago

Yes.
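
In config terms, the confirmed settings (same parameter names as in the file posted above):

    "IsEmbeddingTrainable": true,
    "IsEncoderTrainable": true,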

piedralaves commented 2 years ago

Thanks a lot.