Closed clm33 closed 1 year ago
Hi @clm33,
If you check the file "Seq2SeqSharp\Tools\SeqClassificationConsole\Program.cs", you will find that I commented out the code for the "Valid" test. I forget why I commented it out, but you can try adding it back, rebuilding the project, and running the "Valid" test. Let me know if you run into any problems.
In addition, just a suggestion: choose metrics that fit your task, such as "SequenceLabelFscoreMetric" for F1 score, or implement new metrics for your task.
Thanks Zhongkai Fu
Thanks for answering. I will try to sort it out and let you know.
Okay, I just updated the code and added "Valid" back to the SeqClassification tool. However, since I don't have a trained SeqClassification model right now, I didn't test it.
You can either 1) pull the latest code from the repo, or 2) copy and paste the code I modified into your project and run it. Let me know if you have any problems.
Thanks Zhongkai Fu
Dear Zhongkai:
Is "opts.ShuffleBlockSize" the same as "opts.ValMaxTokenSizePerBatch" in your new "valid" code?
Thanks a lot
@piedralaves
In the latest code, Seq2SeqSharp no longer has "opts.ShuffleBlockSize", because I use a new way to shuffle the data set that has no memory limitation problem.
As for "opts.ValMaxTokenSizePerBatch", it is the maximum number of tokens that can be sent to the network in one mini-batch during validation.
Thanks Zhongkai Fu
And how many tokens do you recommend for opts.ValMaxTokenSizePerBatch?
We are trying to reconstruct the "valid" statement.
Thanks
Maybe 5120?
It depends on your GPU memory size. You could set it to 5120 and try it. If you get an OOM problem, reduce it to a smaller value.
For "valid", this value only affects speed and has no impact on quality, so it's safe to try different values.
Thanks Zhongkai Fu
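The "start at 5120 and reduce on OOM" advice can be sketched as a small retry loop. This is only an illustration: `FindWorkingTokenBudget` and `runValidation` are hypothetical names, not Seq2SeqSharp APIs, and the sketch assumes OOM surfaces as an `OutOfMemoryException`, which may not be the exception type a GPU backend actually throws.

```csharp
using System;

// Hypothetical helper, not part of Seq2SeqSharp: starts at `startBudget` tokens per
// mini-batch and halves the budget each time `runValidation` reports out-of-memory.
// Assumption: OOM surfaces as OutOfMemoryException; a GPU backend may throw another type.
static int FindWorkingTokenBudget(Action<int> runValidation, int startBudget = 5120, int minBudget = 256)
{
    for (int budget = startBudget; budget >= minBudget; budget /= 2)
    {
        try
        {
            runValidation(budget);   // e.g. build the corpus with this budget and run validation
            return budget;           // first budget that completes without OOM
        }
        catch (OutOfMemoryException)
        {
            Console.WriteLine($"OOM at {budget} tokens per batch");
        }
    }
    return -1; // nothing at or above minBudget fit in memory
}
```

Since the value affects only validation speed, halving is a reasonable search step here; a finer search would only matter if throughput is critical.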
This is our code:
else if (opts.Task == ModeEnums.Valid)
{
    Logger.WriteLine($"Evaluate model '{opts.ModelFilePath}' by valid corpus '{opts.ValidCorpusPaths}'");
    // Create metrics
    ss = new SeqClassification(opts);
    Dictionary<int, List<IMetric>> taskId2metrics = new Dictionary<int, List<IMetric>>();
    for (int i = 0; i < ss.ClsVocabs.Count; i++)
    {
        taskId2metrics.Add(i, new List<IMetric>());
        taskId2metrics[i].Add(new MultiLabelsFscoreMetric("", ss.ClsVocabs[i].GetAllTokens(keepBuildInTokens: false)));
    }
    ss = new SeqClassification(opts);
    ss.EvaluationWatcher += Ss_EvaluationWatcher;
    // Load valid corpus
    //Seq2SeqCorpus validCorpus = new Seq2SeqCorpus(opts.ValidCorpusPaths, opts.SrcLang, opts.TgtLang, opts.ValBatchSize, opts.ShuffleBlockSize, opts.MaxTestSentLength, opts.MaxTestSentLength, shuffleEnums: opts.ShuffleType, tooLongSequence: opts.TooLongSequence);
    var validCorpus = new SeqClassificationMultiTasksCorpus(opts.ValidCorpusPaths, srcLangName: opts.SrcLang, tgtLangName: opts.TgtLang, opts.MaxTokenSizePerBatch, opts.MaxTestSentLength, shuffleEnums: opts.ShuffleType, tooLongSequence: opts.TooLongSequence);
    ss.Valid(validCorpus, taskId2metrics, null);
}
When we change the value, sometimes an OOM error arises, and sometimes there is no error but also no F-score result:
info,05/02/2023 11:15:45 Metrics result on task '0' on data set 'valid': MultiLabelsFscore_ = The number of categories = '0'
Our logs:
SeqClassificationConsole_Valid_2023_02_05_11h_06m_01s.log
Is there something wrong with our code?
Thanks a lot
@piedralaves What does your data format look like? Can you please share a few examples of it?
Thanks Zhongkai Fu
descriptions.cla.snt.txt descriptions.sam.snt.txt
The format is one utterance and one category per row. To feed them into the RNN they must be in separate files, so I uploaded .txt files with some examples so you can see. The ".txt" extension was added to the file names so that they could be uploaded. When feeding them into the RNN, the names are descriptions.cla.snt for the categories and descriptions.sam.snt for the utterances.
Here is the latest code I checked in for SeqClassification validation. You can pull it to your project. Note that since SeqClassification supports multiple tasks, it uses "ValidCorpusPaths" in the config file rather than "ValidCorpusPath", and each task has a separate validation file. Can you please check whether your config file is correct?
else if (opts.Task == ModeEnums.Valid)
{
    Logger.WriteLine($"Evaluate model '{opts.ModelFilePath}' by valid corpus '{opts.ValidCorpusPaths}'");
    // Create metrics
    ss = new SeqClassification(opts);
    Dictionary<int, List<IMetric>> taskId2metrics = new Dictionary<int, List<IMetric>>();
    for (int i = 0; i < ss.ClsVocabs.Count; i++)
    {
        taskId2metrics.Add(i, new List<IMetric>());
        taskId2metrics[i].Add(new MultiLabelsFscoreMetric("", ss.ClsVocabs[i].GetAllTokens(keepBuildInTokens: false)));
    }
    ss = new SeqClassification(opts);
    ss.EvaluationWatcher += Ss_EvaluationWatcher;
    // Load valid corpus
    if (!opts.ValidCorpusPaths.IsNullOrEmpty())
    {
        string[] validCorpusPathList = opts.ValidCorpusPaths.Split(';');
        foreach (var validCorpusPath in validCorpusPathList)
        {
            Logger.WriteLine($"Loading valid corpus '{validCorpusPath}'");
            var validCorpus = new SeqClassificationMultiTasksCorpus(validCorpusPath, srcLangName: opts.SrcLang, tgtLangName: opts.TgtLang, opts.ValMaxTokenSizePerBatch, opts.MaxSentLength, shuffleEnums: opts.ShuffleType, tooLongSequence: opts.TooLongSequence);
            Logger.WriteLine($"Validating corpus '{validCorpusPath}'");
            ss.Valid(validCorpus, taskId2metrics, null);
        }
    }
}
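The validation code above splits "ValidCorpusPaths" on ';' and loads one corpus per entry. As a minimal sketch of that splitting step, the helper below also trims whitespace and drops blank entries, so a trailing ';' in the config does not produce an empty corpus path. `SplitCorpusPaths` is a hypothetical name, not a Seq2SeqSharp function, and the file names in the usage note are made up.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of Seq2SeqSharp: splits a ';'-separated path list
// and drops blank entries so a trailing ';' does not yield an empty corpus path.
static string[] SplitCorpusPaths(string validCorpusPaths)
{
    if (string.IsNullOrWhiteSpace(validCorpusPaths))
    {
        return Array.Empty<string>();
    }
    return validCorpusPaths
        .Split(';')
        .Select(p => p.Trim())
        .Where(p => p.Length > 0)
        .ToArray();
}
```

For example, `SplitCorpusPaths("task0.valid; task1.valid;")` yields two entries, `task0.valid` and `task1.valid`.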
Yes, we are using the "ValidCorpusPaths" parameter in the configuration file.
Hi Zhongkai:
We are revising our code. One possible problem is that when the constructor of SeqClassification loads a pretrained model from a file, that is:
ss = new SeqClassification(opts);
the Items list was not being loaded, and only IndexToWord and WordToIndex were populated.
This issue was affecting incremental training as well as "Valid".
We found a possible solution: loading the Items again when SeqClassification(opts) is called:
if (File.Exists(m_options.ModelFilePath))
{
    if (srcVocab != null || clsVocabs != null)
    {
        throw new ArgumentException($"Model '{m_options.ModelFilePath}' exists and it includes vocabulary, so input vocabulary must be null.");
    }
    m_modelMetaData = LoadModelImpl_WITH_CONVERT(CreateTrainableParameters);
    //m_modelMetaData = LoadModelImpl();
    //---LoadModel_As_BinaryFormatter( CreateTrainableParameters );
    //loading the items again
    for (int n = 0; n < this.ClsVocabs.Count; n++)
    {
        for (int k = 0; k < this.ClsVocabs[n].IndexToWord.Count; k++)
        {
            this.ClsVocabs[n].Items.Add(this.ClsVocabs[n].IndexToWord[k]);
        }
    }
    //loading the items again
    for (int k = 0; k < this.SrcVocab.IndexToWord.Count; k++)
    {
        this.SrcVocab.Items.Add(this.SrcVocab.IndexToWord[k]);
    }
}
It is possible that this issue exists only in our code and not in yours. Sorry about that. Remember that we modified your code to load text2vec embeddings again for research reasons.
In any case, we appreciate your help very much and will continue our revision.
Thanks a lot
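One detail of the workaround above is that it appends to Items without clearing it first, so if Items were ever populated already, the entries would be duplicated. A minimal sketch of an idempotent rebuild follows. It uses plain collections standing in for the vocabulary fields, `RebuildItems` is a hypothetical name, and it assumes IndexToWord maps contiguous indices 0..Count-1 to tokens, which the loops above also rely on.

```csharp
using System;
using System.Collections.Generic;

// Sketch with plain collections standing in for the vocabulary fields named above.
// Assumption: indexToWord maps contiguous indices 0..Count-1 to tokens.
static void RebuildItems(List<string> items, IReadOnlyDictionary<int, string> indexToWord)
{
    items.Clear(); // avoid duplicates if the list was already populated
    for (int k = 0; k < indexToWord.Count; k++)
    {
        items.Add(indexToWord[k]);
    }
}
```

Calling it twice leaves the list in the same state as calling it once, which makes it safe to run unconditionally after loading a model.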
@piedralaves
It looks good to me. Let me know if you have any further questions.
In addition, are you using the latest code from the repo? If not, just curious, why?
Thanks Zhongkai Fu
No, not the latest one. We are using a recent version in which we re-enabled the functionality of loading text2vec embeddings. We are also researching classical embeddings and we need that functionality.
https://github.com/zhongkaifu/Seq2SeqSharp/issues/50
For this reason, your advice is very valuable.
Dear Zhongkaifu:
I am trying to validate a model trained on a sequence-classification task, but when trying to execute the program, the following error appears:
"Task 'Valid' is not supported"
In the "Usage: SeqClassificationConsole [parameters...]" section that is printed after the error, 'Valid' appears as an accepted value for the "Task" parameter, so I do not understand why it does not work.
I hope you can shed some light on my issue.