zhongkaifu / Seq2SeqSharp

Seq2SeqSharp is a fast and flexible tensor-based deep neural network framework written in .NET (C#). Its features include automatic differentiation, multiple network types (Transformer, LSTM, BiLSTM and so on), multi-GPU support, cross-platform support (Windows, Linux, x86, x64, ARM), and multimodal models for text and images.

SeqLabelConsole Sequence tag #30

Closed: lijianxin520 closed this issue 2 years ago

lijianxin520 commented 2 years ago

Hello, I am learning your framework and could not find an answer to my problem in the documentation. I hope you can help me. My training text format is as follows:

    世界 n n 1_n
    第 m m 1_a
    八 m m -1_m
    大 a a 1_n
    奇迹 n n 1_v
    出现 v v 0_Root

zhongkaifu commented 2 years ago

Hi @lijianxin520

What problem, error, or exception did you get when you ran the SeqLabelConsole tool? What is your question?

Thanks,
Zhongkai Fu

lijianxin520 commented 2 years ago

Hello, I want to use SeqLabelConsole for tag training, but the example only covers the format [Token] \t [Tag]. My corpus uses the format [Token] \t [Token] \t [Token] \t [Tag]. What should I do?

zhongkaifu commented 2 years ago

Several different solutions can address your question:

  1. You can try to combine the tokens in these three columns into a single column. For example:

         世界_n_n 1_n
         第_m_m 1_a
         八_m_m -1_m
         大_a_a 1_n
         奇迹_n_n 1_v
         出现_v_v 0_Root

  2. Seq2Seq, SeqClassification and others already support multi-column features, but SeqLabel does not yet. You can update it to add that support. Basically, the tokens (features) in the same column can be treated as one feature group; you can update SeqLabel to retrieve features from the different groups and then send them to the encoder(s).
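The data side of both options can be sketched in a few lines of Python. This is a minimal, hypothetical preprocessing script and not part of Seq2SeqSharp: the function names `merge_feature_columns` and `split_feature_groups` are made up for illustration, and the code assumes whitespace-separated columns, no spaces inside a token, and the tag in the last column.

```python
from typing import List, Tuple


def merge_feature_columns(line: str, joiner: str = "_") -> str:
    """Option 1: collapse every feature column into one token, keeping the tag last.

    Assumes columns are separated by whitespace (tabs or spaces).
    """
    columns = line.split()
    if len(columns) < 2:
        raise ValueError(f"expected at least a token and a tag, got: {line!r}")
    *features, tag = columns
    # Emit the two-column [Token] \t [Tag] layout described in the thread.
    return joiner.join(features) + "\t" + tag


def split_feature_groups(lines: List[str]) -> Tuple[List[List[str]], List[str]]:
    """Option 2 (data view only): return one list per feature column plus the tags.

    Each returned list is one "feature group" in the sense used above.
    """
    rows = [line.split() for line in lines if line.strip()]
    if not rows:
        return [], []
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same number of columns")
    # Transpose rows into columns; the last column holds the tags.
    *groups, tags = [list(column) for column in zip(*rows)]
    return groups, tags


if __name__ == "__main__":
    sample = ["世界\tn\tn\t1_n", "第\tm\tm\t1_a", "出现\tv\tv\t0_Root"]
    for row in sample:
        print(merge_feature_columns(row))
    groups, tags = split_feature_groups(sample)
    print(len(groups), "feature groups;", len(tags), "tags")
```

One caveat with the merged form: if a token can itself contain the joiner character, the merge cannot be reversed, so a joiner that never occurs in the corpus is a safer choice.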

I just noticed that the tokens in the second and third columns are the same. They look like POS tags. Why are these two columns duplicated?

Thanks,
Zhongkai Fu

lijianxin520 commented 2 years ago

Thank you very much for your help. I'll give it a try. As for the two columns that look the same: one is coarse-grained and the other is fine-grained.