zhongkaifu / Seq2SeqSharp

Seq2SeqSharp is a tensor-based, fast and flexible deep neural network framework written in .NET (C#). It has many highlighted features, such as automatic differentiation, different network types (Transformer, LSTM, BiLSTM and so on), multi-GPU support, cross-platform support (Windows, Linux, x86, x64, ARM), multimodal models for text and images, and so on.

GPTconsole #62

Closed piedralaves closed 1 year ago

piedralaves commented 1 year ago

Hi Zhongkai,

Definitively, we are using your latest release, with just a few code lines added in order to load embeddings from plain text. But, anyway, we are using your latest version. All apps work properly on CPU, on GPU, and with MKL.

Nonetheless, we are trying to work with your new GPTConsole to build an autoregressive (and self-supervised) model and generate embeddings. The problem is that it does not seem to work. Please, can you help us make it work for the first time? Maybe it is something in our config...

These are our config and log:

config.txt Seq2SeqConsole_Train_2023_02_17_13h_22m_56s.log

Thanks a lot

zhongkaifu commented 1 year ago

Hi @piedralaves You need to set "DecoderType":"GPTDecoder" in config file, because GPT is decoder only model.
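For context, the relevant change is a single setting in the JSON config file. This is a minimal sketch; every field other than "DecoderType" is omitted here, and the rest of the existing config (model paths, training parameters, etc.) stays as it was:

```json
{
  "DecoderType": "GPTDecoder"
}
```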

Thanks Zhongkai Fu

piedralaves commented 1 year ago

Thanks a lot. It works.

Just a question. Do you think it is a good strategy to generate the embeddings with GPTConsole, that is, with no task to solve? I mean, only in an autoregressive way.

If I understand well, GPTConsole is an autoregressive Language Model (LM) in which no supervised task is required (no classification, no sequence to generate on the decoder side). Thus, embeddings generated with GPT are not tied to a concrete task; that is, GPT's task is the general language use of the sentences in the sample.

For this reason, embeddings from GPT are the purest form of word representations and can serve to initialize the embeddings of other supervised tasks or sequence generation tasks (seq2seq).

Am I right?

Thanks a lot.

zhongkaifu commented 1 year ago

Unsupervised task, such as GPT model or autoregressive LM, doesn't mean "no task". It still has a task which is to predict the next output token. Unsupervised task means no requirement for human to provide ground truth results for model training. So, you can use GPTConsole for embeddings.
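To make the point above concrete: in an autoregressive LM, the training targets come from the data itself, with no human labels. A language-agnostic sketch in Python (not Seq2SeqSharp code; the function name is illustrative) of how each prefix of a sequence is paired with the next token it must predict:

```python
def next_token_pairs(tokens):
    """Build (input prefix, target token) pairs for autoregressive
    LM training. The targets are just the sequence shifted by one,
    so no external ground-truth labels are needed."""
    return [(tokens[:i + 1], tokens[i + 1]) for i in range(len(tokens) - 1)]

# Each prefix predicts the following token:
pairs = next_token_pairs(["the", "cat", "sat"])
# → [(['the'], 'cat'), (['the', 'cat'], 'sat')]
```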

Thanks Zhongkai Fu

piedralaves commented 1 year ago

Yes, yes. That's what I meant: that the task of GPT is to predict the next word in a sequence in an autoregressive way, but without a supervised goal. Thus, the representations formed in the embeddings have no adherence to a supervised task. They come from sequence context use only.

Thanks again