skeskinen / bert.cpp

ggml implementation of BERT
MIT License

Classification Pipeline #11

Closed. okpatil4u closed this issue 1 year ago.

okpatil4u commented 1 year ago

Is it possible to retrofit a BERT classification model into this code?

Can you please provide some guidelines so that I can take care of it myself?

Thanks in advance.

skeskinen commented 1 year ago

Hi, classification should just be another linear layer at the end, once the embeddings are done.

In the model conversion script, replace the AutoModel with BertForSequenceClassification (or whichever class you are using). During this step, check the name of the classification layer and modify the C++ to load and rename that layer.

Then, in the model eval code, take out the mean pooling and embedding normalization and add a multiplication with the classification matrix. I think for classification you take the embedding of the first token and discard everything else.

okpatil4u commented 1 year ago

Thanks. Apparently BERT uses tanh for the pooler. In your code you have defined the pooler as follows:

        // pooler
        struct ggml_tensor *sum = ggml_new_tensor_2d(ctx0, GGML_TYPE_F32, N, 1);
        ggml_set_f32(sum, 1.0f / N);
        inpL = ggml_mul_mat(ctx0, inpL, sum);

        // normalizer
        ggml_tensor *length = ggml_sqrt(ctx0,
                                        ggml_sum(ctx0, ggml_sqr(ctx0, inpL)));
        inpL = ggml_scale(ctx0, inpL, ggml_div(ctx0, ggml_new_f32(ctx0, 1.0f), length));

        ggml_tensor *output = inpL;

This makes sense if you only need to calculate the embeddings. But in my case, there is a tanh activation after the pooler: I have to load the pooler weights, apply tanh, apply dropout, and then apply the classification layer. This is what my code looks like:


        // pooler
        inpL = ggml_mul_mat(ctx0, model.pooler_w, inpL);
        inpL = ggml_add(ctx0, ggml_repeat(ctx0, model.pooler_b, inpL), inpL);
        // Apply Tanh activation function
        inpL = ggml_tanh(ctx0, inpL);

        // Apply dropout
        inpL = ggml_dropout(ctx0, inpL, model.dropout);

        // Adding the classifier layer here
        inpL = ggml_mul_mat(ctx0, model.classifier_w, inpL);
        inpL = ggml_add(ctx0, ggml_repeat(ctx0, model.classifier_b, inpL), inpL);

Is this correct? Apparently there are no ggml_tanh or ggml_dropout functions in the ggml library. Do I have to implement them from scratch? Also, I didn't find dropout anywhere in the code, although it is mentioned in the model specification. Is it applied implicitly?

skeskinen commented 1 year ago

Dropout is only used during training, so you can just ignore it when implementing inference.

Tanh is indeed missing from ggml. You could either try to implement it as a proper operator in ggml, or use ggml_map_unary_f32 to wrap tanh into a tensor function.
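
For reference, a minimal sketch of the ggml_map_unary_f32 approach, assuming the ggml version bundled with bert.cpp (the callback receives the element count, a destination buffer, and a source buffer; newer ggml versions may differ, and the helper name tanh_op_f32 is illustrative):

        #include <math.h>

        // Elementwise tanh callback for ggml_map_unary_f32.
        static void tanh_op_f32(const int n, float * dst, const float * src) {
            for (int i = 0; i < n; ++i) {
                dst[i] = tanhf(src[i]);
            }
        }

        // ... then, in the graph build, in place of the missing ggml_tanh:
        inpL = ggml_map_unary_f32(ctx0, inpL, tanh_op_f32);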

okpatil4u commented 1 year ago

Thanks. This is very useful.

Just for context, I am trying to convert prajjwal1/bert-tiny. It has a layer "bert.embeddings.position_ids". After reading this layer and before reading "bert.embeddings.word_embeddings.weight", the following code gives a segmentation fault:

            int64_t nelements = 1;
            int64_t ne[2] = {1, 1};
            for (int i = 0; i < n_dims; ++i)
            {
                int32_t ne_cur;
                fin.read(reinterpret_cast<char *>(&ne_cur), sizeof(ne_cur));
                ne[i] = ne_cur;
                nelements *= ne[i];
            }

Am I supposed to load position_ids in some different way? Could you give this model a try, by any chance?

skeskinen commented 1 year ago

position_ids is just a vector with the indices 0, 1, 2, 3, ... In bert.cpp, it's handled here: https://github.com/skeskinen/bert.cpp/blob/master/bert.cpp#L791

For a CPU implementation it doesn't really make sense to store it as a parameter of the model; it's done that way in PyTorch for GPU-memory-related reasons. It's just a for loop.
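
For illustration, the loop looks roughly like this (a sketch following the linked bert.cpp code; ctx0 and the token count N are assumed from the surrounding eval function):

        // Build position_ids = [0, 1, 2, ..., N-1] at eval time instead of
        // loading it from the model file.
        struct ggml_tensor *positions = ggml_new_tensor_1d(ctx0, GGML_TYPE_I32, N);
        for (int i = 0; i < N; i++)
        {
            ((int32_t *)positions->data)[i] = i;
        }
        // positions can then index the position-embedding table, e.g. via ggml_get_rows.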

okpatil4u commented 1 year ago

Sure. But if I skip them, I get the following error:

bert_load_from_file: unknown tensor 'bert.embeddings.position_ids' in model file

If I don't, I get a segmentation fault. Am I missing something?

skeskinen commented 1 year ago

While creating the model file, don't store the tensors you won't load. Or modify the loading code to ignore the error.
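
A sketch of the second option, assuming the usual ggml loader structure (the model.tensors map, the fin stream, and the bytes_per_element bookkeeping are illustrative; the exact names in bert_load_from_file may differ):

        // After reading the tensor's name and shape from the file:
        if (model.tensors.find(name) == model.tensors.end())
        {
            fprintf(stderr, "%s: skipping unknown tensor '%s'\n", __func__, name.c_str());
            // Seek past this tensor's data so the next header is read correctly.
            fin.seekg(nelements * bytes_per_element, std::ios::cur);
            continue;
        }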

okpatil4u commented 1 year ago

Got it. You had skipped them using "embeddings.position_ids"; I had to change it to "bert.embeddings.position_ids" in the ggml conversion script.

Can I skip the pooler weights as well, as you mentioned in your first reply? Or do I have to implement them, including the tanh?

skeskinen commented 1 year ago

The pooler weights are skipped in bert.cpp because sbert.net doesn't use the vanilla BERT pooler. For classification, I think you should use the vanilla pooler, which includes the learned pooler weights and the tanh step.

okpatil4u commented 1 year ago

Thanks, man!

I was able to get it working by temporarily using ReLU instead of tanh:

        // pooler
        inpL = ggml_mul_mat(ctx0, model.pooler_w, inpL);
        inpL = ggml_add(ctx0, ggml_repeat(ctx0, model.pooler_b, inpL), inpL);
        // ReLU as a temporary stand-in for the tanh activation
        inpL = ggml_relu(ctx0, inpL);

        // classifier
        inpL = ggml_mul_mat(ctx0, model.classifier_w, inpL);
        inpL = ggml_add(ctx0, ggml_repeat(ctx0, model.classifier_b, inpL), inpL);

        ggml_tensor *output = inpL;

        // run the computation
        ggml_build_forward_expand(&gf, output);
        ggml_graph_compute(ctx0, &gf);

Does this code look correct? At least it is working with your server and client implementation. I will implement tanh properly later, once I have your comments.

skeskinen commented 1 year ago

The code looks at least reasonable; it's probably close to right. Compare the outputs with the original, I guess? The only thing I'd check is that, afaik, inpL at the start of this snippet should be just the embedding of the first token (the CLS or BOS token). Maybe you do that already and it's just not shown here. Or maybe I'm wrong; like I said, check the original code.
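
Something along these lines before the pooler would do it (a sketch assuming inpL is contiguous with shape [n_embd, N] after the encoder stack and that the CLS token comes first):

        // Keep only the first token's (CLS) embedding: the first n_embd
        // values of the contiguous [n_embd, N] activations.
        inpL = ggml_view_1d(ctx0, inpL, n_embd, 0);
        // ... then apply pooler_w/pooler_b, tanh, and the classifier as above.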

okpatil4u commented 1 year ago

I am not sure. I replaced these lines with the code above:

        inpL = ggml_cont(ctx0, ggml_transpose(ctx0, inpL));
        // pooler
        struct ggml_tensor *sum = ggml_new_tensor_2d(ctx0, GGML_TYPE_F32, N, 1);
        ggml_set_f32(sum, 1.0f / N);
        inpL = ggml_mul_mat(ctx0, inpL, sum);

        // normalizer
        ggml_tensor *length = ggml_sqrt(ctx0,
                                        ggml_sum(ctx0, ggml_sqr(ctx0, inpL)));
        inpL = ggml_scale(ctx0, inpL, ggml_div(ctx0, ggml_new_f32(ctx0, 1.0f), length));

        ggml_tensor *output = inpL;
        // run the computation
        ggml_build_forward_expand(&gf, output);
        ggml_graph_compute(ctx0, &gf);

Is this correct?

okpatil4u commented 1 year ago

It is giving correct results. Thank you for your help!

redthing1 commented 1 year ago

@okpatil4u I want to use classification models as well, are your changes public? I would like to work with them.

redthing1 commented 1 year ago

@okpatil4u pinging you again: I want to use classification models as well, are your changes public? I would like to work with them.

okpatil4u commented 1 year ago

The reason I didn't make it public is that it doesn't work: there is at least a 25% accuracy drop compared with the same model in huggingface/transformers. We ended up rebuilding it in huggingface/candle from scratch. You are welcome to try it.


redthing1 commented 1 year ago

Thank you for the update.

redthing1 commented 1 year ago

@okpatil4u

> The reason I didn't make it public is that it doesn't work. There is at least a 25% accuracy drop compared with the same model in huggingface/transformers. We ended up rebuilding it in huggingface/candle from scratch. You are welcome to try it.

Would you mind sharing the source modifications you made? I would like to troubleshoot.