jqueguiner / camembert-as-a-service


Help to make 2 projects as a service like yours #1

Open ghost opened 4 years ago

ghost commented 4 years ago

Hi,

Hope you are doing well!

I have these 2 interesting projects for sentiment analysis in French.

I'd like to offer them as a service, as you did for CamemBERT. My specialty is Go rather than Python, which is why I'm asking you for some help.

For the second one, I know that the following code can process a single piece of content:

```python
import numpy as np
import tensorflow as tf
assert tf.__version__ >= "2.0"

from transformers import CamembertTokenizer, TFCamembertForSequenceClassification

# Preprocessing: zero-pad each review to max_length and build the attention mask
def encode_reviews(tokenizer, reviews, max_length):
    token_ids = np.zeros(shape=(len(reviews), max_length),
                         dtype=np.int32)
    for i, review in enumerate(reviews):
        encoded = tokenizer.encode(review, max_length=max_length)
        token_ids[i, 0:len(encoded)] = encoded
    attention_mask = (token_ids != 0).astype(np.int32)
    return {"input_ids": token_ids, "attention_mask": attention_mask}

# Load model
MODEL_FOLDER = "camembert_sentiment"  # local model
tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
model = TFCamembertForSequenceClassification.from_pretrained(MODEL_FOLDER)

# Inference
MAX_SEQ_LEN = 400
text = "Ce film était génial !"
X = encode_reviews(tokenizer, [text], MAX_SEQ_LEN)
scores = model.predict(X)
y_pred = np.argmax(scores[0], axis=1)
# y_pred = 0 if negative, 1 if positive
# here, y_pred should be 1
```

How can I load the model once, in a shared way, and expose it behind a REST API service?
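
To make the question concrete, here is a minimal sketch of what I have in mind, assuming a Flask app that loads the tokenizer and model once at startup; the /predict endpoint name and the model folder are placeholders, not taken from your repo:

```python
# Minimal sketch, assuming Flask; endpoint name and model folder are placeholders.
import numpy as np
from flask import Flask, jsonify, request
from transformers import CamembertTokenizer, TFCamembertForSequenceClassification

MODEL_FOLDER = "camembert_sentiment"  # hypothetical local model folder
MAX_SEQ_LEN = 400

app = Flask(__name__)

# Loaded once at startup, then shared by every request
tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
model = TFCamembertForSequenceClassification.from_pretrained(MODEL_FOLDER)

def encode_reviews(tokenizer, reviews, max_length):
    # Same preprocessing as above: zero-pad to max_length and build the attention mask
    token_ids = np.zeros(shape=(len(reviews), max_length), dtype=np.int32)
    for i, review in enumerate(reviews):
        encoded = tokenizer.encode(review, max_length=max_length)
        token_ids[i, 0:len(encoded)] = encoded
    attention_mask = (token_ids != 0).astype(np.int32)
    return {"input_ids": token_ids, "attention_mask": attention_mask}

@app.route("/predict", methods=["POST"])
def predict():
    text = request.json["text"]
    X = encode_reviews(tokenizer, [text], MAX_SEQ_LEN)
    scores = model.predict(X)
    y_pred = int(np.argmax(scores[0], axis=1)[0])  # 0 = negative, 1 = positive
    return jsonify({"sentiment": "positive" if y_pred == 1 else "negative"})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

The point I care about is that from_pretrained is called once when the process starts, so every request shares the same in-memory model instead of reloading the weights.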

Thanks for any inputs or insights.

Cheers, X

jqueguiner commented 4 years ago

Hi @x0rzkov, happy to help

  1. Is there a repo you want me to contribute to?
  2. Can you please provide the local model you trained?
  3. From what I see, you are using a pretrained tokenizer rather than a custom one; can you confirm?

ghost commented 4 years ago

Hi,

Thanks for your reply

1. french-sentiment-analysis-with-bert

Here is a gist with what I have done so far for french-sentiment-analysis-with-bert:

https://gist.github.com/x0rzkov/111f8081c30c5ed82268bbca30729072

It provides a shell script for downloading the local model, and a REST API (largely inspired by your work, so thanks a lot).

Regarding question 3: we are using CamemBERT base.

My other question is: how do I make this REST API scalable, e.g. with Kubernetes/Docker and RabbitMQ?
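
As a starting point, here is a rough sketch of the worker side I am imagining, assuming RabbitMQ with the pika client; the queue names and the predict_sentiment() helper are placeholders, not code from the gist:

```python
# Rough sketch of a RabbitMQ worker, assuming the pika client.
# Queue names and predict_sentiment() are placeholders.
import json
import pika

def predict_sentiment(text):
    # Placeholder: call the shared CamemBERT model here (see the snippet above)
    return "positive"

def on_message(channel, method, properties, body):
    payload = json.loads(body)
    result = predict_sentiment(payload["text"])
    channel.basic_publish(
        exchange="",
        routing_key="sentiment_results",  # hypothetical result queue
        body=json.dumps({"id": payload.get("id"), "sentiment": result}),
    )
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq"))
channel = connection.channel()
channel.queue_declare(queue="sentiment_requests", durable=True)
channel.queue_declare(queue="sentiment_results", durable=True)
channel.basic_qos(prefetch_count=1)  # hand each worker one message at a time
channel.basic_consume(queue="sentiment_requests", on_message_callback=on_message)
channel.start_consuming()
```

The API container would then only enqueue requests, and Docker/Kubernetes can scale the number of worker containers (each holding its own copy of the model) independently of the API.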

I am watching a few existing solutions for scaling and trying to take inspiration from them:

Goals:

  1. Create a scalable/deployable API
  2. Create a CPU/GPU handler (see the sketch after this list)
  3. Docker Alpine-based container
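
For the CPU/GPU handler, I am thinking of something as simple as the sketch below, assuming TensorFlow 2.x; the idea is just to detect whether a GPU is visible and pin the model to the right device:

```python
# Minimal sketch of a CPU/GPU handler, assuming TensorFlow 2.x.
import tensorflow as tf
from transformers import TFCamembertForSequenceClassification

def pick_device():
    # Use the first GPU if TensorFlow can see one, otherwise fall back to CPU
    gpus = tf.config.list_physical_devices("GPU")
    return "/GPU:0" if gpus else "/CPU:0"

DEVICE = pick_device()

with tf.device(DEVICE):
    model = TFCamembertForSequenceClassification.from_pretrained("camembert_sentiment")
```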

2. bert-tweets-analysis

I have done some research, but so far basically nothing concrete on https://github.com/OthSay/bert-tweets-analysis.

Here is the to-do list:

### TODO:
- [ ] Optimize prediction phase.
- [ ] Finalize API, and make a demo webpage.
- [ ] Detect tweet language automatically.
- [ ] Finetune Camembert model for French sentiment analysis.
- [ ] Add Named Entity Recognition model.
- [ ] Add Sentiment Discovery (by [NVIDIA](https://github.com/NVIDIA/sentiment-discovery))

For the Sentiment Discovery part from NVIDIA, I found this repo: https://github.com/Rexhaif/nvidia-eval

It provides a REST API and a CLI script.

```bash
git clone https://github.com/rexhaif/nvidia-eval.git
cd nvidia-eval
cd models
bash get_models.sh
cd ../
pip install -r requirements.txt
python eval.py --example "i love you" # CLI eval
python app.py                         # REST API
```

Unfortunately, it seems to be a different approach to training and fine-tuning language models. They use their own models, with no reference to BERT. They do share their pre-trained weights, but only for English. So there is no chance to use CamemBERT or FlauBERT with sentiment-discovery.

Goals:

  1. Analysis of French tweets and fine-tuning of the CamemBERT model for French sentiment analysis.
  2. Create a CPU/GPU handler
  3. Docker Alpine-based container

Hope that was clear enough as an overview.

Cheers, X