mesolitica / malaya

Natural Language Toolkit for Malaysian language, https://malaya.readthedocs.io/
MIT License

This TfidfTransformer instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator #209

Open khoi01 opened 4 months ago

khoi01 commented 4 months ago

Hello sir, I am having an issue with the malaya package.

When I try to use model = malaya.emotion.multinomial(), it shows the error message below. I have tried several variations of the code but still get the same issue. I don't know what mistake I have made. I hope you can help me solve this problem. Thank you in advance, sir.

Example 1

    model = malaya.emotion.xlnet()
    anger_text = 'babi la company ni, aku dah la penat datang dari jauh'
    result = model.predict([anger_text])
    return {"result": result}

Example 2

    model = malaya.emotion.bert()
    anger_text = 'babi la company ni, aku dah la penat datang dari jauh'
    result = model.predict([anger_text])
    return {"result": result}

Error Message

sentiment-analysis-app  |  * Running on all addresses (0.0.0.0)
sentiment-analysis-app  |  * Running on http://127.0.0.1:5000
sentiment-analysis-app  |  * Running on http://172.22.0.2:5000
sentiment-analysis-app  | Press CTRL+C to quit
sentiment-analysis-app  | /usr/local/lib/python3.9/site-packages/sklearn/base.py:376: InconsistentVersionWarning: Trying to unpickle estimator ComplementNB from version 0.22.1 when using version 1.5.1. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
sentiment-analysis-app  | https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
sentiment-analysis-app  |   warnings.warn(
sentiment-analysis-app  | /usr/local/lib/python3.9/site-packages/sklearn/base.py:376: InconsistentVersionWarning: Trying to unpickle estimator TfidfTransformer from version 0.22.1 when using version 1.5.1. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
sentiment-analysis-app  | https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
sentiment-analysis-app  |   warnings.warn(
sentiment-analysis-app  | /usr/local/lib/python3.9/site-packages/sklearn/base.py:376: InconsistentVersionWarning: Trying to unpickle estimator TfidfVectorizer from version 0.22.1 when using version 1.5.1. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
sentiment-analysis-app  | https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
sentiment-analysis-app  |   warnings.warn(
sentiment-analysis-app  | [2024-07-11 01:50:13,255] ERROR in app: Exception on /api/test2 [GET]
sentiment-analysis-app  | Traceback (most recent call last):
sentiment-analysis-app  |   File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1473, in wsgi_app
sentiment-analysis-app  |     response = self.full_dispatch_request()
sentiment-analysis-app  |   File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 882, in full_dispatch_request
sentiment-analysis-app  |     rv = self.handle_user_exception(e)
sentiment-analysis-app  |   File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 880, in full_dispatch_request
sentiment-analysis-app  |     rv = self.dispatch_request()
sentiment-analysis-app  |   File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 865, in dispatch_request
sentiment-analysis-app  |     return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
sentiment-analysis-app  |   File "/app/app.py", line 74, in test2
sentiment-analysis-app  |     result = model.predict([anger_text])
sentiment-analysis-app  |   File "/usr/local/lib/python3.9/site-packages/malaya/model/ml.py", line 134, in predict
sentiment-analysis-app  |     return self._predict(strings=strings)
sentiment-analysis-app  |   File "/usr/local/lib/python3.9/site-packages/malaya/model/ml.py", line 35, in _predict
sentiment-analysis-app  |     results = self._classify(strings)
sentiment-analysis-app  |   File "/usr/local/lib/python3.9/site-packages/malaya/model/ml.py", line 31, in _classify
sentiment-analysis-app  |     vectors = self._vectorize.transform(subs)
sentiment-analysis-app  |   File "/usr/local/lib/python3.9/site-packages/sklearn/feature_extraction/text.py", line 2116, in transform
sentiment-analysis-app  |     return self._tfidf.transform(X, copy=False)
sentiment-analysis-app  |   File "/usr/local/lib/python3.9/site-packages/sklearn/feature_extraction/text.py", line 1687, in transform
sentiment-analysis-app  |     check_is_fitted(self)
sentiment-analysis-app  |   File "/usr/local/lib/python3.9/site-packages/sklearn/utils/validation.py", line 1661, in check_is_fitted
sentiment-analysis-app  |     raise NotFittedError(msg % {"name": type(estimator).__name__})
sentiment-analysis-app  | sklearn.exceptions.NotFittedError: This TfidfTransformer instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
sentiment-analysis-app  | 192.168.65.1 - - [11/Jul/2024 01:50:13] "GET /api/test2 HTTP/1.1" 500 -
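
The three InconsistentVersionWarning lines in the log above say that the bundled estimators were pickled with scikit-learn 0.22.1 but loaded under 1.5.1, which may well be why the TfidfTransformer looks unfitted after loading. As a minimal sketch, using only the Python standard library, the version pairs can be pulled out of such a log like this. The function name `find_version_mismatches` is hypothetical and not part of malaya or scikit-learn.

```python
import re

# Matches the warning text exactly as it appears in the log above.
_PATTERN = re.compile(
    r"Trying to unpickle estimator (?P<name>\w+) "
    r"from version (?P<pickled>[\w.]+) "
    r"when using version (?P<installed>[\w.]+)\."
)


def find_version_mismatches(log_text):
    """Return sorted, de-duplicated (estimator, pickled, installed) tuples."""
    found = set()
    for match in _PATTERN.finditer(log_text):
        found.add((match["name"], match["pickled"], match["installed"]))
    return sorted(found)


# One line copied from the log above, shortened after the first sentence.
sample = (
    "InconsistentVersionWarning: Trying to unpickle estimator "
    "TfidfTransformer from version 0.22.1 when using version 1.5.1. "
    "This might lead to breaking code or invalid results."
)
print(find_version_mismatches(sample))
```

This only reports the mismatch; it does not establish which scikit-learn version would load the models correctly.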

Below is my environment setup, using Docker.

docker-compose.yml

version: '3.8'

services:
  app:
    build: .
    container_name: sentiment-analysis-app
    volumes:
      - .:/app
    ports:
      - "8000:5000"
    environment:
      - PYTHONDONTWRITEBYTECODE=1
      - PYTHONUNBUFFERED=1
    command: ["python3", "app.py"]

Dockerfile

# Use the official Python image from the Docker Hub
FROM python:3.9-slim

# Set environment variables to stop Python from writing .pyc files and to disable output buffering
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

# Install pip3 and necessary packages
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        build-essential \
        python3-pip \
        python3-dev \
        git \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Install Cython before other requirements
RUN pip3 install --upgrade pip \
    && pip3 install Cython

# Set the working directory
WORKDIR /app

# Copy requirements.txt and install dependencies
COPY requirements.txt /app/
RUN pip3 install --default-timeout=1000 -r requirements.txt

# Copy the current directory contents into the container at /app
COPY . /app

# Run the application
CMD ["python3", "app.py"]

requirements.txt

malaya==5.1.1
torch>=1.10
Flask==3.0.3
beautifulsoup4==4.12.3
youtokentome==1.0.6
malaya-boilerplate==0.0.25
scikit-learn>=1.2
dateparser==1.2.0
requests==2.32.3
unidecode==1.3.8
numpy==1.26.4
scipy==1.13.1
ftfy==6.2.0
networkx==3.2.1
sentencepiece==0.2.0
tqdm==4.66.4
transformers==4.42.3
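
In the list above, `torch>=1.10` and `scikit-learn>=1.2` are the only entries without an exact `==` pin, so pip is free to pick a scikit-learn release much newer than the one the models were pickled with. A rough sketch for flagging such entries follows. The helper name `unpinned_requirements` is hypothetical, and the parsing is deliberately simple: it ignores extras, environment markers, and `-r` includes.

```python
def unpinned_requirements(text):
    """Return requirement lines that do not pin an exact version with '=='."""
    unpinned = []
    for raw in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


# A few entries copied from the requirements.txt above.
requirements = """\
malaya==5.1.1
torch>=1.10
Flask==3.0.3
scikit-learn>=1.2
"""
print(unpinned_requirements(requirements))
```

Whether an older scikit-learn release would install and work alongside the other pinned packages is not established in this thread.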

app.py

import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

from sqlite3 import Time
import malaya
from flask import Flask, request

app = Flask(__name__)

@app.route('/api/test', methods=['GET'])
def test():
    return "Flask server is running!"

@app.route('/api/test2', methods=['GET'])
def test2():
    import malaya
    model = malaya.emotion.multinomial()
    anger_text = 'babi la company ni, aku dah la penat datang dari jauh' 
    result = model.predict([anger_text])
    return result

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=5000)

api.http

### Test the Flask server is running
GET http://localhost:8000/api/test2
Accept: application/json

###

huseinzol05 commented 4 months ago

Do you want to try making a PR for this?

khoi01 commented 4 months ago

Do you want to try making a PR for this?

What does that mean, sir?

huseinzol05 commented 4 months ago

Fork the repository and open a pull request to fix it.