iamDecode / sklearn-pmml-model

A library to parse and convert PMML models into Scikit-learn estimators.
BSD 2-Clause "Simplified" License

logistic #35

Closed yyyooohao closed 2 years ago

yyyooohao commented 3 years ago

I am using logistic regression from the linear models. The documentation says logistic regression is supported, so why do I get "PMML model does not contain RegressionModel" when I predict?

iamDecode commented 3 years ago

Thanks for your interest in sklearn-pmml-model! In order for me to help you find the problem, it would be great if you can stick to the issue template. Without an extract of the PMML model you are trying to convert, it is difficult for me to help you.

That being said, based on the error I think the library you used to export the PMML model has converted the original RegressionModel to an equivalent GeneralRegressionModel. If this is the case, you should be able to generate predictions using PMMLRidgeClassifier for classification, or PMMLRidge for regression.
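
To check which model element an exported file actually contains, one option is to look at the top-level children of the PMML root. The following is a minimal sketch using only the standard library. The helper name `list_model_elements` and the inline sample document are made up for illustration, and the set of non-model element names reflects the PMML schema as I understand it.

```python
import xml.etree.ElementTree as ET

# Top-level PMML children that are not models (assumed from the PMML schema).
NON_MODEL_ELEMENTS = {
    "Header", "MiningBuildTask", "DataDictionary",
    "TransformationDictionary", "Extension",
}


def list_model_elements(pmml_text):
    """Return the tag names of the model elements directly under <PMML>."""
    root = ET.fromstring(pmml_text)
    names = []
    for child in root:
        # Strip the XML namespace prefix, e.g. "{http://www.dmg.org/PMML-4_4}".
        tag = child.tag.split("}", 1)[-1]
        if tag not in NON_MODEL_ELEMENTS:
            names.append(tag)
    return names


# Hypothetical, heavily trimmed PMML document used only for this demonstration.
SAMPLE = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header/>
  <DataDictionary/>
  <GeneralRegressionModel functionName="classification"/>
</PMML>"""

print(list_model_elements(SAMPLE))
```

For a real file, the bytes can be passed in directly, for example `list_model_elements(pathlib.Path("model.pmml").read_bytes())`. If the result names GeneralRegressionModel rather than RegressionModel, that would be consistent with the explanation above.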

yyyooohao commented 3 years ago

I used SVM for prediction before. Now I want to classify with logistic regression (from the linear models) and test the accuracy of the results. I mainly want to try logistic regression for classification.

yyyooohao commented 3 years ago

Well, I can export the SVM to PMML and make predictions with it, but the logical classification prediction reports an error.

iamDecode commented 3 years ago

I suppose you are using PMMLLogisticRegression to make 'logical classification' predictions? In my previous comment, I recommended using PMMLRidgeClassifier instead. To do that, just replace "PMMLLogisticRegression" with "PMMLRidgeClassifier". I think that should work for you.

yyyooohao commented 3 years ago

Should I change my training to RidgeClassifier, or is there a problem with my data processing? SVM works fine in testing, but now I get: Exception: PMML model does not contain GeneralRegressionModel.

yyyooohao commented 3 years ago

Why is it easier for me to predict with SVM, but harder for me to predict with logistic regression? Is there any other model that can do better classification?

yyyooohao commented 3 years ago

[image]

Prediction works when only the classifier is set in PMMLPipeline, but the accuracy of the result is not high. The logistic regression parameters need to be adjusted to reach a certain precision.

iamDecode commented 3 years ago

I am not entirely sure what your problem is. It would be helpful if you could provide a copy of the PMML file that you are having problems with.

In your screenshot you show PMMLPipeline. Note that this class is not part of this library; it comes from sklearn2pmml. That library converts sklearn models into PMML, whereas sklearn-pmml-model creates a sklearn model from a PMML file.

For me, PMMLLogisticRegression works just fine. Check out this simple example on how to use it along with sklearn2pmml:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline
from sklearn_pmml_model.linear_model import PMMLLogisticRegression

# Prepare data: named feature columns and string class labels
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(np.array(iris.target_names)[iris.target], name="Class")

# Train logistic regression inside a sklearn2pmml pipeline
clf = LogisticRegression()
pipeline = PMMLPipeline([
    ("classifier", clf)
])
pipeline.fit(X, y)

# Convert to PMML
sklearn2pmml(pipeline, "test.pmml", with_repr=True)

# Load from PMML and predict
clf = PMMLLogisticRegression(pmml="test.pmml")
clf.predict(X)
clf.score(X, y)
```

yyyooohao commented 3 years ago

![image](https://user-images.githubusercontent.com/65326115/131059977-241fc793-70d7-4b5a-bb90-0a254779fd46.png)
Logistic regression can be used, but it is not very accurate: only 40%. Are there other models that do classification?

iamDecode commented 3 years ago

The parameters you show don't make a lot of sense to me. max_iter = 2 is way too low to yield any decent classification. I suggest you start with LogisticRegression(), so without any arguments. See if that works (it should), and then gradually add arguments to see if it improves performance. Often enough, the default parameters prove to be sufficient.

If you would like to try another model, I suggest trying RandomForestClassifier.
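
As a rough illustration of this advice, the sketch below compares the default LogisticRegression, the same model with max_iter=2, and a RandomForestClassifier using 5-fold cross-validation. It uses the iris data as a stand-in, since the dataset from this issue is not available, so the numbers say nothing about the original problem. The `random_state=0` setting is only there to make the run repeatable.

```python
import warnings

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

candidates = {
    "LogisticRegression()": LogisticRegression(),
    "LogisticRegression(max_iter=2)": LogisticRegression(max_iter=2),
    "RandomForestClassifier()": RandomForestClassifier(random_state=0),
}

scores = {}
with warnings.catch_warnings():
    # Too few iterations triggers ConvergenceWarning; silence it for the demo.
    warnings.simplefilter("ignore")
    for name, model in candidates.items():
        # Mean accuracy over 5 cross-validation folds.
        scores[name] = cross_val_score(model, X, y, cv=5).mean()

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Starting from the defaults and changing one argument at a time makes it easier to see which setting helps or hurts.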

yyyooohao commented 3 years ago

The test accuracy with the default parameters is not high, only about half that of the SVM, so the parameters need to be adjusted. I do not need an overly complex model.

yyyooohao commented 3 years ago

I tried the random forest and got: ModuleNotFoundError: No module named 'sklearn_pmml_model.tree._tree'. I am using three categories.

iamDecode commented 3 years ago

Please make sure you installed the library using pip install sklearn-pmml-model. This error seems to indicate that the Cython code is not compiled, which only happens if you downloaded this library and are working directly in that directory.

If you cannot use pip for some reason, running the following command will compile the Cython code in place and should fix the issue:

```shell
python setup.py build_ext --inplace
```

I don't recommend this: it requires a C compiler, which is a bit of a pain to set up on Windows. More information about this process can be found at https://sklearn-pmml-model.readthedocs.io/en/latest/install.html#from-source.
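
One way to tell whether Python is picking up a pip-installed copy or a local clone is to ask the import system where the package would be loaded from. This is a small sketch using only the standard library; the helper name `package_location` is made up for illustration.

```python
import importlib.util


def package_location(name):
    """Return the file path a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        # The package is not importable from the current environment.
        return None
    return spec.origin


print(package_location("sklearn_pmml_model"))
```

A path inside `site-packages` suggests the installed copy is being used. A path inside a cloned repository suggests the uncompiled source in the working directory is being picked up instead.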

yyyooohao commented 3 years ago

I installed the packages according to requirements.txt.

yyyooohao commented 3 years ago

If I use logistic regression for a three-class model, can it not predict?

yyyooohao commented 3 years ago

> which is only the case if you downloaded this library and are working in that directory directly.

I can use pip. How can I simply use random forest? I don't want to install a C compiler.

yyyooohao commented 3 years ago

Why do I get errors like this when I use logistic regression for classification? Binary classification worked over the first two days, but three-class classification reports an error:

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 2 is different from 1024)

iamDecode commented 3 years ago

If you use pip to install the library, no C compiler is necessary. More information on how to install using pip can be found in the documentation: https://sklearn-pmml-model.readthedocs.io/en/latest/install.html#pip.

pip is the standard package manager for Python, and is included with most Python installations. The documentation includes a link to more general information about pip here: https://packaging.python.org/tutorials/installing-packages/#use-pip-for-installing.

yyyooohao commented 3 years ago

I installed the packages from the requirements file with pip. Why do I get errors with those models?

iamDecode commented 3 years ago

> Why do I get errors with those models?

You have to let me know which errors you are seeing, otherwise I cannot help you.

I suspect you installed the packages with pip but are still working inside a clone of this package. If you are working in a copy of this repository, please remove it, start fresh, do a pip install, and try out the example I provided here: https://github.com/iamDecode/sklearn-pmml-model/issues/35#issuecomment-906271001. If this works, you can proceed to try different models and datasets.

yyyooohao commented 3 years ago

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 2 is different from 1024)

[image]

The error occurred when I used logistic regression or ridge regression. Binary classification with logistic regression worked before; can three-class classification be used? I mainly use it to test binary and three-class classification. For three-class classification, do I need to make any modifications?
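
The message above is what NumPy raises when the inner dimensions of a matrix product disagree. One possible reading, which is an assumption and is not confirmed anywhere in this thread, is that the data passed to predict has 1024 columns while the loaded model holds coefficients for only 2 features. The sketch below reproduces that kind of mismatch with placeholder arrays; all sizes except 2 and 1024 are made up.

```python
import numpy as np

n_samples, n_input_features = 5, 1024   # shape of the data given to predict
n_model_features, n_classes = 2, 3      # shape the (hypothetical) model expects

X = np.zeros((n_samples, n_input_features))
coef = np.zeros((n_model_features, n_classes))

try:
    X @ coef  # inner dimensions 1024 and 2 do not match
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
print("feature counts match:", X.shape[1] == coef.shape[0])
```

Comparing the number of columns in the input with the number of fields declared in the PMML file is a quick way to rule this cause in or out.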

yyyooohao commented 3 years ago

OK, so I should use the installed package version rather than using the source directly in my project.

yyyooohao commented 3 years ago

[image]

I used logistic regression to classify into three categories and got Exception: PMML model does not contain RegressionModel. After reinstalling the package, binary classification can be predicted. Ridge regression has the same problem.

yyyooohao commented 3 years ago

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

Do I need to do some configuration when I use GBDT classification?

iamDecode commented 3 years ago

> I used logistic regression to classify into three categories and got Exception: PMML model does not contain RegressionModel. After reinstalling the package, binary classification can be predicted. Ridge regression has the same problem.

OK, I think I understand now. You seem to be using the multi_class='ovr' parameter on your LogisticRegression class (from https://github.com/iamDecode/sklearn-pmml-model/issues/35#issuecomment-906867774). This means one-versus-rest regression. This type is not explicitly supported by the library yet, but I am working on adding it right now.

To get it working in the meantime, you can use the default parameter multi_class='auto' or specifically select multi_class='multinomial' instead. This type of regression should work!
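
To make the difference between the two strategies concrete, here is a small scikit-learn sketch on the iris data. It deliberately avoids the `multi_class` argument, which I believe recent scikit-learn releases deprecate, and uses `OneVsRestClassifier` to stand in for one-versus-rest. It does not touch PMML export, so it says nothing about how either variant is written to a file.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)  # three classes, four features

# One-versus-rest: one independent binary classifier per class.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# A single model over all classes (multinomial with the default solver).
multinomial = LogisticRegression(max_iter=1000).fit(X, y)

print(len(ovr.estimators_))      # number of binary sub-models
print(multinomial.coef_.shape)   # (n_classes, n_features)
print(ovr.score(X, y), multinomial.score(X, y))
```

One-versus-rest produces several separate binary models, while the multinomial form is one model with a coefficient row per class. That structural difference is a plausible reason an exporter could represent the two differently.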

yyyooohao commented 3 years ago

[image]

Well, I got an error with three-class logistic regression: Exception: PMML model does not contain RegressionModel.

yyyooohao commented 3 years ago

[image]

I see, three-class classification is running now, ha ha.

iamDecode commented 3 years ago

> ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

This error typically means you have to reinstall numpy (pip install numpy --upgrade).

iamDecode commented 3 years ago

> I see, three-class classification is running now, ha ha.

Glad you got it working! I have just released a new version that should also work with multi_class='ovr'. If your initial problem is resolved, can I close this issue?