jpmml / jpmml-sklearn

Java library and command-line application for converting Scikit-Learn pipelines to PMML
GNU Affero General Public License v3.0

A model is missing some predictor fields after conversion #158

Closed laochen666 closed 3 years ago

laochen666 commented 3 years ago

My CSV file looks like this:

id,f2,f3,f4,f5,f6,f7,f8,f9,f10,f11,f12,f13,f14,f15,f16,f17,f18,f19,f20,f21,f22,f23,f24,f25,f26,f27,f28,f29,f30,f31,f32,f33,f34,f35,f36,f37,f38,f39,f40,f41,f42,f43,f44,f45,f46,f47,f48,f49,f50,f51,f52,f53,f54,f55,f56,f57,f58,f59,f60,f61,f62
2210,0,2,2,2,2,1,1,2,2,2,0,0,1,3,1,1,1,4,2,2,1,2,1,2,2,0,1,1,1,0,2,5,1,1,0,1,1,3,3,1,1,3,1,2,0,1,3,0,1,2,1,1,2,2,1,1,1,1,0,0,1

I trained a model to predict field f2 from fields f3,f4,f5,f6,f7,f8,f9,f10,f11,f12,f13,f14,f15,f16,f17,f18,f19,f20,f21,f22,f23,f24,f25,f26,f27,f28,f29,f30,f31,f32,f33,f34,f35,f36,f37,f38,f39,f40,f41,f42,f43,f44,f45,f46,f47,f48,f49,f50,f51,f52,f53,f54,f55,f56,f57,f58,f59,f60,f61,f62 using XGBoost.sklearn. After the conversion finished, I found that some fields were missing: x33,x36,x39,x42,x45,x48,x51,x52,x57

By the way, I trained another model with the same Python code, and it converted without any missing fields. The only difference was that it had fewer fields. See the attached jpmml-converter.zip.

Help me please. Thank you.

vruusmann commented 3 years ago

I found that some fields were missed after I have finished the convert: x33,x36,x39,x42,x45,x48,x51,x52,x57

Does the converted model make correct predictions or not? I bet it does.

All JPMML converter libraries post-process the PMML document by pruning all field declarations (DataField and DerivedField elements) that are not used by the model. This is a feature (it reduces the file size), not a bug.
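
To see which fields were pruned, one option is to compare the column names of the training CSV against the DataField names that remain in the converted PMML file. The following is only a minimal Python sketch of that comparison, not part of any JPMML library: the function names (`local_name`, `declared_fields`, `pruned_fields`) and the trimmed `SAMPLE_PMML` document are hypothetical and exist purely for illustration.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace prefix, e.g. '{ns}DataField' -> 'DataField'."""
    return tag.rsplit("}", 1)[-1]


def declared_fields(pmml_text):
    """Return the names of all DataField elements found in a PMML document."""
    root = ET.fromstring(pmml_text)
    return {
        el.get("name")
        for el in root.iter()
        if local_name(el.tag) == "DataField"
    }


def pruned_fields(csv_columns, pmml_text):
    """Return the CSV columns that have no DataField declaration, in CSV order."""
    kept = declared_fields(pmml_text)
    return [column for column in csv_columns if column not in kept]


# Hypothetical, heavily trimmed PMML document (illustration only).
SAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <DataDictionary>
    <DataField name="f2" optype="categorical" dataType="integer"/>
    <DataField name="f3" optype="continuous" dataType="double"/>
    <DataField name="f5" optype="continuous" dataType="double"/>
  </DataDictionary>
</PMML>"""

if __name__ == "__main__":
    # f4 is in the CSV header but has no DataField, so it was pruned.
    print(pruned_fields(["f2", "f3", "f4", "f5"], SAMPLE_PMML))  # prints ['f4']
```

A field that shows up in this list was never referenced by the model, so leaving it out of the scoring input should not change the predictions.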

I've explained this behaviour multiple times in GitHub issues and in the JPMML mailing list. Use the search functionality.

vruusmann commented 3 years ago

I found that some fields were missed after I have finished the convert: x33,x36,x39,x42,x45,x48,x51,x52,x57

In other words - the JPMML-SkLearn library is telling you that these nine features are not relevant for the current modeling problem. You don't need to collect and process this data at all!

This is one of the unique functionalities of the (J)PMML approach. There are no other ML frameworks/libraries that can inform you about redundant features.

laochen666 commented 3 years ago

Thank you. It was my mistake: the missing fields all had the same value.
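
A column that holds a single value in every row gives a tree model nothing to split on, so it ends up unused and is pruned from the PMML. A check like the one below can catch such columns before training. This is a minimal sketch using only the Python standard library; the function name `constant_columns` and the sample data are hypothetical.

```python
import csv
import io


def constant_columns(csv_text):
    """Return the columns whose value is identical in every data row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    # Collect the distinct values seen in each column.
    seen = {name: set() for name in fieldnames}
    for row in reader:
        for name, value in row.items():
            if name in seen:  # ignore surplus cells beyond the header
                seen[name].add(value)
    return [name for name in fieldnames if len(seen[name]) == 1]


# Hypothetical sample data (illustration only): f3 never changes.
SAMPLE_CSV = """id,f2,f3,f4
1,0,2,7
2,1,2,5
3,0,2,7
"""

if __name__ == "__main__":
    print(constant_columns(SAMPLE_CSV))  # prints ['f3']
```

For a file on disk, the text can be read with `open(path, newline="").read()` and passed to the same function.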