svazzole opened this issue 6 years ago
If I have a feature matrix in which the feature names contain certain special characters (such as &), the package throws an exception related to RExpParser.
Can you paste the full stack trace of this exception here?
Better yet, can you provide a reproducible example (a toy dataset and an R script) that I could play with?
Here is the output of the command. I will provide a precise example as soon as possible.
D:\jpmml-r-master>java -Xms4G -Xmx16G -jar target/converter-executable-1.2-SNAPSHOT.jar --rds-input LibSVMAnomalyFormulaReq.rds --pmml-output model.pmml
set 19, 2017 4:59:39 PM org.jpmml.rexp.Main run
INFORMAZIONI: Parsing RDS..
Exception in thread "main" java.lang.StackOverflowError
	at java.io.DataInputStream.readInt(Unknown Source)
	at org.jpmml.rexp.XDRInput.readInt(XDRInput.java:62)
	at org.jpmml.rexp.RExpParser.readInt(RExpParser.java:481)
	at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:67)
	at org.jpmml.rexp.RExpParser.readPairList(RExpParser.java:155)
	at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:74)
	at org.jpmml.rexp.RExpParser.readFunctionCall(RExpParser.java:218)
	at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:82)
	at org.jpmml.rexp.RExpParser.readPairList(RExpParser.java:155)
	at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:74)
	at org.jpmml.rexp.RExpParser.readFunctionCall(RExpParser.java:218)
	at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:82)
	at org.jpmml.rexp.RExpParser.readPairList(RExpParser.java:155)
	at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:74)
	at org.jpmml.rexp.RExpParser.readFunctionCall(RExpParser.java:218)
	at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:82)
	at org.jpmml.rexp.RExpParser.readPairList(RExpParser.java:155)
	at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:74)
	at org.jpmml.rexp.RExpParser.readFunctionCall(RExpParser.java:218)
	at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:82)
	at org.jpmml.rexp.RExpParser.readPairList(RExpParser.java:155)
	at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:74)
	at org.jpmml.rexp.RExpParser.readFunctionCall(RExpParser.java:218)
	at org.jpmml.rexp.RExpParser.readRExp(RExpParser.java:82)
Very interesting: the RDS parser component appears to go into an infinite (or at least very deep) recursion.
A reproducible example would be much appreciated. Can you share your LibSVMAnomalyFormulaReq.rds RDS file, which is very nicely broken?
In your R script, can you temporarily work around this issue by escaping variable names? For example, try surrounding them with backticks as suggested here: https://stackoverflow.com/questions/3574385/can-i-escape-characters-in-variable-names
OK, I will try to explain myself better. Unfortunately, I cannot send you the data (for privacy reasons), but I will try to build a toy model that reproduces the same errors. What I can tell you is that the feature names are 4-grams of Apache logs (so something like "GET ", "ET /", "T /g" and so on). I'm trying to do anomaly detection on the requests, so I'm building a One-Class SVM (both in R and Python).

When I use Python there are no problems with the variable names, while in R I had to use the following trick: I changed all the variable names to "X1X", "X2X", "X3X" and so on. This fixed the problem, and the jpmml-r package correctly performed the RDS --> PMML conversion. Then I changed the variable names in the PMML file back to the originals, taking into account that "&" --> "\&". This created the correct model, and the results agreed with the Python one.

Here I have another question: I'm trying to use the PMML files created this way inside a Scala program. While the results from R and Python agree (as I said before), the results from the Scala One-Class SVM model are quite different. Do you have any idea why? Could this be an issue with Scala (I'm thinking about machine precision) or something with the One-Class SVM (and libsvm)?

Thanks for your time. Best, Simon
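The rename-and-restore workaround described above can be scripted instead of done by hand. The following is a minimal Python sketch, not part of JPMML, assuming that the PMML file is ordinary XML and that field references appear only in name and field attributes (this covers DataField, MiningField and FieldRef elements, but has not been checked against every PMML element type). The helper names make_placeholders and restore_names are hypothetical. Because the XML serializer escapes characters such as & when it writes attribute values, the original names can be put back verbatim.

```python
import xml.etree.ElementTree as ET


def make_placeholders(names):
    """Map each original feature name to a safe placeholder: X1X, X2X, ..."""
    return {name: f"X{i}X" for i, name in enumerate(names, start=1)}


def restore_names(pmml_text, mapping):
    """Replace placeholder field names in a PMML document with the original names.

    Assumption: field references live in 'name' and 'field' attributes only.
    """
    reverse = {placeholder: original for original, placeholder in mapping.items()}
    root = ET.fromstring(pmml_text)
    # Keep the PMML namespace as the default namespace instead of "ns0:" prefixes.
    if root.tag.startswith("{"):
        ET.register_namespace("", root.tag[1:].split("}", 1)[0])
    for elem in root.iter():
        for attr in ("name", "field"):
            value = elem.get(attr)
            if value in reverse:
                elem.set(attr, reverse[value])
    # ElementTree escapes "&" as "&amp;" in attribute values on its own.
    return ET.tostring(root, encoding="unicode")
```

In practice, the placeholders would be applied to the R data frame before training, and restore_names would be run on the converted PMML text afterwards.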
The PMML standard (and the JPMML implementation of it) does not have a concept of reserved symbols/keywords. For example, the string & would be a perfectly acceptable field name. There is no need to escape it as \& or &amp;. Honey badger don't care.
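To illustrate that escaping happens at the XML serialization level rather than in the field name itself, here is a small sketch using Python's standard xml.etree.ElementTree module. This is not JPMML code; the DataDictionary fragment below is hypothetical and only the attribute handling matters.

```python
import xml.etree.ElementTree as ET

# Hypothetical DataDictionary with field names taken from this thread.
dictionary = ET.Element("DataDictionary")
for name in ["&", "GET ", "T /g"]:
    ET.SubElement(dictionary, "DataField", name=name, optype="continuous", dataType="double")

# The serializer writes "&" as "&amp;" by itself; no manual escaping is applied.
xml_text = ET.tostring(dictionary, encoding="unicode")

# Parsing the XML back yields the original, unescaped field names.
parsed_names = [field.get("name") for field in ET.fromstring(xml_text)]
print(xml_text)
print(parsed_names)
```

The field name stays "&" at the PMML level; the "&amp;" form exists only in the serialized XML text.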
The problem is specific to the R platform, because R has the concept of reserved symbols/keywords. It would probably be resolved by escaping variable names properly; did you try using backticks as suggested above? It is no wonder that the RDS parser gets confused when the RDS model file contains incorrect RDS strings. Sure, it would be nice if the RDS parser were able to detect and recover from such a situation, but you, as an R end user, can prevent it from happening in the first place.
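To find out which feature names would need backticks in an R formula, a quick check along these lines may help. This is a Python sketch of an approximation of R's syntactic-name rule, restricted to ASCII letters (R also accepts locale-dependent letters, which this ignores). The reserved-word list follows R's documented reserved words; the escaping of backslashes and backticks inside backticks is an assumption about R's quoting rules, and the helper names are hypothetical. Whether quoted names avoid the converter problem has not been tested here.

```python
import re

# R reserved words; "..." and "..1", "..2", ... are handled separately below.
R_RESERVED = {
    "if", "else", "repeat", "while", "function", "for", "in", "next", "break",
    "TRUE", "FALSE", "NULL", "Inf", "NaN", "NA",
    "NA_integer_", "NA_real_", "NA_character_", "NA_complex_",
}

# Approximate syntactic name: starts with a letter, or with a dot not followed
# by a digit; continues with letters, digits, dots or underscores (ASCII only).
_SYNTACTIC = re.compile(r"([A-Za-z]|\.(?![0-9]))[A-Za-z0-9._]*")
_DOTS = re.compile(r"\.\.\.|\.\.[0-9]+")


def is_syntactic_r_name(name: str) -> bool:
    """Return True if the name can be used in R code without quoting."""
    if name in R_RESERVED or _DOTS.fullmatch(name):
        return False
    return _SYNTACTIC.fullmatch(name) is not None


def backtick_quote(name: str) -> str:
    """Wrap a non-syntactic name in backticks (escaping rule is an assumption)."""
    if is_syntactic_r_name(name):
        return name
    escaped = name.replace("\\", "\\\\").replace("`", "\\`")
    return f"`{escaped}`"


names = ["X1X", "GET ", "ET /", "T /g", "a&b", "if"]
print(" + ".join(backtick_quote(n) for n in names))
```

The joined string can serve as the right-hand side of a formula, so that names such as "GET " or "a&b" reach R as single variables instead of being parsed as expressions.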
Hi, thanks a lot for your work! I noticed a problem when working with JPMML-R: if I have a feature matrix in which the feature names contain certain special characters (such as &), the package throws an exception related to RExpParser. In contrast, the JPMML-SkLearn package is not affected by this behaviour: it creates an XML file in which the character "&" in the names is correctly substituted by "\&". Do you think this is a problem? If so, can you fix it? Best, Simon