Open sadsquirrel369 opened 2 days ago
> I've tested the pmml evaluator in Python which was more than 100x faster than the one in R
This is a known issue to me.
The project is currently in pre-release status. I just wanted to do a proof of concept, to see whether all the bits and pieces are available to make the R wrapper possible.
The prototype is successful in the sense that predictions can be made. The next step is to make it performant.
I believe that the majority of time is spent in the `formatArguments` and `parseResults` methods, which deal with transforming an R data container (a named `list`) to a Java data container (a `java.util.HashMap`), and back.
I can think of two workarounds:

1. Pass the arguments as a `list` object in RDS data format, and then read the results back as another `list` object. The trouble is that currently the Java side can read/parse RDS, but it cannot format/write RDS.
2. Abandon the `list` data container altogether, and do data exchange using plain CSV data format.

I personally find the first workaround a bit more elegant, but it needs some research and development around the RDS data format. Then again, the RDS formatter only needs to support named lists (plus R scalar types), so hopefully it's not a lot of work.
@sadsquirrel369 If you have any other ideas how to implement the data exchange between R and Java more efficiently, please share.
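For illustration, the CSV workaround could look roughly like this on the Java side. This is only a sketch under assumptions: `CsvExchangeSketch`, `parseCsv` and `formatCsv` are hypothetical names, not part of the current wrapper or of any JPMML library, and the parser assumes plain comma-separated cells with a header line and no quoting or escaping.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of any existing library.
class CsvExchangeSketch {

    // Parses simple CSV text (header line first, no quoting or escaping)
    // into one ordered map per data record.
    static List<Map<String, String>> parseCsv(String csv) {
        String[] lines = csv.trim().split("\\R");
        String[] header = lines[0].split(",", -1);
        List<Map<String, String>> records = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            String[] cells = lines[i].split(",", -1);
            if (cells.length != header.length) {
                throw new IllegalArgumentException("Row " + i + " has " + cells.length + " cells");
            }
            Map<String, String> record = new LinkedHashMap<>();
            for (int j = 0; j < header.length; j++) {
                record.put(header[j], cells[j]);
            }
            records.add(record);
        }
        return records;
    }

    // Formats records back to CSV text. Assumes every record has the same
    // keys, in the same order, as the first record.
    static String formatCsv(List<Map<String, String>> records) {
        if (records.isEmpty()) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        sb.append(String.join(",", records.get(0).keySet())).append('\n');
        for (Map<String, String> record : records) {
            sb.append(String.join(",", record.values())).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Round trip: CSV text in, records, CSV text out.
        List<Map<String, String>> records = parseCsv("x1,x2\n1.0,a\n2.5,b\n");
        System.out.print(formatCsv(records));
    }
}
```

On the R side, the matching calls would presumably be `write.csv` and `read.csv`, so the whole data frame crosses the R/Java boundary as a single string or file instead of one `list` per row.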
> I've tested the pmml evaluator in Python which was more than 100x faster than the one in R

Another thing is that the Python wrapper supports batch prediction mode (pass 1,000 data records back and forth at once), whereas the R wrapper doesn't: it iterates over the batch rows on the R side, and performs a single elementary prediction operation on each of them.
So, the R wrapper should be able to exchange "a list of named lists" with the Java side atomically.
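A minimal sketch of what such an atomic batch call might look like on the Java side. Everything here is an assumption for illustration: `BatchPredictSketch`, `evaluateAll` and the stand-in `doubler` model are hypothetical, and the real evaluator is represented by a plain `Function` rather than any actual JPMML class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; the evaluator is abstracted as a plain Function.
class BatchPredictSketch {

    // Evaluates a whole batch in one call, so the caller crosses the
    // R/Java boundary once per batch instead of once per row.
    static List<Map<String, Object>> evaluateAll(
            Function<Map<String, Object>, Map<String, Object>> evaluator,
            List<Map<String, Object>> batch) {
        List<Map<String, Object>> results = new ArrayList<>(batch.size());
        for (Map<String, Object> arguments : batch) {
            results.add(evaluator.apply(arguments));
        }
        return results;
    }

    // Stand-in "model" used only for the demo: y = 2 * x.
    static Map<String, Object> doubler(Map<String, Object> arguments) {
        Map<String, Object> result = new HashMap<>();
        result.put("y", 2 * ((Number) arguments.get("x")).doubleValue());
        return result;
    }

    public static void main(String[] args) {
        // Build a batch of three records, i.e. "a list of named lists".
        List<Map<String, Object>> batch = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            Map<String, Object> record = new HashMap<>();
            record.put("x", i);
            batch.add(record);
        }
        System.out.println(evaluateAll(BatchPredictSketch::doubler, batch));
    }
}
```

The loop over rows still exists, but it runs entirely inside the JVM; the per-row conversion cost between R and Java containers is what a batch interface like this is meant to avoid.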
Hi there,
I've tested the pmml evaluator in Python which was more than 100x faster than the one in R. It looks like it is only using a single core on my machine. Is there anything I can do to speed it up?