mountainMath / cancensus

R wrapper for calling CensusMapper APIs
https://mountainmath.github.io/cancensus/index.html
Other
82 stars 15 forks source link

Consider replacing jsonlite with rcppsimdjson #153

Open dshkol opened 4 years ago

dshkol commented 4 years ago

The new c++ based json parser simdjson is making some waves with very impressive benchmarks compared to established parsers like jsonlite. An R implementation just hit CRAN: https://github.com/eddelbuettel/rcppsimdjson.

Perhaps risky to adopt just yet but we should monitor. Possible downside would be adding an Rcpp dependency. But the upside could be significant improvement in parsing speed of long datasets for cancensus and cansim.

knapply commented 4 years ago

Please reach out if you'd like assistance with this.

That could be as simple as asking for another set of eyes to optimize any inclusion (it's not just fast, it's extremely flexible) or as complex as assistance in mapping any JSON to the desired R objects at the C++ level.

lemire commented 4 years ago

Possible downside would be adding an Rcpp dependency.

Is that a significant downside?

dshkol commented 4 years ago

Possible downside would be adding an Rcpp dependency.

Is that a significant downside?

No, not at all.

It's just that between @mountainMath and myself we've tried to limit the number of hard dependencies. AFAIK, rcpp is already pretty widely installed among the R user base so it wouldn't be an issue for most.

We've found that this package tends to attract some users who are completely new to R altogether so the philosophy was to try to avoid adding dependencies where possible just in case.

I'll make a test branch and check improvements in any case.

dshkol commented 4 years ago

Please reach out if you'd like assistance with this.

That could be as simple as asking for another set of eyes to optimize any inclusion (it's not just fast, it's extremely flexible) or as complex as assistance in mapping any JSON to the desired R objects at the C++ level.

Thanks for the generous offer. I think we will test it out in a branch and go from there. From the profiling we've done the current json-parsing stage isn't as much of an issue as some other parts of the code. Will def reach out if we implement it.

Btw, thanks for your work on these libraries @knapply and @lemire. I've seen the improvement first hand in some other packages and it's made a big difference where json parsing is a significant overhead.