Open madhvi-gupta opened 9 years ago
Hi Madhvi, The issue with running distributedR with accumulo is that you need a connector to read data from accumulo into R. We have neither created nor tested any data loaders for accumulo. You are welcome to search for other open source R-accumulo connectors. A quick search shows the following: https://github.com/DataTacticsCorp/raccumulo (though I have no idea whether it works or not).
We will soon release an HDFS connector. It will help you load data directly from HDFS and run distributedR applications.
Hi Indra,
I am currently trying to use raccumulo (the GitHub link you shared) to load data into distributedR, but it is not working as required. It does not provide the whole data set to be loaded into R.
Thanks and regards, Madhvi Gupta
As I mentioned, we have not tried or tested any accumulo connectors. Still, are you able to load data in a single R session (not distributedR) using that connector? What is the code that you used with distributedR? What is the error? How much data is getting loaded?
How can accumulo be made a data source for distributedR so that analytics can be run over that data in parallel?