vertica / DistributedR

GNU General Public License v2.0

Integration with accumulo #32

Open madhvi-gupta opened 9 years ago

madhvi-gupta commented 9 years ago

How can Accumulo be made a data source for distributedR so that analytics can be run over that data in parallel?

fun-indra commented 9 years ago

Hi Madhvi, the issue with running distributedR on Accumulo is that you need a connector to read data from Accumulo into R. We have neither created nor tested any data loaders for Accumulo. You are welcome to search for other open-source R-Accumulo connectors. A quick search turns up the following: https://github.com/DataTacticsCorp/raccumulo (though I have no idea whether it works or not).
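For the "in parallel" part of the question, one common pattern for any connector is to split the table's row-key space into contiguous, non-overlapping ranges and have each worker load only its own range, so that each range becomes one partition. The sketch below assumes this pattern; it is not a distributedR or Accumulo API. The in-memory `TABLE`, the split points, and the helpers `key_ranges`, `scan_range`, and `load_in_parallel` are all hypothetical stand-ins. Ranges are treated as (start, end], which is meant to mirror how Accumulo tablets are bounded by split rows; check this against the Accumulo docs before relying on it.

```python
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for an Accumulo table: row key -> value.
TABLE = {f"row{i:03d}": i * 1.5 for i in range(100)}


def key_ranges(split_points):
    """Turn sorted split points into contiguous (start, end] ranges.

    None means unbounded, so the ranges together cover every key exactly once.
    """
    bounds = [None] + sorted(split_points) + [None]
    return list(zip(bounds[:-1], bounds[1:]))


def scan_range(table, start, end):
    """Hypothetical per-worker loader: rows with start < key <= end."""
    keys = sorted(table)
    lo = 0 if start is None else bisect_right(keys, start)
    hi = len(keys) if end is None else bisect_right(keys, end)
    return [(k, table[k]) for k in keys[lo:hi]]


def load_in_parallel(table, split_points, workers=4):
    """Load every key range concurrently; each result is one partition."""
    ranges = key_ranges(split_points)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: scan_range(table, *r), ranges))


if __name__ == "__main__":
    parts = load_in_parallel(TABLE, ["row024", "row049", "row074"])
    print([len(p) for p in parts])
```

With the three split points above, the 100 rows land in four partitions of 25 rows each. Because the ranges are contiguous and half-open, no row is loaded twice or skipped, which is a useful property to confirm in whatever real connector you end up using.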

We will soon release an HDFS connector. It will let you load data directly from HDFS and run distributedR applications on it.

madhvi-gupta commented 9 years ago

In reply to IndraR's comment of Thursday 06 August 2015 02:02 AM:

Hi Indra,

I am currently trying to use raccumulo (the GitHub link you shared) to load data into distributedR, but it is not working as required: it does not load all of the data into R.

Thanks and regards,
Madhvi Gupta

fun-indra commented 9 years ago

As I mentioned, we have not tried or tested any accumulo connectors. Still, are you able to load data in a single R session (not distributedR) using that connector? What is the code that you used with distributedR? What is the error? How much data is getting loaded?
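A quick way to answer "how much data is getting loaded?" is to compare the number of rows the loader returns with the number of rows known to be in the source, first in a single R session. One generic cause of partial loads in any batched reader is stopping after the first batch instead of following the cursor until it is exhausted. Whether raccumulo behaves this way is not known here. The sketch below is purely illustrative: `SOURCE`, `fetch_batch`, and the batch size of 1000 are hypothetical and do not reflect the real raccumulo or Accumulo client API.

```python
# Hypothetical source table with a known row count.
SOURCE = [(f"row{i:04d}", i) for i in range(2500)]


def fetch_batch(cursor, size=1000):
    """Hypothetical paged reader: one batch plus the next cursor (None when done)."""
    batch = SOURCE[cursor:cursor + size]
    nxt = cursor + size
    return batch, (nxt if nxt < len(SOURCE) else None)


def load_first_batch_only():
    """Buggy pattern: silently returns only the first page of results."""
    rows, _ = fetch_batch(0)
    return rows


def load_all():
    """Correct pattern: keep fetching until the cursor is exhausted."""
    rows, cursor = [], 0
    while cursor is not None:
        batch, cursor = fetch_batch(cursor)
        rows.extend(batch)
    return rows


def check_complete(loaded, expected_count):
    """Completeness check: did we load exactly as many rows as the source holds?"""
    return len(loaded) == expected_count


if __name__ == "__main__":
    print(len(load_first_batch_only()), len(load_all()), len(SOURCE))
```

Here the first loader returns 1000 of 2500 rows while the second returns all 2500. If the count from a single session already falls short, the problem lies in the connector and not in distributedR.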