lucidworks / spark-solr

Tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ.
Apache License 2.0

Increase speed of reading Solr data by spark #351

Open bplaye opened 2 years ago

bplaye commented 2 years ago

Hello everyone,

I am using spark-solr to fetch 2 or 3 attributes (id and date attributes) from Solr, but it takes tens of seconds to fetch a few hundred thousand documents.

My Solr collections have around 10 shards, each with 4 replicas. The collections contain from tens of millions to hundreds of millions of documents. In the Lucidworks spark-solr connector, I set rows to 10000 and splits to true.
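For reference, the read configuration described above would look roughly like the sketch below in PySpark. The ZooKeeper address, collection name, and the `created_at` field name are hypothetical placeholders, and the helper functions are only for illustration. The option keys (`zkhost`, `collection`, `fields`, `rows`, `splits`) are the ones the connector documents for its `solr` data source.

```python
from typing import Dict, List


def solr_read_options(zkhost: str, collection: str, fields: List[str],
                      rows: int = 10000, splits: bool = True) -> Dict[str, str]:
    """Build the option map handed to the spark-solr DataFrame reader."""
    return {
        "zkhost": zkhost,              # ZooKeeper connect string of the SolrCloud cluster
        "collection": collection,      # collection to read from
        "fields": ",".join(fields),    # fetch only the attributes that are needed
        "rows": str(rows),             # page size per request, as set in this issue
        "splits": str(splits).lower(), # enable intra-shard splitting, as set in this issue
    }


def load_from_solr(spark, options: Dict[str, str]):
    """Return a DataFrame backed by Solr.

    `spark` is assumed to be an existing SparkSession with the spark-solr
    jar on its classpath; nothing is contacted until this is called.
    """
    return spark.read.format("solr").options(**options).load()


# Hypothetical values standing in for the real cluster and schema.
opts = solr_read_options("zk1:2181/solr", "my_collection", ["id", "created_at"])
print(opts)
```

With a live session, `load_from_solr(spark, opts)` would return the DataFrame whose read time is in question here.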

Is this the expected behavior? (I mean, is Solr inherently slow at fetching data?) Or could you help me understand how to configure Solr and the Lucidworks connector to increase the fetch speed? I have found hardly any answers on the internet.

Thank you for your help :)