parallel get/set suggestion

lsj5031 commented 7 years ago

Hi,

I wrapped a Carmine call in a function like this:

(defn foo 
  [keyname]
  (wcar*  (car/get keyname)))

I have a sequence of keys whose values I would like to get, but (pmap foo sequence-of-keys) throws an error saying "cannot assign address". I googled a bit, and it seems this might be because the OS cannot release ports and allocate them to new connections quickly enough.

May I have some advice on the best way to achieve parallel get/set functionality?

ptaoussanis commented 7 years ago

Hi there,

This is potentially quite an involved question, and there aren't enough details here to properly advise.

I'd start by clarifying what you're actually trying to achieve. Why do you want to use pmap? What is the objective?

Some random observations:

- pmap is (semi) lazy, which could be a cause of your connection problems
- Connection problems could also be caused by your thread pool configuration
- pmap isn't in general a good way to do parallel work
- Trying to use multiple threads to speed up an IO job like this probably isn't what you want
- Is there a reason you're not using something like the mget command, or Redis pipelining?

The most obvious alternative that'll be much simpler and much faster:

(defn foo
  ([keyname       ] (wcar* (car/get keyname)))
  ([keyname & more]
   (wcar*
     (doseq [keyname (cons keyname more)]
       (car/get keyname)))))

This uses pipelining and a single TCP roundtrip. It will be [much] faster than trying to execute multiple individual reads on individual threads.
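For the mget route mentioned above, here is a minimal sketch. It assumes a Redis server on localhost and a wcar* macro defined in the usual Carmine style; conn-opts and fetch-values are hypothetical names used only for illustration.

```clojure
(require '[taoensso.carmine :as car])

;; Assumption: a local Redis server on the default port.
(def conn-opts {:pool {} :spec {:host "127.0.0.1" :port 6379}})

;; wcar* as used earlier in the thread; this definition is an assumption.
(defmacro wcar* [& body] `(car/wcar conn-opts ~@body))

(defn fetch-values
  "Fetches the values for ks with a single MGET (one roundtrip) and returns
  a map of key -> value. Keys that don't exist map to nil."
  [ks]
  (if (seq ks)
    (zipmap ks (wcar* (apply car/mget ks)))
    {})) ; MGET needs at least one key, so skip the call for empty input
```

MGET only reads plain string values, so the pipelined version above is the more general choice if the keys hold other types or if you want to mix commands.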

Hope that helps, cheers :-)

lsj5031 commented 7 years ago

Thanks Peter, your explanation totally makes sense. I really appreciate it.

My program runs on a 32-core machine, and getting a value by key and doing something with each value is very thread-independent, so I thought using multiple threads would speed it up.

I think I'll try mget instead, then. Cheers, and thanks again for this great library.

ptaoussanis commented 7 years ago

No problem :-)

> My program runs on a 32-core machine, and getting a value by key and doing something with each value is very thread-independent, so I thought using multiple threads would speed it up.

Depending on how many values you're fetching, I might suggest that you fetch all the values with one Redis call - then partition the values into chunks, and handle the chunks on separate threads.

That way you use your cores for CPU-intensive work (the handling).

How big you want the chunks to be will depend on how expensive the work is per value.
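A rough sketch of the fetch-once, handle-in-chunks idea, assuming the values have already been fetched in one call. handle-in-chunks and its parameters are hypothetical names; it uses a fixed-size java.util.concurrent thread pool rather than pmap.

```clojure
(import '(java.util.concurrent Executors Future))

(defn handle-in-chunks
  "Partitions values into chunks of chunk-size and applies handle-fn to each
  value, one chunk per task, on a pool of n-threads threads. Returns the
  results as a vector in the original order."
  [n-threads chunk-size handle-fn values]
  (let [pool (Executors/newFixedThreadPool n-threads)]
    (try
      (let [tasks (mapv (fn [chunk] (fn [] (mapv handle-fn chunk)))
                        (partition-all chunk-size values))]
        ;; invokeAll blocks until every chunk is done and keeps task order
        (into [] (mapcat #(.get ^Future %)) (.invokeAll pool tasks)))
      (finally (.shutdown pool)))))

;; Usage: handle five values in chunks of two on two threads
(handle-in-chunks 2 2 inc [1 2 3 4 5]) ; => [2 3 4 5 6]
```

Larger chunks mean less scheduling overhead per value; smaller chunks balance the load better when some values are much more expensive than others.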

Or you could do something like: using 8 threads, each thread fetches 5000 values in one TCP trip, then handles those values.
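The fetch-per-thread variant could look roughly like this. It is only a sketch: fetch-and-handle and its option names are hypothetical, and the Redis call is passed in as fetch-batch so that the batching logic stays independent of the connection setup.

```clojure
(import '(java.util.concurrent Executors Future))

(defn fetch-and-handle
  "Splits ks into batches of batch-size. Each task fetches one batch with a
  single call to fetch-batch, then applies handle-fn to every fetched value.
  Tasks run on a pool of n-threads threads; results keep the order of ks."
  [{:keys [n-threads batch-size fetch-batch handle-fn]} ks]
  (let [pool (Executors/newFixedThreadPool n-threads)]
    (try
      (let [tasks (mapv (fn [batch]
                          (fn [] (mapv handle-fn (fetch-batch (vec batch)))))
                        (partition-all batch-size ks))]
        (into [] (mapcat #(.get ^Future %)) (.invokeAll pool tasks)))
      (finally (.shutdown pool)))))

(comment
  ;; With Carmine, fetch-batch could be one MGET per batch. This assumes the
  ;; wcar* macro and car alias from earlier in the thread; all-keys and
  ;; handle-value are placeholders.
  (fetch-and-handle {:n-threads   8
                     :batch-size  5000
                     :fetch-batch (fn [batch] (wcar* (apply car/mget batch)))
                     :handle-fn   handle-value}
                    all-keys))
```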

There are many tweaks you can make to how you approach this. The best way will depend on the details of your problem.

But as a general suggestion: try to maximize how much data each TCP roundtrip to Redis can fetch. Fetching 200 KB in one roundtrip (pipeline) is much more efficient than fetching 1 KB in each of 200 roundtrips.
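To put rough numbers on the roundtrip point, here is a back-of-the-envelope model. The 0.5 ms roundtrip time and 100 KB/ms transfer rate are made-up illustrative figures, not measurements, and estimated-ms is a hypothetical helper.

```clojure
(defn estimated-ms
  "Very rough cost model: a fixed latency per roundtrip plus transfer time."
  [{:keys [roundtrips rtt-ms total-kb kb-per-ms]}]
  (+ (* roundtrips rtt-ms)
     (/ total-kb kb-per-ms)))

;; 200 KB in one pipelined roundtrip
(estimated-ms {:roundtrips 1 :rtt-ms 0.5 :total-kb 200 :kb-per-ms 100.0})   ; => 2.5

;; The same 200 KB as 200 roundtrips of 1 KB each
(estimated-ms {:roundtrips 200 :rtt-ms 0.5 :total-kb 200 :kb-per-ms 100.0}) ; => 102.0
```

Under these assumed figures the transfer time is the same in both cases, and the per-roundtrip latency accounts for almost all of the difference.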

taoensso / carmine

parallel get/set suggestion #185