xros / jsonpyes

The tool which imports raw JSON to ElasticSearch in one line of commands
Other
67 stars 21 forks

how to call thread.join() #33

Open kf89 opened 6 years ago

kf89 commented 6 years ago

From your code I can see that you call thread.join() right after thread.start(), inside the same loop. But from here you can see that the idiomatic way is to call thread.join() in a separate loop: "for t in ts: t.join() is generally the idiomatic way to start a small number of threads. Doing .join means that your main thread waits until the given thread finishes before proceeding in execution. You generally do this after you've started all of the threads."
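As a minimal sketch of the pattern described above (the `worker` function and the thread count are hypothetical and not taken from jsonpyes), starting all threads first and joining them in a second loop could look like this:

```python
import threading


def worker(results, idx):
    # Hypothetical worker: stores the square of its index.
    results[idx] = idx * idx


def run_all(n):
    results = [None] * n
    threads = [
        threading.Thread(target=worker, args=(results, i)) for i in range(n)
    ]
    # First loop: start every thread so they can run concurrently.
    for t in threads:
        t.start()
    # Second loop: wait until each thread has finished.
    for t in threads:
        t.join()
    return results
```

Each thread writes to its own slot of `results`, so no lock is needed in this sketch.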

I have tried both approaches, and strangely the program that uses the idiomatic thread.join() pattern takes longer to run than yours. It may be better in memory consumption, though, which I haven't tested yet.

xros commented 6 years ago

Do you know why my code runs faster? Let me explain. Here's my code:

    threads = []
    for i in start_stop_line_list:
        # t = StoppableThread(target=worker_import_to_es_for_threading, args=(data, i['start'], i['stop']))
        t = threading.Thread(
            target=worker_import_to_es_for_threading,
            args=(data, i['start'], i['stop'], Elasticsearch([bulk], verify_certs=True), index, doc_type),
        )
        threads.append(t)
        t.start()
        t.join()

As you can see, I create a new threading.Thread object in every iteration of the loop. These t objects are independent of each other, so t.start() starts that single thread in each iteration, and the program then goes on to t.join(). Each thread therefore starts regardless of whether the other threads have been created or started. The name t is rebound to a new thread on every iteration, and the new thread is not affected by the previous one. Note that t.join() blocks until that thread has finished, so the next iteration only begins after the current thread completes.
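To make the behaviour of the two join placements concrete, here is a minimal sketch. It does not use the real import worker: the `worker` below is a hypothetical stand-in that just sleeps, and `peak_concurrency`, `n`, and `delay` are names made up for this illustration. It counts how many workers are running at the same time.

```python
import threading
import time


def peak_concurrency(join_inside_loop, n=4, delay=0.05):
    """Return the largest number of workers that ran at the same time."""
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def worker():
        # Hypothetical stand-in for worker_import_to_es_for_threading.
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        time.sleep(delay)  # simulate I/O such as a bulk request
        with lock:
            state["running"] -= 1

    threads = []
    for _ in range(n):
        t = threading.Thread(target=worker)
        threads.append(t)
        t.start()
        if join_inside_loop:
            t.join()  # blocks until this thread has finished
    for t in threads:
        t.join()  # returns immediately for threads that already finished
    return state["peak"]
```

With `join_inside_loop=True` the peak is 1, because each thread finishes before the next one is created. With `join_inside_loop=False` the peak is normally greater than 1. Which variant is faster for the actual import is not something this sketch measures; that probably depends on the Elasticsearch cluster and the workload.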

I had already considered the approach you suggest, and that is why the code is written this way. By the way, I used import threading, not import thread.

I hope this helps you understand it better.