How to run pyensembl using multiple threads?

Dear pyensembl team,

First, thank you again for developing pyensembl :)

In my application, I have a class that uses pyensembl extensively. I initialize this class as follows:

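from pyensembl import EnsemblRelease

class AnnotateVariants:
    def __init__(self):  # other constructor arguments omitted
        # Annotation data for Ensembl release 75
        self.ensembl_data = EnsemblRelease(75)
        # ... remaining initialization omitted ...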

I need multiple threads to use the pyensembl object so that I can annotate variants in parallel. For internal reasons I use the `multiprocessing.dummy` module, so I work with threads rather than processes.

In my current implementation, I give each thread its own instance of the AnnotateVariants class. However, the log file shows that the threads do not all run in parallel: with a pool of 16 threads, for example, 5 of them run concurrently while the others wait, then the next subgroup runs, and so on.
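One way to check whether the pool itself or the annotation work limits concurrency is to record how many tasks are running at the same moment. The sketch below uses a hypothetical `timed_task` that only sleeps in place of the real annotation call, and tracks the peak number of simultaneously active tasks under a lock.

```python
import threading
import time
from multiprocessing.dummy import Pool  # thread-backed Pool with the multiprocessing.Pool API

_lock = threading.Lock()
_active = 0
_peak = 0

def timed_task(task_id):
    """Hypothetical stand-in for one annotation job; records how many jobs overlap."""
    global _active, _peak
    with _lock:
        _active += 1
        _peak = max(_peak, _active)
    try:
        time.sleep(0.1)  # placeholder for the real work
    finally:
        with _lock:
            _active -= 1
    return task_id

def measure_peak_concurrency(n_threads=16, n_tasks=16):
    global _active, _peak
    _active = _peak = 0
    with Pool(n_threads) as pool:
        # chunksize=1 hands out one task at a time instead of pre-batching tasks per thread
        results = pool.map(timed_task, range(n_tasks), chunksize=1)
    return _peak, results

peak, results = measure_peak_concurrency()
print("peak concurrent tasks:", peak)
```

If the peak reaches the pool size here but the real run stays around 5, the limit probably lies in the annotation calls themselves. In CPython only one thread executes Python bytecode at a time (the GIL), so CPU-bound Python code gains little from threads; `time.sleep` and blocking I/O release that lock, which is why this toy version can overlap.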

Is this related to the pyensembl constructor (EnsemblRelease)? According to the docs, the constructor returns the same release instance if it is already cached.
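A quick way to test the caching hypothesis is to compare the objects held by two `AnnotateVariants` instances with `is`. Since the exact caching rules of `EnsemblRelease` are not spelled out here, the sketch uses a hypothetical `CachedRelease` class that mimics "same release number returns the same instance"; it is not pyensembl's implementation.

```python
import threading

class CachedRelease:
    """Hypothetical stand-in (not pyensembl code) for a constructor that reuses cached instances."""
    _cache = {}
    _cache_lock = threading.Lock()

    def __new__(cls, release):
        # Return the existing object for this release, creating it only on first use.
        with cls._cache_lock:
            instance = cls._cache.get(release)
            if instance is None:
                instance = super().__new__(cls)
                instance.release = release
                cls._cache[release] = instance
            return instance

class AnnotateVariants:
    def __init__(self):
        self.ensembl_data = CachedRelease(75)

a, b = AnnotateVariants(), AnnotateVariants()
print(a.ensembl_data is b.ensembl_data)  # True: both wrappers share one object
```

With the real class, `AnnotateVariants().ensembl_data is AnnotateVariants().ensembl_data` evaluating to `True` would confirm that every thread shares a single genome object.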

If so, all threads in the pool are given the same connection to the SQLite3 database and end up competing for it (a race condition). That is my interpretation of what I observe; please let me know what you think.

Second, could you advise me on how to run pyensembl in a fully parallel way?
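A common pattern for thread pools is to build one expensive object per worker thread, once, through the pool's `initializer`, and keep it in `threading.local()`. The sketch assumes a hypothetical `Annotator` stand-in for `AnnotateVariants` and hypothetical variant strings. If `EnsemblRelease(75)` returns a cached shared instance, building it per thread may still hand every thread the same underlying object.

```python
import threading
from multiprocessing.dummy import Pool

_thread_state = threading.local()

class Annotator:
    """Hypothetical stand-in for AnnotateVariants; replace with the real class."""
    def __init__(self):
        self.owner = threading.get_ident()  # remembers which thread built it

    def annotate(self, variant):
        return (variant, self.owner)

def init_worker():
    # Runs once in each worker thread: build that thread's own annotator.
    _thread_state.annotator = Annotator()

def annotate_one(variant):
    # Each call uses the annotator that belongs to the current thread.
    return _thread_state.annotator.annotate(variant)

variants = ["chr17:7577120", "chr13:32914438", "chr7:140453136"]  # hypothetical inputs
with Pool(4, initializer=init_worker) as pool:
    results = pool.map(annotate_one, variants)
```

For CPU-bound annotation in CPython, threads remain limited by the GIL; `multiprocessing.Pool` accepts the same `initializer` argument and runs each worker in a separate process, if processes become an option.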

My pyensembl version is 1.9.0.

Thanks a lot!

openvax / pyensembl, issue #251