Dear pyensembl team,
First, thank you again for developing pyensembl :)
In my application, I have a class that uses pyensembl extensively. I initialize this class as follows:

    from pyensembl import EnsemblRelease

    class AnnotateVariants:
        def __init__(self, ...):
            self.ensembl_data = EnsemblRelease(75)
            ...

I need multiple threads to use the pyensembl object so that I can call its functions to annotate variants in parallel.
For internal reasons, I use the multiprocessing.dummy module, so my workers are threads, not processes.
In my current implementation I assign to each distinct thread a new instance of the AnnotateVariants class.
However, the log file shows that the threads do not all run in parallel. For example, if I start a pool of 16 threads, 5 of them run in parallel while the others wait. Then the next subgroup of threads runs, and so on.
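To make the observation easier to reproduce, here is a minimal sketch of my setup that uses only the standard library. `FakeAnnotator` is a hypothetical stand-in for my `AnnotateVariants` class (it only sleeps), and `annotate_one` and `run` are names I made up for this example. It creates one annotator per thread and records the highest number of threads that were inside the annotation call at the same time.

```python
import threading
import time
from multiprocessing.dummy import Pool  # thread-based Pool with the multiprocessing API


class FakeAnnotator:
    """Hypothetical stand-in for AnnotateVariants; it only sleeps."""

    def annotate(self, variant):
        time.sleep(0.05)  # sleeping releases the GIL, so threads can overlap here
        return variant.upper()


_local = threading.local()  # per-thread storage
_lock = threading.Lock()    # protects the two counters below
_active = 0                 # threads currently inside annotate()
_peak = 0                   # highest value _active has reached


def annotate_one(variant):
    global _active, _peak
    # Create one annotator per thread, on first use in that thread.
    if not hasattr(_local, "annotator"):
        _local.annotator = FakeAnnotator()
    with _lock:
        _active += 1
        _peak = max(_peak, _active)
    try:
        return _local.annotator.annotate(variant)
    finally:
        with _lock:
            _active -= 1


def run(variants, n_threads=16):
    """Annotate all variants in a thread pool; return results and peak concurrency."""
    global _active, _peak
    _active = _peak = 0
    with Pool(n_threads) as pool:
        results = pool.map(annotate_one, variants)
    return results, _peak
```

With the sleeping stand-in I would expect the peak to approach the pool size. If the same measurement with the real class stays well below it, something inside the annotation call is serializing the threads.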
Could this be related to the pyensembl constructor (EnsemblRelease)? As I read it, the constructor returns the same Ensembl release instance if one is already cached (docs).
If so, every thread in the pool is given the same connection to the SQLite3 database, and the threads end up competing for it, which would be a race condition.
That's my interpretation of the observation. Please let me know your ideas.
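To illustrate what I mean, here is a small sketch that uses only Python's built-in `sqlite3` module, not pyensembl itself. I am assuming that pyensembl accesses its database through `sqlite3` in a comparable way, which I have not verified. The database file, table, and function names are hypothetical. The first part gives each thread its own connection via `threading.local`; the second part shows that a connection opened in one thread cannot be used from another thread under the default `check_same_thread=True` setting.

```python
import os
import sqlite3
import tempfile
import threading
from multiprocessing.dummy import Pool

# Hypothetical throwaway database standing in for the annotation database.
DB_PATH = os.path.join(tempfile.mkdtemp(), "demo.db")
_setup = sqlite3.connect(DB_PATH)
_setup.execute("CREATE TABLE gene (name TEXT)")
_setup.executemany("INSERT INTO gene VALUES (?)", [("TP53",), ("BRCA1",)])
_setup.commit()
_setup.close()

_local = threading.local()  # holds one connection per thread


def get_connection():
    """Return this thread's own connection, opening it on first use."""
    if not hasattr(_local, "conn"):
        _local.conn = sqlite3.connect(DB_PATH)
    return _local.conn


def count_genes(_):
    """Run a read-only query on the calling thread's connection."""
    (n,) = get_connection().execute("SELECT COUNT(*) FROM gene").fetchone()
    return n


def shared_connection_outcome():
    """Try to use a connection from a thread other than the one that opened it."""
    conn = sqlite3.connect(DB_PATH)  # check_same_thread defaults to True
    outcome = []

    def worker():
        try:
            conn.execute("SELECT 1")
            outcome.append("ok")
        except sqlite3.ProgrammingError:
            outcome.append("error")

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    conn.close()
    return outcome[0]
```

If pyensembl hands the same cached connection to all threads, I would expect either errors like the one caught above or, if thread checking is disabled, serialized access to that single connection.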
Second, please advise me on the recommended way to run pyensembl fully in parallel.
My pyensembl version is 1.9.0.
Thanks a lot!