mehrdadbakhtiari / adVNTR

A tool for genotyping Variable Number Tandem Repeats (VNTR) from sequence data
http://advntr.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

OSError: [Errno 24] Too many open files #34

Closed nbargues closed 6 months ago

nbargues commented 4 years ago

Hi, I get what looks like a multiprocessing error when running adVNTR. The command:

advntr genotype --alignment_file $bam --working_directory $advntrDir/$bam_name --vntr_id 25561 --pacbio --frameshift -m ../hg19_selected_VNTRs_Pacbio.db -t $Ncpu

The error:

[M::bam2fq_mainloop] discarded 0 singletons
[M::bam2fq_mainloop] processed 14991 reads
Process Process-1009:
Traceback (most recent call last):
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/managers.py", line 749, in _callmethod
    conn = self._tls.connection
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/miniconda3/envs/gene36/lib/python3.6/site-packages/advntr/vntr_finder.py", line 264, in check_if_pacbio_read_spans_vntr
    self.check_if_flanking_regions_align_to_str(str(read.seq).upper(), length_distribution, spanning_reads)
  File "/opt/miniconda3/envs/gene36/lib/python3.6/site-packages/advntr/vntr_finder.py", line 260, in check_if_flanking_regions_align_to_str
    spanning_reads.append(read_str[left_align[3]:right_align[3]+flanking_region_size])
  File "<string>", line 2, in append
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/managers.py", line 753, in _callmethod
    self._connect()
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/managers.py", line 740, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/connection.py", line 487, in Client
    c = SocketClient(address)
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/connection.py", line 614, in SocketClient
    s.connect(address)
FileNotFoundError: [Errno 2] No such file or directory
Process Process-1017:
Traceback (most recent call last):
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/managers.py", line 749, in _callmethod
    conn = self._tls.connection
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/miniconda3/envs/gene36/lib/python3.6/site-packages/advntr/vntr_finder.py", line 264, in check_if_pacbio_read_spans_vntr
    self.check_if_flanking_regions_align_to_str(str(read.seq).upper(), length_distribution, spanning_reads)
  File "/opt/miniconda3/envs/gene36/lib/python3.6/site-packages/advntr/vntr_finder.py", line 260, in check_if_flanking_regions_align_to_str
    spanning_reads.append(read_str[left_align[3]:right_align[3]+flanking_region_size])
  File "<string>", line 2, in append
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/managers.py", line 753, in _callmethod
    self._connect()
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/managers.py", line 740, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/connection.py", line 487, in Client
    c = SocketClient(address)
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/connection.py", line 614, in SocketClient
    s.connect(address)
FileNotFoundError: [Errno 2] No such file or directory
Process Process-1010:
Traceback (most recent call last):
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/managers.py", line 749, in _callmethod
    conn = self._tls.connection
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/miniconda3/envs/gene36/lib/python3.6/site-packages/advntr/vntr_finder.py", line 264, in check_if_pacbio_read_spans_vntr
    self.check_if_flanking_regions_align_to_str(str(read.seq).upper(), length_distribution, spanning_reads)
  File "/opt/miniconda3/envs/gene36/lib/python3.6/site-packages/advntr/vntr_finder.py", line 260, in check_if_flanking_regions_align_to_str
    spanning_reads.append(read_str[left_align[3]:right_align[3]+flanking_region_size])
  File "<string>", line 2, in append
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/managers.py", line 753, in _callmethod
    self._connect()
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/managers.py", line 740, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/connection.py", line 487, in Client
    c = SocketClient(address)
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/connection.py", line 614, in SocketClient
    s.connect(address)
FileNotFoundError: [Errno 2] No such file or directory
Process Process-994:
Traceback (most recent call last):
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/managers.py", line 749, in _callmethod
    conn = self._tls.connection
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/miniconda3/envs/gene36/lib/python3.6/site-packages/advntr/vntr_finder.py", line 264, in check_if_pacbio_read_spans_vntr
    self.check_if_flanking_regions_align_to_str(str(read.seq).upper(), length_distribution, spanning_reads)
  File "/opt/miniconda3/envs/gene36/lib/python3.6/site-packages/advntr/vntr_finder.py", line 260, in check_if_flanking_regions_align_to_str
    spanning_reads.append(read_str[left_align[3]:right_align[3]+flanking_region_size])
  File "<string>", line 2, in append
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/managers.py", line 753, in _callmethod
    self._connect()
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/managers.py", line 740, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/connection.py", line 487, in Client
    c = SocketClient(address)
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/connection.py", line 614, in SocketClient
    s.connect(address)
FileNotFoundError: [Errno 2] No such file or directory
Process Process-1003:
Traceback (most recent call last):
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/managers.py", line 749, in _callmethod
    conn = self._tls.connection
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/miniconda3/envs/gene36/lib/python3.6/site-packages/advntr/vntr_finder.py", line 264, in check_if_pacbio_read_spans_vntr
    self.check_if_flanking_regions_align_to_str(str(read.seq).upper(), length_distribution, spanning_reads)
  File "/opt/miniconda3/envs/gene36/lib/python3.6/site-packages/advntr/vntr_finder.py", line 260, in check_if_flanking_regions_align_to_str
    spanning_reads.append(read_str[left_align[3]:right_align[3]+flanking_region_size])
  File "<string>", line 2, in append
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/managers.py", line 753, in _callmethod
    self._connect()
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/managers.py", line 740, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/connection.py", line 487, in Client
    c = SocketClient(address)
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/connection.py", line 614, in SocketClient
    s.connect(address)
FileNotFoundError: [Errno 2] No such file or directory
Traceback (most recent call last):
  File "/opt/miniconda3/envs/gene36/bin/advntr", line 11, in <module>
    sys.exit(main())
  File "/opt/miniconda3/envs/gene36/lib/python3.6/site-packages/advntr/main.py", line 121, in main
    genotype(args, genotype_parser)
  File "/opt/miniconda3/envs/gene36/lib/python3.6/site-packages/advntr/advntr_commands.py", line 101, in genotype
    genome_analyzier.find_repeat_counts_from_pacbio_alignment_file(input_file)
  File "/opt/miniconda3/envs/gene36/lib/python3.6/site-packages/advntr/genome_analyzer.py", line 103, in find_repeat_counts_from_pacbio_alignment_file
    copy_numbers = self.vntr_finder[vid].find_repeat_count_from_pacbio_alignment_file(alignment_file, reads)
  File "/opt/miniconda3/envs/gene36/lib/python3.6/site-packages/advntr/profiler.py", line 8, in wrapper
    retval = func(*args, **kwargs)
  File "/opt/miniconda3/envs/gene36/lib/python3.6/site-packages/advntr/vntr_finder.py", line 477, in find_repeat_count_from_pacbio_alignment_file
    mapped_spanning_reads = self.get_spanning_reads_of_aligned_pacbio_reads(alignment_file)
  File "/opt/miniconda3/envs/gene36/lib/python3.6/site-packages/advntr/profiler.py", line 8, in wrapper
    retval = func(*args, **kwargs)
  File "/opt/miniconda3/envs/gene36/lib/python3.6/site-packages/advntr/vntr_finder.py", line 329, in get_spanning_reads_of_aligned_pacbio_reads
    p.start()
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/opt/miniconda3/envs/gene36/lib/python3.6/multiprocessing/popen_fork.py", line 65, in _launch
    parent_r, child_w = os.pipe()
OSError: [Errno 24] Too many open files
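
For readers who hit the same Errno 24: the final OSError is raised by os.pipe() inside p.start(), which means the parent process had reached its per-process limit on open file descriptors. The following is a minimal diagnostic sketch, not part of adVNTR, that reports the current limit using Python's standard resource module. The helper names fd_usage and raise_soft_limit are made up for illustration, and the /proc/self/fd lookup assumes Linux. Raising the soft limit is only a workaround and may not be permitted on every system.

```python
import os
import resource


def fd_usage():
    """Return (open_fds, soft_limit, hard_limit) for the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        # Linux-specific: one directory entry per open descriptor.
        open_fds = len(os.listdir("/proc/self/fd"))
    except OSError:
        open_fds = None  # not available on this platform
    return open_fds, soft, hard


def raise_soft_limit(target):
    """Raise the soft descriptor limit towards `target`, capped by the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft == resource.RLIM_INFINITY or target <= soft:
        return soft  # nothing to raise
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    open_fds, soft, hard = fd_usage()
    print("open descriptors:", open_fds, "soft limit:", soft, "hard limit:", hard)
```

The same soft limit is what `ulimit -n` reports in the shell that launches the command.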

mehrdadbakhtiari commented 4 years ago

Hi Nicolas,

Could you please tell me the value of $Ncpu as well? It would make the error easier to reproduce and debug.

Also, I have just noticed that you want to identify a frameshift rather than a copy number change. I had assumed you wanted the number of repeats (which is what most people use the tool for), and that is why I suggested --pacbio. Our frameshift identification does not perform well with error-prone reads, and we use it only with short reads. I would suggest using a tool that is specifically designed for this task with nanopore reads and has been extensively tested on them. There are several such tools; the one I am aware of is longshot, which I recommend you use instead of ours.

Thank you for reporting the issue with multiprocessing.

nbargues commented 4 years ago

Hi, I sequenced long reads of the MUC1 gene from a patient who I know is positive for the disease-causing MUC1 mutation (the one that is in your database), and I want to use your software to confirm that this mutation is present. Do you think your software can do that? If so, what are the best parameters to use?

In the previous example, $Ncpu = 18.

Edit: I re-ran the command without the --frameshift and -t arguments, and the same error occurs.

mehrdadbakhtiari commented 4 years ago

Yes. We can do it for MUC1 with accurate short reads, but for long error-prone reads our approach is not the best. (Shorter reads are not a disadvantage for us as they are for GATK and similar tools, but a high error rate is a problem for frameshift identification.) If I remember correctly, this VNTR is ~1000 bp, so with nanopore reads this case should be no different from any other SNV for long-read variant callers like longshot.

Thank you for providing the information about the bug. We will work on resolving it separately.
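
A note for readers on the multiprocessing failure itself. In the traceback the worker names run past Process-1000, the workers fail while appending to a shared list through a manager proxy, and the parent finally fails in os.pipe() during p.start(). That pattern is consistent with a separate process being started for each read until the descriptor limit is reached, although the thread does not confirm the cause. One common way to avoid it is a fixed-size worker pool whose workers return their results to the parent instead of appending to a shared proxy list. The sketch below illustrates only that general technique. It is not the fix applied in adVNTR, and spanning_segment and collect_spanning_segments are hypothetical stand-ins rather than adVNTR functions.

```python
import multiprocessing


def spanning_segment(read_seq):
    """Hypothetical per-read check: return a trimmed segment, or None to skip the read."""
    seq = read_seq.upper()
    return seq[1:-1] if len(seq) > 4 else None


def collect_spanning_segments(read_seqs, workers=4):
    """Process reads with a fixed number of workers and gather results in the parent."""
    # The pool keeps the process count at `workers`, however many reads there are.
    with multiprocessing.Pool(processes=workers) as pool:
        results = pool.map(spanning_segment, read_seqs, chunksize=64)
    # Results come back as return values, so no manager proxy is needed.
    return [segment for segment in results if segment is not None]


if __name__ == "__main__":
    reads = ["acgtacgt", "acg", "ttgaccat"]
    print(collect_spanning_segments(reads, workers=2))
```

With this layout the number of pipes held by the parent depends on the pool size rather than on the number of reads.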