Open rdauria opened 2 years ago
I should add that I installed MIDAS using Python 3.9.6. I have just noticed that when running the second part of the tutorial (snps), the multiprocessing package fails with this error:
TypeError: cannot pickle '_io.TextIOWrapper' object
Any ideas? Which versions of Python do you support?
FYI, the error in context is:
/usr/bin/time -p -v run_midas.py snps midas_output/SAMPLE_1 -1 /u/local/apps/midas/EXAMPLE/example/sample_1.fq.gz -t 8
MIDAS: Metagenomic Intra-species Diversity Analysis System
version 1.3.0; github.com/snayfach/MIDAS
Copyright (C) 2015-2016 Stephen Nayfach
Freely distributed under the GNU General Public License (GPLv3)
===========Parameters===========
Command: /u/local/apps/midas/1.3.2/MIDAS/scripts/run_midas.py snps midas_output/SAMPLE_1 -1 /u/local/apps/midas/EXAMPLE/example/sample_1.fq.gz -t 8
Script: run_midas.py snps
Database: /u/local/apps/midas/DB/midas_db_v1.2
Output directory: midas_output/SAMPLE_1
Remove temporary files: False
Pipeline options:
build bowtie2 database of genomes
align reads to bowtie2 genome database
use samtools to generate pileups and count variants
Database options:
include all species with >=3.0X genome coverage
Read alignment options:
input reads (unpaired): /u/local/apps/midas/EXAMPLE/example/sample_1.fq.gz
alignment speed/sensitivity: very-sensitive
alignment mode: global
number of reads to use from input: use all
number of threads for database search: 8
SNP calling options:
minimum alignment percent identity: 94.0
minimum mapping quality score: 20
minimum base quality score: 30
minimum read quality score: 20
minimum alignment coverage of reads: 0.75
trim 0 base-pairs from 3'/right end of read
================================
Reading reference data
0.0 minutes
0.1 Gb maximum memory
Building database of representative genomes
total genomes: 1
total contigs: 1
total base-pairs: 5163189
0.04 minutes
0.26 Gb maximum memory
Mapping reads to representative genomes
finished aligning
checking bamfile integrity
0.09 minutes
0.44 Gb maximum memory
Indexing bamfile
0.0 minutes
0.44 Gb maximum memory
Counting alleles
Traceback (most recent call last):
File "/u/local/apps/midas/1.3.2/MIDAS/scripts/run_midas.py", line 757, in <module>
run_program(program, args)
File "/u/local/apps/midas/1.3.2/MIDAS/scripts/run_midas.py", line 82, in run_program
snps.run_pipeline(args)
File "/u/local/apps/midas/1.3.2/MIDAS/midas/run/snps.py", line 301, in run_pipeline
pysam_pileup(args, species, contigs)
File "/u/local/apps/midas/1.3.2/MIDAS/midas/run/snps.py", line 228, in pysam_pileup
aln_stats = utility.parallel(species_pileup, argument_list, args['threads'])
File "/u/local/apps/midas/1.3.2/MIDAS/midas/utility.py", line 101, in parallel
return [r.get() for r in results]
File "/u/local/apps/midas/1.3.2/MIDAS/midas/utility.py", line 101, in <listcomp>
return [r.get() for r in results]
File "/u/local/apps/python/3.9.6/gcc-4.8.5/lib/python3.9/multiprocessing/pool.py", line 771, in get
raise self._value
File "/u/local/apps/python/3.9.6/gcc-4.8.5/lib/python3.9/multiprocessing/pool.py", line 537, in _handle_tasks
put(task)
File "/u/local/apps/python/3.9.6/gcc-4.8.5/lib/python3.9/multiprocessing/connection.py", line 211, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/u/local/apps/python/3.9.6/gcc-4.8.5/lib/python3.9/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
TypeError: cannot pickle '_io.TextIOWrapper' object
Command exited with non-zero status 1
Command being timed: "run_midas.py snps midas_output/SAMPLE_1 -1 /u/local/apps/midas/EXAMPLE/example/sample_1.fq.gz -t 8"
User time (seconds): 55.55
System time (seconds): 5.84
Percent of CPU this job got: 669%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:09.16
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 358876
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 218678
Voluntary context switches: 61387
Involuntary context switches: 631
Swaps: 0
File system inputs: 79176
File system outputs: 169336
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 1
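For what it's worth, the traceback ends inside `_ForkingPickler.dumps`, so the failure happens while a task argument is being serialized for a worker process. Below is a minimal sketch that reproduces the same class of error outside MIDAS. I have not checked which argument MIDAS actually passes, so the `log` key and the `try_pickle` helper are hypothetical and for illustration only.

```python
import os
import pickle
import tempfile


def try_pickle(obj):
    """Return None if obj can be pickled, otherwise the error message."""
    try:
        pickle.dumps(obj)
        return None
    except TypeError as exc:
        return str(exc)


# Create a scratch file so there is something to open.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    path = tmp.name

handle = open(path, "w")

# Hypothetical argument dicts: one carries an open text file handle,
# the other carries only the path to the same file.
args_with_handle = {"threads": 8, "log": handle}
args_with_path = {"threads": 8, "log": path}

# multiprocessing pickles task arguments before sending them to workers,
# so anything that fails here would also fail in Pool.apply_async.
error_with_handle = try_pickle(args_with_handle)
error_with_path = try_pickle(args_with_path)

handle.close()
os.remove(path)

print("with handle:", error_with_handle)
print("with path:  ", error_with_path)
```

If this is indeed the cause, passing a file path (and reopening the file inside the worker) instead of an open handle should avoid the error, but that is an assumption until someone confirms it against the MIDAS source.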
I manage applications on a research cluster, and our researchers have been reporting issues with the execution speed of your software on our cluster.
I have just run through the first step of the tutorial (https://github.com/snayfach/MIDAS/blob/master/docs/tutorial.md), and I wonder whether you could let me know whether the timings we are getting (see below) are unusually long.
I am running the code on HPC network storage and on one core of an Intel(R) Xeon(R) Gold 6240 node (I can also provide timings for the following steps in the tutorial if that would help). Moving the DB, the sample file and the output directory to local storage did not seem to affect the speed significantly.
Thanks and please see the timings for the first step of the tutorial below: