snayfach / MIDAS

An integrated pipeline for estimating strain-level genomic variation from metagenomic data
http://dx.doi.org/10.1101/gr.201863.115
GNU General Public License v3.0

TypeError: cannot pickle '_io.TextIOWrapper' object #112

Open Ivan-vechetti opened 3 years ago

Ivan-vechetti commented 3 years ago

Hello, running

run_midas.py genes

It goes well, but at the end I get: E::idx_fin_and_load Could not retrieve index file for 'midas_output//genes/temp/pangenomes.bam'

And then when I run:

run_midas.py snps

It goes well, but at the end I get: TypeError: cannot pickle '_io.TextIOWrapper' object

Can someone help me with that?

Python 3.8.5

nick-youngblut commented 3 years ago

It seems to be caused by:

def iopen(inpath, mode='r'):
        """ Open input file for reading regardless of compression [gzip, bzip] or python version """
        ext = inpath.split('.')[-1]
        # Python2
        if sys.version_info[0] == 2:
                if ext == 'gz': return gzip.open(inpath, mode)
                elif ext == 'bz2': return bz2.BZ2File(inpath, mode)
                else: return open(inpath, mode)
        # Python3
        elif sys.version_info[0] == 3:
                if ext == 'gz': return io.TextIOWrapper(gzip.open(inpath, mode))
                elif ext == 'bz2': return bz2.BZ2File(inpath, mode)
                else: return open(inpath, mode)

which is called by species_pileup() in pysam_pileup(). I'm guessing that the file handle is not actually closed in the subprocess, which is causing the serialization error.

nick-youngblut commented 3 years ago

Actually, it seems to be due to passing the file handle in the args['log'] variable to species_pileup() via utility.parallel(). The file handle can't be serialized.
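A minimal sketch of the failure, independent of MIDAS: pickling a dict that holds an open text file raises the same kind of TypeError, while the same dict without the handle pickles fine. The dict keys and the try_pickle helper are made up for illustration; only the 'log' key mirrors the args['log'] entry discussed above.

```python
import os
import pickle
import tempfile

def try_pickle(obj):
    """Hypothetical helper: return None if obj pickles, else the error text."""
    try:
        pickle.dumps(obj)
        return None
    except TypeError as err:
        return str(err)

with tempfile.TemporaryDirectory() as tmp:
    # An ordinary text-mode file object is an _io.TextIOWrapper.
    with open(os.path.join(tmp, "log.txt"), "w") as log:
        # Stand-in for the MIDAS args dict; the keys are assumptions.
        args_with_log = {"outdir": tmp, "log": log}
        error_with_log = try_pickle(args_with_log)

    args_without_log = {"outdir": tmp}
    error_without_log = try_pickle(args_without_log)

print(error_with_log)      # error text naming the TextIOWrapper
print(error_without_log)   # None: plain values pickle without trouble
```

Arguments handed to multiprocessing workers are pickled in the same way, which is presumably why the error only appears once the parallel step starts.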

Changing:

def pysam_pileup(args, species, contigs):
        start = time()
        print("\nCounting alleles")
        args['log'].write("\nCounting alleles\n")

        # run pileups per species in parallel
        argument_list = []

to:

def pysam_pileup(args, species, contigs):
        start = time()
        print("\nCounting alleles")
        args['log'].write("\nCounting alleles\n")
        args['log'].close()      # new line

        # run pileups per species in parallel
        argument_list = []

This fixes the issue. It appears that species_pileup doesn't actually write to the log file anyway. I'll submit a PR.

Ivan-vechetti commented 3 years ago

Thank you so much for your reply. Where should I change that line?

Thanks once again

Ivan


nick-youngblut commented 3 years ago

Check out the PR edits: https://github.com/snayfach/MIDAS/pull/113

Ivan-vechetti commented 3 years ago

Hi Nick,

Thanks for the input, but adding the line as you suggested caused the run to finish early with this message: IndentationError: unindent does not match any outer indentation level

Regarding the genes run, is the message below normal?

Computing coverage of pangenomes E::idx_fin_and_load Could not retrieve index file for 'midas_output//genes/temp/pangenomes.bam'


nick-youngblut commented 3 years ago

My editor defaults to spaces, but MIDAS is written entirely with tabs. This caused the indentation error. I've fixed it. Also, I added a pop for the log variable, since it appears that closing the file handle didn't actually fix the serialization error. It should work now. At least, it works for me. There's no CI for the PRs, so it's untested for a broader set of environments (e.g., different versions of Ubuntu), but it should work.
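For anyone hitting the same IndentationError after editing by hand, here is a small sketch of how a space-indented line inside a tab-indented function trips the parser. The function name and body are invented for the demo and are not MIDAS code.

```python
# One tab-indented line followed by a four-space line, roughly what an
# editor set to spaces produces inside a tab-indented file (assumed example).
source = "def pysam_pileup_demo():\n\tstart = 0\n    closed = True\n"

def indentation_problem(src):
    """Hypothetical helper: return the IndentationError from compiling src, or None."""
    try:
        compile(src, "<demo>", "exec")
    except IndentationError as err:
        return err
    return None

mixed = indentation_problem(source)
# Same code with the second line re-indented with a tab.
tabs_only = indentation_problem(source.replace("    closed", "\tclosed"))

print(type(mixed).__name__, mixed)
print(tabs_only)
```

Matching the file's existing whitespace (tabs, in this case) when adding a line avoids the problem.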

Aiswarya-prasad commented 2 years ago

I tried this (although with del instead of pop), and it works for me.
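A rough sketch of the idea behind the pop/del workaround, not the actual PR code: closing the handle leaves the file object in the dict, so it still cannot be pickled, whereas removing the key gives the workers a dict of plain values. The dict contents and the can_pickle helper are assumptions for illustration, and pickle.dumps stands in for whatever utility.parallel() does internally.

```python
import os
import pickle
import tempfile

def can_pickle(obj):
    """Hypothetical helper: True if obj survives pickle.dumps."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False

with tempfile.TemporaryDirectory() as tmp:
    log = open(os.path.join(tmp, "log.txt"), "w")
    # Stand-in for the MIDAS args dict; keys other than 'log' are made up.
    args = {"outdir": tmp, "threads": 2, "log": log}
    log.write("\nCounting alleles\n")
    log.close()

    # The closed handle is still a file object sitting in the dict.
    closed_ok = can_pickle(args)

    # Drop the key from a copy before handing the dict to workers.
    worker_args = dict(args)
    worker_args.pop("log")          # equivalently: del worker_args["log"]
    popped_ok = can_pickle(worker_args)

print(closed_ok, popped_ok)
```

Working on a copy keeps args['log'] available in the parent process; popping from args itself, as described above, also works as long as nothing later in the run expects the key to be there.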