neufeld / pandaseq

PAired-eND Assembler for DNA sequences
GNU General Public License v3.0

Pandaseq doesn't recognize fastq files #79

Closed: concscid closed this issue 4 years ago

concscid commented 4 years ago

Hi,

I'm trying to run pandaseq on a set of 16S sequences in fastq files. I use the same script I always use, which usually works fine. However, these last few days I keep getting the following message:

    ERR NOFILE My_samples/C8_P269_L001_R1_001.fastq
    Too confused to continue.
    Try -h for help.

The samples are in the folder, though, and they are in .fastq format. Pandaseq doesn't seem to recognize them for some reason. Any idea why this is happening?

Thank you!

apmasell commented 4 years ago

What is the full command line you are using? This error comes from the operating system: it means the file could not be opened, or that it is compressed but corrupted.
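
One way to check both cases mentioned here before PANDAseq ever runs is a small Python helper. This is a sketch, not part of PANDAseq: the name `describe_input` and its return labels are hypothetical, and the gzip test relies only on the two-byte gzip signature (`1f 8b`) plus decompressing the stream to the end.

```python
import gzip
import os
import sys
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def describe_input(path):
    """Classify a FASTQ path as missing, unreadable, plain, gzip-ok or gzip-corrupted."""
    if not os.path.isfile(path):
        return "missing"  # wrong name, or resolved against the wrong directory
    if not os.access(path, os.R_OK):
        return "unreadable"  # the file exists but cannot be opened for reading
    with open(path, "rb") as handle:
        head = handle.read(2)
    if head != GZIP_MAGIC:
        return "plain"  # not gzip; presumably an uncompressed FASTQ file
    try:
        with gzip.open(path, "rb") as handle:
            # Decompress to the end so truncation and CRC errors surface here.
            while handle.read(1 << 20):
                pass
    except (OSError, EOFError, zlib.error):
        return "gzip-corrupted"
    return "gzip-ok"


if __name__ == "__main__":
    # Usage: python check_inputs.py My_samples/*.fastq*
    for name in sys.argv[1:]:
        print(name + ": " + describe_input(name))
```
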

concscid commented 4 years ago

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-

    import subprocess
    from multiprocessing import Pool, current_process
    from os import listdir, mkdir, path
    from pprint import pprint, pformat
    import cmd
    from time import sleep

    sample_dir = "My_samples/"
    RDP_dir = sample_dir + 'RDP_classifier/'
    RDP_res_dir = RDP_dir + 'results/'


    def main():
        print("Génération de la liste des échantillons")  # "Building the list of samples"
        list_samples = []
        for f in listdir(sample_dir):
            if '.fastq' in f:
                # C8_P269_L001_R1_001.fastq -> C8_P269 (first two "_" fields)
                fname = f.split('.')[0]
                splitted = fname.split('_')
                fname = splitted[0] + "_" + splitted[1]
                if fname not in list_samples:
                    list_samples.append(fname)

        print("Fusion des R1-R2")  # "Merging R1 and R2"

        if not path.exists(RDP_dir):
            mkdir(RDP_dir)
        for f in list_samples:
            print("Traitement de " + f)  # "Processing <sample>"
            cmd = "/softs/manual/ampere/pandaseq/pandaseq " + \
                  "-f " + sample_dir + f + "_L001_R1_001.fastq " + \
                  "-r " + sample_dir + f + "_L001_R2_001.fastq " + \
                  "-A rdp_mle -a -B -l 410 -L 500 -o 20 -O 100 -T 4 " + \
                  "-w " + RDP_dir + "res_" + f + "_pandaseq_bis.fasta " + \
                  "-g " + RDP_dir + "log.txt"
            # The whole string goes to the shell, which splits it on spaces.
            subprocess.call(cmd, shell=True)


    if __name__ == "__main__":
        main()

Here's the command line. Could the problem be in the files, then?
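
A sketch of the same per-sample call without `shell=True`, assuming the binary path, options and file naming from the script above: passing an argument list to `subprocess.call` skips shell word splitting, and checking the inputs first prints the absolute path Python actually looked at. The helper names `input_paths`, `pandaseq_args` and `run_sample` are hypothetical.

```python
import os
import subprocess

# Assumed to match the script above: same binary, same options, same file naming.
PANDASEQ = "/softs/manual/ampere/pandaseq/pandaseq"
OPTIONS = ["-A", "rdp_mle", "-a", "-B", "-l", "410", "-L", "500",
           "-o", "20", "-O", "100", "-T", "4"]


def input_paths(sample, sample_dir):
    """Return the R1 and R2 FASTQ paths for one sample."""
    r1 = os.path.join(sample_dir, sample + "_L001_R1_001.fastq")
    r2 = os.path.join(sample_dir, sample + "_L001_R2_001.fastq")
    return r1, r2


def pandaseq_args(sample, sample_dir, rdp_dir):
    """Build the PANDAseq argument list for one sample; no shell is involved."""
    r1, r2 = input_paths(sample, sample_dir)
    out = os.path.join(rdp_dir, "res_" + sample + "_pandaseq_bis.fasta")
    log = os.path.join(rdp_dir, "log.txt")
    return [PANDASEQ, "-f", r1, "-r", r2] + OPTIONS + ["-w", out, "-g", log]


def run_sample(sample, sample_dir, rdp_dir):
    """Run PANDAseq on one sample; return its exit status, or None if an input is missing."""
    for fastq in input_paths(sample, sample_dir):
        if not os.path.isfile(fastq):
            # Report the absolute path, i.e. where Python actually looked.
            print("Missing input: " + os.path.abspath(fastq))
            return None
    return subprocess.call(pandaseq_args(sample, sample_dir, rdp_dir))
```

The returned exit status can then be compared against 0 for each sample, whereas the original `subprocess.call(cmd, shell=True)` discards it.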

apmasell commented 4 years ago

    fname = f.split('.')[0]
    splitted = fname.split('_')
    fname = splitted[0] + "_" + splitted[1]

Do your files actually end in .fastq and not .fastq.gz?
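
The quoted lines keep only the first two underscore-separated fields, and the listing test `'.fastq' in f` also matches `.fastq.gz` files, so a leftover compressed file still yields a sample name whose uncompressed `.fastq` path does not exist. A stricter parse could look like this, assuming the `<sample>_L001_R1_001.fastq` naming seen in this thread; the regular expression and `sample_names` are hypothetical.

```python
import re

# Hypothetical pattern for names like C8_P269_L001_R1_001.fastq (optionally .gz);
# adjust it if the sequencer names files differently.
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_L\d{3}_R(?P<read>[12])_\d{3}\.fastq(?P<gz>\.gz)?$")


def sample_names(filenames):
    """Return sorted sample names that have an uncompressed R1 .fastq file."""
    samples = set()
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match is None:
            continue  # not a FASTQ file in the expected naming scheme
        if match.group("gz"):
            print("Still compressed, skipping: " + name)
            continue
        if match.group("read") == "1":
            samples.add(match.group("sample"))
    return sorted(samples)
```

With the original split-and-join, `C9_L001_R1_001.fastq` would become `C9_L001`, and the script would then look for `C9_L001_L001_R1_001.fastq`; the pattern above yields `C9` instead, whatever the number of underscores in the sample name.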

concscid commented 4 years ago

Yes, I unzip them before starting the script.

apmasell commented 4 years ago

Try making sample_dir an absolute path.
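
A relative `sample_dir` such as `"My_samples/"` is resolved against the current working directory of whoever launches the script, not against the folder the script lives in. A minimal sketch of making it absolute; `resolve_sample_dir` is a hypothetical helper.

```python
import os


def resolve_sample_dir(sample_dir, base_dir=None):
    """Return sample_dir as an absolute, normalized path.

    A relative path is joined to base_dir, which defaults to the current
    working directory: the same directory Python silently uses for "My_samples/".
    """
    if base_dir is None:
        base_dir = os.getcwd()
    # os.path.join ignores base_dir when sample_dir is already absolute.
    return os.path.normpath(os.path.join(base_dir, os.path.expanduser(sample_dir)))


if __name__ == "__main__":
    print("Working directory: " + os.getcwd())
    print("Samples looked up in: " + resolve_sample_dir("My_samples/"))
```

Because `os.path.normpath` drops the trailing slash, `os.path.join(resolve_sample_dir("My_samples/"), "")` restores it if the rest of the script keeps building paths with `+`.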

concscid commented 4 years ago

I still get the same message :/

apmasell commented 4 years ago

Try running it manually on your files.
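
Running it manually here presumably means calling the `pandaseq` binary directly from a shell. One way to get the exact line the script would execute is to print it with shell quoting; `as_shell_command` is a hypothetical helper built on the standard `shlex.quote`, and the example arguments are the ones from the script in this thread.

```python
import shlex


def as_shell_command(args):
    """Quote each argument so the printed line can be pasted into a shell unchanged."""
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    # Example arguments taken from the script in this thread (sample C8_P269).
    args = ["/softs/manual/ampere/pandaseq/pandaseq",
            "-f", "My_samples/C8_P269_L001_R1_001.fastq",
            "-r", "My_samples/C8_P269_L001_R2_001.fastq",
            "-A", "rdp_mle", "-a", "-B", "-l", "410", "-L", "500",
            "-o", "20", "-O", "100", "-T", "4",
            "-w", "My_samples/RDP_classifier/res_C8_P269_pandaseq_bis.fasta",
            "-g", "My_samples/RDP_classifier/log.txt"]
    print(as_shell_command(args))
```

Pasting the printed line into a terminal opened in the same directory separates a PANDAseq problem from a problem in the surrounding Python.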

concscid commented 4 years ago

I just tried. Everything is fine until I get to the fusion part:

    print ("Fusion des R1-R2")
      File "<stdin>", line 1
        print ("Fusion des R1-R2")
        ^
    IndentationError: unexpected indent

    ... if not path.exists(RDP_dir):
      File "<stdin>", line 2
        if not path.exists(RDP_dir):
        ^
    IndentationError: unexpected indent
    mkdir(RDP_dir)
      File "<stdin>", line 1
        mkdir(RDP_dir)
        ^
    IndentationError: unexpected indent
    for f in list_samples:
      File "<stdin>", line 1
        for f in list_samples:
        ^
    IndentationError: unexpected indent
    print ("Traitement de " + f)
      File "<stdin>", line 1
        print ("Traitement de " + f)
        ^
    IndentationError: unexpected indent
    cmd = "/softs/manual/ampere/pandaseq/pandaseq " + \
      File "<stdin>", line 1
        cmd = "/softs/manual/ampere/pandaseq/pandaseq " + \
        ^
    IndentationError: unexpected indent
    "-f " + sample_dir + f + "_L001_R1_001.fastq " + \
      File "<stdin>", line 1
        "-f " + sample_dir + f + "_L001_R1_001.fastq " + \
        ^
    IndentationError: unexpected indent
    "-r " + sample_dir + f + "_L001_R2_001.fastq " + \
      File "<stdin>", line 1
        "-r " + sample_dir + f + "_L001_R2_001.fastq " + \
        ^
    IndentationError: unexpected indent
    "-A rdp_mle -a -B -l 410 -L 500 -o 20 -O 100 -T 4 " + \
      File "<stdin>", line 1
        "-A rdp_mle -a -B -l 410 -L 500 -o 20 -O 100 -T 4 " + \
        ^
    IndentationError: unexpected indent
    "-w " + RDP_dir + "res_" + f + "_pandaseq_bis.fasta " + \
      File "<stdin>", line 1
        "-w " + RDP_dir + "res_" + f + "_pandaseq_bis.fasta " + \
        ^
    IndentationError: unexpected indent
    "-g " + RDP_dir + "log.txt"
      File "<stdin>", line 1
        "-g " + RDP_dir + "log.txt"
        ^
    IndentationError: unexpected indent

    subprocess.call(cmd, shell=True)

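
These `IndentationError`s come from the Python interpreter, not from PANDAseq: lines copied out of the body of `main()` keep their leading spaces, and a statement typed at the interactive prompt cannot start indented. The sketch below reproduces that with the built-in `compile` and shows that `textwrap.dedent` removes the common indentation; the `compiles` helper and the `snippet` text are illustrative only.

```python
import textwrap

# Two lines copied from inside main(): they still carry main()'s indentation.
snippet = (
    "    if not path.exists(RDP_dir):\n"
    "        mkdir(RDP_dir)\n"
)


def compiles(source):
    """Return True if source compiles as a module, False on IndentationError."""
    try:
        compile(source, "<stdin>", "exec")  # syntax check only; nothing is executed
    except IndentationError:
        return False
    return True


print(compiles(snippet))                   # False: "unexpected indent"
print(compiles(textwrap.dedent(snippet)))  # True once the common indent is removed
```

In practice, saving the script to a file and running it with `python`, or running the `pandaseq` command itself in a shell, avoids the problem entirely.
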
apmasell commented 4 years ago

If something is wrong with your Python script, I can't help with that. If you can run PANDAseq on your input files from the command line, then I am going to close this ticket.

concscid commented 4 years ago

Yeah, well, I don't know where the problem is, since I'm not a bioinformatician, but the script has worked fine for everybody in the lab since it was written a couple of years ago, and now it seems to fail when it gets to the PANDAseq part. Nevertheless, if you think it's not related to PANDAseq, you can close the ticket. Thanks.

apmasell commented 4 years ago

Sorry. Good luck.