nanoporetech / pychopper

A tool to identify, orient, trim and rescue full length cDNA reads

Generating report failed? #45

Closed (claumer closed this issue 3 years ago)

claumer commented 4 years ago

Hi there,

I was running pychopper on a cDNA library I sequenced on a R9.4.1 MinION flow cell, with custom dT primers and TSO sequences. The precise command I used was:

cdna_classifier.py -b primers.fa -c primer_config.txt -u DLY008_cDNA_R941_091020_unclassed.fastq -w DLY008_cDNA_R941_091020_rescued.fastq -l DLY008_cDNA_R941_091020_tooshort.fastq -S pychopped_stats.txt -K DLY008_cDNA_R941_091020_lowQ.fastq -m edlib -t 32 DLY008_cDNA_R941_091020.fastq DLY008_cDNA_R941_091020_pychopped.fastq

However, rather than finishing gracefully, pychopper seems to get through the entire dataset and classify the input reads with reasonable results, but at the very end it throws an error and fails to generate the stats file and the PDF report. See the stderr below:

```
(base) [claumer@noah-login-02 pychopped]$ tail -n 40 pc.err
Using kit: PCS109
Configurations to consider: "+:UDP002_i5_TSO,-UDP002_i7_dT|-:UDP002_i7_dT,-UDP002_i5_TSO"
Counting fastq records in input file: DLY008_cDNA_R941_091020.fastq
Total fastq records in input file: 13844247
Tuning the cutoff parameter (q) on 9020 sampled reads (0.1%) passing quality filters (Q >= 7.0).
Optimizing over 30 cutoff values.
100%|██████████| 30/30 [01:44<00:00, 3.48s/it]
Best cutoff (q) value is 0.5517 with 83% of the reads classified.
Processing the whole dataset using a batch size of 432632:  91%|█████████ | 12558488/13844247 [2:31:17<15:29, 1383.53it/s]
Finished processing file: DLY008_cDNA_R941_091020.fastq
Input reads failing mean quality filter (Q < 7.0): 1285760 (9.29%)
Output fragments failing length filter (length < 50): 273051
Detected 1 potential artefactual primer configurations:
Configuration                               NrReads  PercentReads
UDP002_i5_TSO,UDP002_i5_TSO,-UDP002_i7_dT   494895   3.94%
Traceback (most recent call last):
  File "/nfs/research1/marioni/claumer/miniconda3/bin/cdna_classifier.py", line 427, in
    stdf.to_csv(args.S, sep="\t", index=False)
  File "/nfs/research1/marioni/claumer/miniconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 3204, in to_csv
    return getattr(self, "_cacher", None) is not None
  File "/nfs/research1/marioni/claumer/miniconda3/lib/python3.7/site-packages/pandas/io/formats/csvs.py", line 190, in save
    compression=dict(self.compression_args, method=self.compression),
TypeError: get_handle() got an unexpected keyword argument 'errors'
```

Can you please advise on what might be going wrong here? It would be very useful to see the report. Thank you for your input.

Regards, Chris L
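For context on the traceback above: this TypeError is commonly seen when parts of pandas in one environment come from different releases, since the errors keyword passed down to the internal get_handle() only exists in newer pandas versions. That is only a plausible cause, not confirmed in this thread, but a quick check in the same environment that ran cdna_classifier.py could look like this sketch:

```python
# Diagnostic sketch (not part of pychopper): report where pandas is loaded
# from and whether its get_handle() accepts the 'errors' keyword. A mismatch
# (e.g. after a partial upgrade) is one plausible explanation for the
# "unexpected keyword argument 'errors'" TypeError above.
import inspect
import pandas as pd
import pandas.io.common as pio

print("pandas version :", pd.__version__)
print("pandas package :", pd.__file__)
print("io.common file :", pio.__file__)
print("get_handle accepts 'errors':",
      "errors" in inspect.signature(pio.get_handle).parameters)
```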

philres commented 4 years ago

Hi Chris,

Could you let us know which version of pychopper you are using, and would it be possible to share a minimal dataset that produces this error so we can try to reproduce it?

Thanks, Philipp
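For reference, one way to build such a minimal dataset is to slice the first few thousand records out of the original FASTQ. The snippet below is only a sketch, assuming an uncompressed 4-line-per-record FASTQ; the input file name follows Chris's command and the record count is arbitrary.

```python
# Sketch: write the first N records of an uncompressed FASTQ (4 lines per
# record) to a smaller file that can be shared as a test case.
N = 10_000  # arbitrary subset size

with open("DLY008_cDNA_R941_091020.fastq") as src, \
        open("DLY008_cDNA_R941_091020.subset.fastq", "w") as dst:
    for i, line in enumerate(src):
        if i >= 4 * N:
            break
        dst.write(line)
```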

claumer commented 4 years ago

Dear Philipp,

I can't see any text about license or version in the main cdna_classifier.py script, but conda list tells me I'm using pychopper 2.5.0.

Strangely, when making the data subset to share, I found that I could get the script to finish normally, printing a report and a stats file. The only difference from the first time I tried this is that I ran it in an interactive job on our cluster this time (whereas the first run was submitted as an LSF batch job on the HPC cluster).

So I tried it again on the full dataset in an interactive shell and I'm pleased to say it worked this time, finishing properly and giving me the report I was looking for.

It remains a bit of a mystery why this didn't happen in the LSF job, though. Any ideas? Do you still want a data subset, given this new information?

Thanks, Chris
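One way to chase down the LSF vs. interactive difference would be to dump the interpreter path and key dependency versions in both environments and compare the output. The script below is only a sketch (the file name env_report.py is hypothetical) and uses only packages pychopper already depends on:

```python
# env_report.py -- hypothetical helper, not part of pychopper. Run it once
# inside the LSF job script and once in the interactive shell, then diff the
# two outputs to see whether a different Python, pandas or matplotlib was
# picked up by the batch job.
import sys
import pandas
import matplotlib

print("python     :", sys.executable, sys.version.split()[0])
print("pandas     :", pandas.__version__, pandas.__file__)
print("matplotlib :", matplotlib.__version__, matplotlib.__file__)
```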

philres commented 4 years ago

Thanks, Chris. Glad to hear it worked now.

Any chance that conda (automatically) updated any of the dependencies, for example matplotlib?

Thanks, but if it works now, I don't think the problem has anything to do with the data itself.

Could you run pip freeze and send me the output, just for the record?

Thanks, Philipp
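As a footnote, the requested snapshot can be captured with the exact interpreter that runs cdna_classifier.py; the snippet below is just a sketch, and the output file name is arbitrary:

```python
# Sketch: capture "pip freeze" from the currently running interpreter and
# write it to a file that can be attached to the issue.
import subprocess
import sys

with open("pip_freeze.txt", "w") as out:
    subprocess.run([sys.executable, "-m", "pip", "freeze"], stdout=out, check=True)
```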