some improvement about load the input data

nanoporetech / pychopper

A tool to identify, orient, trim and rescue full length cDNA reads


some improvement about load the input data #28

huangziyan11111 closed this issue 4 years ago

huangziyan11111 commented 4 years ago

Since read files are large, I think it would help to add support for gzip-compressed ("*.gz") input reads and for reading from standard input, so that commands like the following can be run:

cdna_classifier.py reads_input.fq.gz reads_full_length.fq -t 12 -w rescued.fq -u unclassified.fq -S stats.txt

zcat reads_input.fq.gz|...{some processes}...| cdna_classifier.py - reads_full_length.fq -t 12 -w rescued.fq -u unclassified.fq -S stats.txt
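Until something like this is supported natively, a similar pipeline can be approximated by first writing the stream to a regular file and passing that file to the script. The sketch below assumes a POSIX-like shell with `gzip` available. `materialize_reads` is a hypothetical helper, not part of pychopper. It accepts a plain path, a `.gz` path, or `-` for standard input:

```shell
# Hypothetical helper (not part of pychopper): copy reads from a plain path,
# a gzip-compressed path, or "-" (standard input) into a regular file, so a
# script that reads its input more than once can reopen it by path.
materialize_reads() {
  local src="$1" dest="$2"
  case "$src" in
    -)    cat > "$dest" ;;                 # standard input, e.g. a pipe
    *.gz) gzip -dc -- "$src" > "$dest" ;;  # decompress .gz input
    *)    cp -- "$src" "$dest" ;;          # plain file: copy as-is
  esac
}

# Example use (cdna_classifier.py options copied from the commands above):
#   tmp=$(mktemp)
#   zcat reads_input.fq.gz | ...{some processes}... | materialize_reads - "$tmp"
#   cdna_classifier.py "$tmp" reads_full_length.fq -t 12 -w rescued.fq -u unclassified.fq -S stats.txt
#   rm -f "$tmp"
```

The trade-off is disk space. The temporary file holds the full uncompressed reads, but a regular file can be reopened as many times as the script needs.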

bsipos commented 4 years ago

The script needs to make a first pass over the file to count the number of records. A pipe can only be read once, so input from pipes cannot be supported.
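The constraint can be illustrated with a small shell sketch. This is not pychopper's code. It only assumes the script counts records and then reads the input again. `two_pass_fastq` is a hypothetical name. It counts records in one pass and reopens the same path for a second pass, which works for a regular file. A pipe has no path to reopen, so a second reader finds it already drained:

```shell
# Hypothetical illustration (not pychopper's implementation).
two_pass_fastq() {
  local path="$1" total
  # Pass 1: count records (FASTQ uses 4 lines per record).
  total=$(( $(wc -l < "$path") / 4 ))
  # Pass 2: reopen the same path from the start and report "i/total id".
  awk -v total="$total" 'NR % 4 == 1 { printf "%d/%d %s\n", (NR + 3) / 4, total, $1 }' "$path"
}

# With a pipe, both readers share one stream: the first wc -l consumes all
# 4 lines, so the second wc -l counts 0.
printf '@r1\nACGT\n+\nIIII\n' | { wc -l; wc -l; }
```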