immcantation / presto

pRESTO is part of the Immcantation analysis framework for Adaptive Immune Receptor Repertoire sequencing (AIRR-seq). pRESTO is a bioinformatics toolkit for processing high-throughput lymphocyte receptor sequencing data.
https://presto.readthedocs.io
GNU Affero General Public License v3.0

Add support for input of gzip files and/or stdin #25

Closed ssnn-airr closed 6 years ago

ssnn-airr commented 9 years ago

Original report by Jason Vander Heiden (Bitbucket: javh, GitHub: javh).


From @vrbacky :
"I'm preparing some new pipelines and I'd like to cut primers from our amplicon sequencing data. MaskPrimers seems to be good tool to do it but it'd be great if it can use fastq.gz files or STDIN/STDOUT. Do you plan to implement it?"

I think .gz should be pretty straightforward. I don't think Biopython or scikit-bio have native support for fastq.gz, but I'm guessing opening the file with the gzip library will work. Need to look into it.
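A rough sketch of the "open it with the gzip library" idea, using only the standard library. The names `open_reads` and `read_fastq` are hypothetical and not part of pRESTO, and the tiny reader assumes strict 4-line FASTQ records. In practice the text-mode handle returned here could presumably be passed to `Bio.SeqIO.parse`, which accepts open handles, instead of the toy reader.

```python
import gzip
import sys
import tempfile
from pathlib import Path


def open_reads(path):
    """Return a text-mode handle for a plain file, a gzip file, or stdin ("-")."""
    if path == "-":
        return sys.stdin
    # gzip streams start with the magic bytes 0x1f 0x8b
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == b"\x1f\x8b"
    return gzip.open(path, "rt") if is_gzip else open(path, "r")


def read_fastq(handle):
    """Yield (id, sequence, quality) tuples, assuming 4 lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual


def demo():
    """Write one record as .gz and as plain text, then read both back."""
    record = "@read1\nACGT\n+\nIIII\n"
    with tempfile.TemporaryDirectory() as tmp:
        gz_path = Path(tmp) / "reads.fastq.gz"
        plain_path = Path(tmp) / "reads.fastq"
        with gzip.open(gz_path, "wt") as out:
            out.write(record)
        plain_path.write_text(record)
        results = []
        for path in (gz_path, plain_path):
            with open_reads(str(path)) as handle:
                results.append(list(read_fastq(handle)))
        return results
```

Sniffing the magic bytes rather than trusting the `.gz` extension is a design choice in this sketch. It cannot be applied to stdin without buffering, so the `"-"` branch assumes uncompressed input.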

ssnn-airr commented 6 years ago

Original comment by Jason Vander Heiden (Bitbucket: javh, GitHub: javh).


Going to close this, as it would only work for the first step in the pipeline anyway. It'd be overly slow to uncompress/recompress for every step in a pipeline, so one uncompress at the beginning seems more straightforward.

ssnn-airr commented 7 years ago

Original comment by Jason Vander Heiden (Bitbucket: javh, GitHub: javh).


Got another request for this. Bio.SeqIO.index will not work with standard gzipped files though, so it would either need to be done through a temp file or only support BGZF compressed files.
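To illustrate the temp file route: an offset-based index needs to seek to arbitrary byte positions, which a standard gzip stream does not support efficiently. The sketch below decompresses once into a seekable temporary file and then builds a simple ID-to-offset map. `build_offset_index` is a hypothetical stand-in written for illustration, not how `Bio.SeqIO.index` is actually implemented, and it assumes strict 4-line FASTQ records.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def decompress_to_temp(gz_path):
    """Copy a gzip file's contents into a seekable temp file; caller closes it."""
    tmp = tempfile.TemporaryFile()
    with gzip.open(gz_path, "rb") as src:
        shutil.copyfileobj(src, tmp)
    tmp.seek(0)
    return tmp


def build_offset_index(handle):
    """Map each record ID to the byte offset of its header line."""
    index = {}
    while True:
        offset = handle.tell()
        header = handle.readline()
        if not header:
            return index
        index[header[1:].split()[0].decode()] = offset
        for _ in range(3):  # skip sequence, '+', and quality lines
            handle.readline()


def fetch(handle, index, read_id):
    """Seek straight to one record and return its sequence."""
    handle.seek(index[read_id])
    handle.readline()  # header line
    return handle.readline().decode().strip()


def demo():
    data = b"@r1 extra\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n"
    with tempfile.TemporaryDirectory() as tmp:
        gz_path = Path(tmp) / "reads.fastq.gz"
        gz_path.write_bytes(gzip.compress(data))
        with decompress_to_temp(gz_path) as plain:
            index = build_offset_index(plain)
            return index, fetch(plain, index, "r2")
```

The cost is one full decompression plus the disk space for the uncompressed copy. The BGZF alternative avoids that by compressing in independent blocks, at the price of requiring input that was written as BGZF in the first place.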