pereiramemo / AGS-and-ACN-tools

Fast and accurate Average Genome Size and 16S rRNA gene Average Copy Number computation in metagenomic data
GNU General Public License v3.0

Question: Input file formats? #1

Closed: sturne29 closed this issue 4 years ago

sturne29 commented 5 years ago

Is it possible for this tool to accept FASTQ and/or gzipped files (and if not, would that be possible for a future release)? I am very interested in trying this tool, but the time it would take to unzip files and convert them to FASTA files might erode any time I'd gain in runtime over MicrobeCensus. (All of my metagenomes are gzipped FASTQ files, and I tend to leave them that way to save space.)

EDIT: Also - is there a way to handle paired-end data without merging?

pereiramemo commented 4 years ago

Dear Sturne29,

I am glad you find these tools useful, and I am sorry for the late response (I was on holiday).

Let me answer your questions one by one:

1) Is it possible for this tool to accept FASTQ and/or gzipped files? Yes. Although the current version was not explicitly designed for this, it can process fastq.gz files because it uses bbduk.sh to filter/trim the input sequences, and bbduk.sh can handle all of these input formats. To input a fastq.gz file, you have to specify a minimum or maximum read length (--min_length or --max_length) so that the file is processed by bbduk.sh before the AGS is computed.

Note: you can use a minimum (or maximum) length that does not change your input file (e.g., --min_length 10).

Note 2: Eventually, for all tools, the gzip files will be uncompressed within the tool.
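To pick a length threshold that leaves the input unchanged, it can help to know the shortest and longest reads in the file first. The following is only a minimal sketch and not part of the tool: the function name `read_length_range` is hypothetical, and it assumes standard four-line FASTQ records (no wrapped sequence lines).

```python
import gzip


def read_length_range(fastq_gz_path):
    """Return (shortest, longest) read length found in a gzipped FASTQ file.

    Hypothetical helper; assumes each record spans exactly four lines.
    """
    shortest, longest = None, 0
    with gzip.open(fastq_gz_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # In a four-line record, the sequence is the second line.
            if line_number % 4 != 1:
                continue
            length = len(line.rstrip("\r\n"))
            shortest = length if shortest is None else min(shortest, length)
            longest = max(longest, length)
    if shortest is None:
        raise ValueError("no reads found in " + str(fastq_gz_path))
    return shortest, longest
```

A --min_length value below the shortest length reported here should not remove any reads, which matches the intent of the note above.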

2) Would that (accepting FASTQ and/or gzipped files) be possible for a future release? Yes, for the next release, we could improve the tool so that it handles fastq and gzipped files more easily.

3) Is there a way to handle paired-end data without merging? Yes, you can input an interleaved fastq.gz file. As long as the reads have an appropriate length (i.e., between 120 and 200 bp), the accuracy of the tool will not be compromised.
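If the paired-end data are stored as two separate gzipped files, they first need to be interleaved. One possible way to do this is sketched below, using only the Python standard library. This is an illustration and not part of the tool: the function name `interleave_fastq_gz` and the file names in the usage note are hypothetical, and the sketch assumes four-line FASTQ records with R1 and R2 in the same read order.

```python
import gzip
from itertools import islice


def interleave_fastq_gz(r1_path, r2_path, out_path):
    """Write R1/R2 records alternately to a gzipped FASTQ; return the pair count.

    Hypothetical helper; assumes four-line records and matching read order.
    """
    pairs = 0
    with gzip.open(r1_path, "rt") as r1:
        with gzip.open(r2_path, "rt") as r2:
            with gzip.open(out_path, "wt") as out:
                while True:
                    rec1 = list(islice(r1, 4))
                    rec2 = list(islice(r2, 4))
                    if not rec1 and not rec2:
                        break  # both files ended together
                    if len(rec1) != 4 or len(rec2) != 4:
                        raise ValueError(
                            "R1 and R2 differ in read count or end mid-record"
                        )
                    for line in rec1 + rec2:
                        # Normalize line endings so records never run together.
                        out.write(line.rstrip("\r\n") + "\n")
                    pairs += 1
    return pairs
```

For example, `interleave_fastq_gz("sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample_interleaved.fastq.gz")` would produce a single file in which each R1 read is followed by its R2 mate.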

I will be happy to help if there are further questions.

Best,

Emiliano