Counting sequences in many fasta or fastq files

sr320 / course-fish546-2018


Counting sequences in many fasta or fastq files #35

Closed calderatta closed 5 years ago

calderatta commented 5 years ago

I'm trying to find read counts for each of the fastq files that I have. Right now, I'm using grep on the @ character but I'm not sure how to do this for multiple files.

sr320 commented 5 years ago

Here is some useful code: https://github.com/RobertsLab/code/blob/master/fasta.md

I think this code works for fastq files: awk '{s++}END{print s/4}' file.fastq via https://www.biostars.org/p/139006/

I normally just use fastQC output to know number of reads.

For multiple files, you could use a for loop, something like:

%%bash
# Print the read count (line count / 4) for each fastq file
for f in /where/your/files/r/*fq
do
    awk '{s++}END{print s/4}' "$f"
done
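The loop above prints only the counts, so it can be hard to tell which number belongs to which file. A minimal sketch that prints the filename next to each count follows. The function name count_fastq_reads is hypothetical, and it assumes each record is exactly four lines (no wrapped sequence lines) and that the files are uncompressed.

```shell
# Hypothetical helper: print "<file><TAB><reads>" for each FASTQ file given.
# Assumes four lines per record and uncompressed input.
count_fastq_reads() {
    for f in "$@"; do
        # NR in the END block is the total number of lines read
        n=$(awk 'END { print NR / 4 }' "$f")
        printf '%s\t%s\n' "$f" "$n"
    done
}
```

It could be called as count_fastq_reads /where/your/files/r/*fq. A non-integer count suggests a truncated file or a file that does not follow the four-line layout.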

kubu4 commented 5 years ago

grep can process a list of files.

Here's an example for fastq files:

grep -c '@' *.fq

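This will search every file ending with .fq for lines containing the @ symbol and print each filename with its number of matching lines. Note that @ is also a valid quality character, so quality lines can match too and the count may be higher than the actual number of reads.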
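A sketch of a stricter count that only looks at header lines, meaning the first line of each four-line record when it starts with @, so that @ characters inside quality strings are ignored. The function name count_fastq_headers is hypothetical, and the four-lines-per-record layout is an assumption.

```shell
# Hypothetical helper: print "<file>:<count>" like grep -c does for multiple files,
# but count only lines 1, 5, 9, ... that begin with '@'.
# Assumes four lines per record and uncompressed input.
count_fastq_headers() {
    for f in "$@"; do
        # n + 0 forces a numeric 0 when no header lines are found
        n=$(awk 'NR % 4 == 1 && /^@/ { n++ } END { print n + 0 }' "$f")
        printf '%s:%s\n' "$f" "$n"
    done
}
```

It could be called as count_fastq_headers *.fq. If this count differs from the line count divided by four, the file probably is not in plain four-line format.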

kubu4 commented 5 years ago

"Pro" tip -

Most FastQ files come compressed as gzip files.

Use zgrep -c '@' *.fq.gz to search through compressed FastQ files.
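The line-count approach from earlier in the thread can also be applied to gzipped files by decompressing to a pipe. A rough sketch, assuming gzip is installed and each record is exactly four lines. The function name count_fastq_gz_reads is hypothetical.

```shell
# Hypothetical helper: print "<file><TAB><reads>" for each gzipped FASTQ file given.
# Assumes gzip is available and four lines per record.
count_fastq_gz_reads() {
    for f in "$@"; do
        # gzip -dc writes the decompressed data to stdout without touching the file
        n=$(gzip -dc "$f" | awk 'END { print NR / 4 }')
        printf '%s\t%s\n' "$f" "$n"
    done
}
```

It could be called as count_fastq_gz_reads *.fq.gz.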