shubhamchandak94 / HARC

Fastq compression

HARC or Spring #11

Status: Closed (sagnikbanerjee15 closed this issue 4 years ago)

sagnikbanerjee15 commented 4 years ago

I have fasta files (NOT fastq) to compress. Between HARC and Spring, which compressor should I use? Is it at all possible to compress fasta files with these?

Thanks.

shubhamchandak94 commented 4 years ago

Hi,

Spring and HARC both expect fastq files right now. Also, they are meant as read compressors, whereas fasta files contain genomes in a lot of cases. For compressing genomes, there are other, better compressors available; e.g., see [1].

Are you working with short reads represented as fasta files? In that case, one solution is to add fake quality values of the appropriate lengths and use Spring. Let me know if this is the situation, and I can also add a flag to Spring to switch between input file formats.

Between HARC and Spring, please always use Spring, since it is better maintained and supports a superset of the features supported by HARC.

Regards,
Shubham

[1] https://www.biorxiv.org/content/10.1101/642553v2.full.pdf
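
For reference, the "fake quality values" workaround can be sketched roughly as follows. This is only an illustration, not code from Spring or HARC: the function names `read_fasta` and `fasta_to_fastq` are hypothetical, and the placeholder quality character `I` (Phred 40 in the common Phred+33 encoding) is an arbitrary assumption; any valid quality character repeated to the read length would do.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text stream.

    Sequences that are wrapped over several lines are joined into one string.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fasta_to_fastq(fasta, fastq, fake_qual="I"):
    """Write each FASTA record as a 4-line FASTQ record with placeholder qualities.

    `fake_qual` is an assumed placeholder character; the quality string is
    padded to exactly the length of the read. Returns the number of records.
    """
    count = 0
    for header, seq in read_fasta(fasta):
        fastq.write(f"@{header}\n{seq}\n+\n{fake_qual * len(seq)}\n")
        count += 1
    return count


if __name__ == "__main__":
    # Tiny in-memory demo; real use would pass open file handles instead.
    demo_in = io.StringIO(">read1\nACGT\n>read2\nGGCA\n")
    demo_out = io.StringIO()
    fasta_to_fastq(demo_in, demo_out)
    print(demo_out.getvalue(), end="")
```

The resulting fastq file can then be given to Spring as usual. Since the qualities are placeholders, they carry no information and should not be interpreted after decompression.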

sagnikbanerjee15 commented 4 years ago

Hi,

The fasta files are from short reads. I have added quality scores and it works fine. Actually, I have an alignment file in the bam format. I do not need all the information that is present in the bam file. So I extracted whatever information I need and converted it into a fastq file and compressed it.

Thanks.

shubhamchandak94 commented 4 years ago

Hi Sagnik,

Please see the latest commit on Spring, which adds a --fasta-input flag that can be specified when the input is fasta instead of fastq. The --no-qualities option need not be specified; it is automatically inferred from the fasta input option.

Regards,
Shubham

sagnikbanerjee15 commented 4 years ago

Thanks a lot, man!