morispi / CONSENT

Scalable long read self-correction and assembly polishing with multiple sequence alignment
https://doi.org/10.1038/s41598-020-80757-5
GNU Affero General Public License v3.0

paf file-size estimate #8

Open MichelMoser opened 5 years ago

MichelMoser commented 5 years ago

Hi,

I am excited to test CONSENT on a nanopore dataset with about 60x coverage of a 600 Mb genome. It's about 2.8 million reads (41 Gb total length). Unfortunately, the all-vs-all alignment output grows very quickly, and I had to terminate the run after the PAF file reached 2.1 terabytes. Is there an estimate of how much temporary storage is needed for a dataset like this?

In your bioRxiv preprint, you ran CONSENT on 30x human data. What was the file size of the all-vs-all alignments there?

Cheers, Michel

morispi commented 5 years ago

Hi,

I'm afraid I can't precisely answer the question about the estimated size of the PAF file for your dataset.

The 30x human dataset was composed only of reads from chr1. The reads file was therefore 7 GB, and it resulted in a 171 GB PAF file.

Cheers, Pierre
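One rough way to turn that single data point into a ballpark figure is to scale it. The sketch below is only a back-of-envelope calculation under assumptions that are not stated by CONSENT or in this thread: mean read length and repeat content are similar in both datasets, so the read count scales with total bases, and the number of overlaps per read (hence PAF lines per read) grows linearly with coverage. It also treats the 41 Gb of bases as roughly 41 GB of reads file, which only holds for FASTA-like input. The function name `estimate_paf_gb` is hypothetical.

```python
def estimate_paf_gb(ref_paf_gb, ref_reads_gb, ref_cov, new_reads_gb, new_cov):
    """Scale a known PAF size to a new dataset (rough, assumption-laden estimate).

    Assumptions (not from CONSENT): similar mean read length and repeat content,
    so read count scales with reads-file size, and overlaps per read scale
    linearly with coverage.
    """
    read_count_ratio = new_reads_gb / ref_reads_gb  # more reads -> more PAF lines
    coverage_ratio = new_cov / ref_cov  # deeper coverage -> more overlaps per read
    return ref_paf_gb * read_count_ratio * coverage_ratio


# Reference point from this thread: 7 GB of chr1 reads at 30x -> 171 GB PAF.
# Treating 41 Gb of bases as ~41 GB of reads file is an assumption (FASTA-like input).
estimate = estimate_paf_gb(171, 7, 30, 41, 60)
print(f"~{estimate:.0f} GB")
```

This lands at roughly 2 TB, but the run described above had already passed 2.1 TB when it was stopped, so this kind of linear scaling likely underestimates the real size (for example, if the genome is more repetitive than chr1) and should not be used as a storage budget.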

harish0201 commented 4 years ago

This might be a bit of a late suggestion, but @MichelMoser @morispi, an easy way out might be to gzip the PAF file and then decompress and stream it for the next step?
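A minimal sketch of that idea in Python, assuming the downstream step can consume PAF records from a stream. Nothing in this thread confirms that CONSENT itself reads gzipped PAF, so this only shows the compress-then-stream pattern; `compress_paf` and `iter_paf` are hypothetical helper names. Compressing after the fact still needs the full uncompressed file on disk once, so piping the aligner's output straight into gzip is what actually avoids the peak storage cost.

```python
import gzip
import shutil


def compress_paf(src_path, dst_path):
    """Gzip an existing PAF file in chunks, without loading it into memory."""
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def iter_paf(path):
    """Stream PAF records one line at a time from a plain or gzipped file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 12:
                continue  # skip malformed or truncated lines (PAF has 12 mandatory columns)
            yield {
                "query": fields[0],
                "query_len": int(fields[1]),
                "target": fields[5],
                "target_len": int(fields[6]),
                "matches": int(fields[9]),  # number of residue matches
                "block_len": int(fields[10]),  # alignment block length
            }
```

For example, `sum(1 for _ in iter_paf("overlaps.paf.gz"))` counts overlaps while holding only one line in memory at a time.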