shahab-sarmashghi / RESPECT

Estimating repeat spectra and genome length from low-coverage genome skims

Paired-end, single-end #6

Closed by mylena-s 3 years ago

mylena-s commented 3 years ago

Hi! I have a question regarding the input files. I have paired-end reads in three files for a single sample: PE1.fq, PE2.fq and SE.fq (unpaired). I tried running RESPECT specifying only the directory that contains all the files, and each file was then used to make an independent estimate. Should I concatenate all the files, or interleave the paired files?

Thanks in advance

shahab-sarmashghi commented 3 years ago

Hi, ideally I would recommend first merging the read pairs (e.g. with BBMerge) and then concatenating the result with the single-end reads. However, you can also simply concatenate all the files and see what the results look like; I do not expect that leaving overlapping reads unmerged has a large impact. Lastly, you do not need high coverage with RESPECT. In fact, high coverage slows down the computation, and we have optimized the RESPECT algorithm for low coverage. In our benchmarking, 1X to 4X coverage of the genome should be enough.
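A minimal Python sketch of the concatenation and downsampling steps described above, in case it helps. It is not part of RESPECT: the function names are hypothetical, and it assumes uncompressed FASTQ files with exactly four lines per record. Picking a subsampling fraction also needs a rough prior guess of the genome length, which is an assumption you supply yourself.

```python
import random


def concatenate_fastq(inputs, output):
    """Write the given FASTQ files, in order, into one output file."""
    with open(output, "w") as out:
        for path in inputs:
            with open(path) as src:
                for line in src:
                    # Guard against a last line that lacks a trailing newline.
                    out.write(line if line.endswith("\n") else line + "\n")


def count_bases(path):
    """Sum the sequence lengths (2nd line of every 4-line record)."""
    total = 0
    with open(path) as src:
        for index, line in enumerate(src):
            if index % 4 == 1:
                total += len(line.strip())
    return total


def target_fraction(total_bases, approx_genome_length, target_coverage):
    """Fraction of reads to keep to reach roughly the target coverage.

    coverage = total_bases / genome_length, so the fraction is
    target_coverage * genome_length / total_bases, capped at 1.0.
    approx_genome_length is a user-supplied rough guess.
    """
    return min(1.0, target_coverage * approx_genome_length / total_bases)


def subsample_fastq(source, output, fraction, seed=0):
    """Keep each 4-line record with probability `fraction`; return count kept."""
    rng = random.Random(seed)
    kept = 0
    with open(source) as src, open(output, "w") as out:
        while True:
            record = [src.readline() for _ in range(4)]
            if not record[0]:  # end of file
                break
            if rng.random() < fraction:
                out.writelines(record)
                kept += 1
    return kept
```

For example, `concatenate_fastq(["merged.fq", "SE.fq"], "sample.fq")` would combine a merged-pairs file (`merged.fq` is a placeholder name for whatever your merging tool writes) with the unpaired reads, and `subsample_fastq("sample.fq", "sample_low.fq", fraction)` would then thin the result toward the 1X to 4X range mentioned above.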