shahab-sarmashghi / RESPECT

Estimating repeat spectra and genome length from low-coverage genome skims

Different coverage input data get different genome size #17

Closed: szf1993325 closed this issue 1 year ago.

szf1993325 commented 1 year ago

I have data from one individual of a species with 2n=30 and a genome size of nearly 3 Gb, as reported in a published paper. The genome_length was 6257984238 when my input data had a coverage of 6.03, but 3744743210 when the input coverage was 0.41. All input data were merged with bbmerge.sh. Which estimate is more reliable? And if I get a genome_length of about 6 Gb, could that mean this individual is polyploid?

This is my output:

| sample | input_type | sequence_type | coverage | genome_length | uniqueness_ratio | HCRM | sequencing_error_rate | average_read_length |
|---|---|---|---|---|---|---|---|---|
| re422 | sequence | genome-skim | 6.03 | 6257984238 | 0.17 | 35.74 | 0.0031 | 227.7566 |
| re422 | sequence | genome-skim | 0.41 | 3744743210 | 0.4 | 42.37 | 0.0085 | 227.7597 |

shahab-sarmashghi commented 1 year ago

I would say the lower-coverage result is probably more accurate. The best approach is to downsample your 6X sample to the 0.5-4X range, as explained here.
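One way to follow this advice is to keep each read with probability target/current coverage. The sketch below is a minimal illustration, assuming plain 4-line FASTQ records (no wrapped sequence lines) and that coverage scales roughly linearly with the number of reads kept. The function names are hypothetical and not part of RESPECT; existing tools such as `seqtk sample` do the same job.

```python
import random
from typing import Iterable, Iterator


def keep_fraction(current_cov: float, target_cov: float) -> float:
    """Fraction of reads to keep so coverage drops from current_cov to about target_cov."""
    if not 0 < target_cov <= current_cov:
        raise ValueError("target coverage must be positive and no larger than current coverage")
    return target_cov / current_cov


def subsample_fastq(lines: Iterable[str], fraction: float, seed: int = 42) -> Iterator[str]:
    """Yield each 4-line FASTQ record independently with probability `fraction`.

    Assumes unwrapped 4-line records; a trailing incomplete record is dropped.
    """
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    record = []
    for line in lines:
        record.append(line)
        if len(record) == 4:  # one full record: header, sequence, '+', qualities
            if rng.random() < fraction:
                yield from record
            record = []
```

For example, `keep_fraction(6.03, 2.0)` is about 0.33, so roughly a third of the merged reads would be kept (2X is just one arbitrary choice inside the suggested range). The result of `subsample_fastq` can be written with `open("out.fastq", "w").writelines(...)` and passed to RESPECT again. Because records are kept independently, the realized coverage will be close to, but not exactly, the target.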