uio-cels / NucDiff

In-depth characterization and annotation of differences between two sets of DNA sequences
Mozilla Public License 2.0

uncovered ref regions #5

Closed jaworskicoline closed 7 years ago

jaworskicoline commented 7 years ago

Hello, the summary stats file contains these two lines: "Uncovered ref regions num" and "Uncovered ref regions len". I was unable to find their exact definitions in the manuscript or in the GitHub definition list. Would you mind explaining how they are calculated? For example, how is a region defined (how many bases in a row)? Is the uncovered ref regions length the sum of the lengths of all uncovered regions? Thank you so much! Coline

kseniakh commented 7 years ago

Hello,

The definitions are given on the wiki manual page: https://github.com/uio-cels/NucDiff/wiki/stat.out

Uncovered ref regions num - the number of reference regions that were not covered by any query sequence (or mapped block)

Uncovered ref regions len - the number of bases inside these regions
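To make the two definitions concrete, here is a minimal Python sketch of how such numbers could be derived for a single reference sequence. It is not NucDiff's actual code: the function name `uncovered_regions`, the 1-based inclusive coordinates, and the example intervals are all assumptions made for illustration. The idea is simply that any maximal run of reference bases outside every covered interval counts as one region, with no minimum length.

```python
def uncovered_regions(ref_len, covered):
    """Return maximal uncovered (start, end) intervals for one reference sequence.

    Assumptions (illustrative, not taken from NucDiff):
    - coordinates are 1-based and inclusive;
    - `covered` lists every interval that counts as covered.
    """
    gaps = []
    next_uncovered = 1  # first position not yet known to be covered
    for start, end in sorted(covered):
        if start > next_uncovered:
            # bases between the previous covered stretch and this one
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= ref_len:
        # trailing uncovered bases at the end of the sequence
        gaps.append((next_uncovered, ref_len))
    return gaps


# Hypothetical 100 bp reference with three covered intervals (two overlap).
gaps = uncovered_regions(100, [(11, 40), (30, 60), (81, 90)])
num = len(gaps)                               # analogous to "Uncovered ref regions num"
total_len = sum(e - s + 1 for s, e in gaps)   # analogous to "Uncovered ref regions len"
print(gaps, num, total_len)
```

In this toy case the length is the sum of the lengths of all uncovered regions, which matches "the number of bases inside these regions".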

jaworskicoline commented 7 years ago

Thank you! But did you use a working definition for an uncovered region, e.g. a minimum length for it to be considered something other than a deletion in the template assembly?

Or did you just identify the names of sequences in the reference genome that were not called in the alignment?

Thank you Coline

kseniakh commented 7 years ago

Uncovered regions consist of bases that do not appear in the NUCmer output file and are not part of any detected difference. The exact coordinates of uncovered regions are given in ref_additional.gff as "uncovered_region" entries.
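For anyone who wants to cross-check the two summary numbers against ref_additional.gff, a rough Python sketch follows. It assumes the file is standard tab-separated GFF with nine columns and 1-based inclusive start/end, and it assumes the string "uncovered_region" shows up either in the type column or in the attributes column; the exact layout NucDiff writes is not stated in this thread, so check it against your own file. The function name `summarize_uncovered` and the sample records are made up.

```python
import io


def summarize_uncovered(gff_lines):
    """Count "uncovered_region" entries and sum their lengths.

    Assumption: 9-column tab-separated GFF, start/end in columns 4 and 5,
    and the label appears in column 3 (type) or column 9 (attributes).
    """
    num = 0
    total = 0
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip headers, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a feature line
        if "uncovered_region" not in cols[2] and "uncovered_region" not in cols[8]:
            continue
        start, end = int(cols[3]), int(cols[4])
        num += 1
        total += end - start + 1  # GFF intervals are inclusive
    return num, total


# Made-up records for illustration only; not real NucDiff output.
sample = (
    "##gff-version 3\n"
    "ref1\texample\tregion\t1\t10\t.\t.\t.\tName=uncovered_region\n"
    "ref1\texample\tregion\t50\t60\t.\t.\t.\tName=deletion\n"
    "ref2\texample\tregion\t101\t150\t.\t.\t.\tName=uncovered_region\n"
)
num_regions, total_bases = summarize_uncovered(io.StringIO(sample))
print(num_regions, total_bases)
```

On a real run you would pass an open file handle, e.g. `with open("ref_additional.gff") as fh: summarize_uncovered(fh)`, using whatever path your output prefix produced.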

jaworskicoline commented 7 years ago

OK, thank you very much for the clarification. Coline