mozack / abra

Assembly Based ReAligner
MIT License

Is YA tag deterministic? #38

Closed: ionox0 closed this issue 6 years ago

ionox0 commented 6 years ago

Thank you for your work developing this tool.

I'm trying to validate two pipelines that are used at our institution. I was wondering whether the YA tag should always have the same value across runs on the same input BAM, or whether the assembly process could produce different contigs between pipeline runs. Any information would be appreciated.

Thank you, Ian

mozack commented 6 years ago

We have not intentionally introduced any stochastic behavior into ABRA. Reads in high-coverage regions are pseudo-randomly downsampled; however, the same seed is always used when selecting reads, so this step should be repeatable.
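To illustrate why a fixed seed makes downsampling repeatable, here is a minimal Python sketch. It is not ABRA's code (ABRA is written in Java); the function name `downsample_reads`, the `max_reads` cap, and the seed value are all hypothetical. Each call builds a fresh generator from the same seed, so identical input always yields the identical subset.

```python
import random

def downsample_reads(reads, max_reads, seed=1):
    """Hypothetical sketch: keep at most max_reads reads, chosen pseudo-randomly.

    A fresh random.Random seeded with a fixed value is created on every call,
    so the same input list always produces the same subset.
    """
    if len(reads) <= max_reads:
        return list(reads)
    rng = random.Random(seed)
    return rng.sample(reads, max_reads)

reads = [f"read{i}" for i in range(1000)]
run1 = downsample_reads(reads, 100)
run2 = downsample_reads(reads, 100)
print(run1 == run2)  # True
```

One caveat of this design: the result is only repeatable if the reads also arrive in the same order. If upstream steps (for example, multithreaded processing) change the input order, the same seed can select a different subset.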

That being said, we have not explicitly tested this and cannot at this time guarantee that all contigs and/or realignments are deterministic.

ionox0 commented 6 years ago

Thank you for the information 👍

ionox0 commented 6 years ago

I looked into this, and there appears to be a possibility of non-deterministic results in repetitive regions near the ends of reads. However, only 50 of the 4,000 realigned reads (about 1.25%) showed this inconsistency.
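A comparison like this can be sketched as follows, assuming the YA tag values from each run have already been collected into a dict keyed by a read identifier (for example with pysam's `AlignedSegment.get_tag("YA")`). The function name `compare_ya_tags` is hypothetical, and the key should distinguish mates (e.g. read name plus a read1/read2 flag), which is an assumption about how reads are matched. The usage below uses synthetic tag values sized to mirror the 50-of-4,000 figure.

```python
def compare_ya_tags(run_a, run_b):
    """Compare YA tag values for reads present in both runs.

    run_a, run_b: dicts mapping a read identifier to its YA tag value
    (or None if the read carried no YA tag). Returns the sorted list of
    discordant read ids and the discordant fraction of shared reads.
    """
    shared = run_a.keys() & run_b.keys()
    discordant = sorted(r for r in shared if run_a[r] != run_b[r])
    fraction = len(discordant) / len(shared) if shared else 0.0
    return discordant, fraction

# Synthetic example: 4,000 shared reads, 50 of which differ between runs.
run_a = {f"read{i}": f"tagA{i}" for i in range(4000)}
run_b = dict(run_a)
for i in range(50):
    run_b[f"read{i}"] = f"tagB{i}"

discordant, fraction = compare_ya_tags(run_a, run_b)
print(len(discordant), fraction)  # 50 0.0125
```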

Two specific examples I found were a difference of 2 bp in an insertion:

[Screenshot, 2018-04-10 6:43:35 PM]

Or an insertion repositioned to the opposite side of a repetitive region:

[Screenshot, 2018-04-10 6:42:11 PM]
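An insertion inside a tandem repeat can often be placed at several positions that all produce the same haplotype, so two runs may report different coordinates for what is effectively the same event. The Python sketch below illustrates this with standard left-normalization. The helper names, reference sequence, and coordinates are hypothetical, and whether the placements in the screenshot above are truly equivalent cannot be confirmed from the image alone. A length difference, as in the first example, would not be resolved by this kind of normalization.

```python
def insert_at(reference, pos, inserted):
    """Return the haplotype produced by inserting `inserted` before 0-based index pos."""
    return reference[:pos] + inserted + reference[pos:]

def left_align_insertion(reference, pos, inserted):
    """Shift an insertion leftward as long as the haplotype is unchanged.

    If the base just left of the insertion point equals the last base of the
    inserted sequence, the insertion can move one base left (rotating the
    inserted sequence) and still produce the same haplotype.
    """
    if not inserted:
        return pos, inserted
    while pos > 0 and reference[pos - 1] == inserted[-1]:
        inserted = inserted[-1] + inserted[:-1]
        pos -= 1
    return pos, inserted

reference = "GTCACACAGT"               # hypothetical reference with a CA repeat
right = insert_at(reference, 8, "CA")  # "CA" inserted after the repeat
left = insert_at(reference, 2, "CA")   # "CA" inserted before the repeat
print(right == left)                           # True
print(left_align_insertion(reference, 8, "CA"))  # (2, 'CA')
```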

ionox0 commented 6 years ago

Sorry, this may actually be due to another issue. I'm rerunning this comparison.