yatisht / usher

Ultrafast Sample Placement on Existing Trees
MIT License

Automatically disable optimization if ref contains ambiguous nucleotides #334

Closed: yceh closed this issue 1 year ago

yceh commented 1 year ago

Placement can still run, but it produces strange "mutations" such as N->A, so I am wondering whether this should be treated as an error or a warning.
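A minimal sketch of how a tool could detect this case up front, assuming the reference is available as a plain string. The helper name `ambiguous_reference_sites` is hypothetical and not part of usher. It reports 1-based positions holding any IUPAC code other than A, C, G or T, which could then trigger a warning instead of a hard error.

```python
import warnings

# IUPAC nucleotide codes that denote ambiguity (anything beyond A, C, G, T).
AMBIGUOUS = set("NRYSWKMBDHV")


def ambiguous_reference_sites(ref: str) -> list:
    """Return 1-based positions where the reference base is ambiguous.

    Hypothetical helper for illustration; not part of usher's API.
    """
    return [i + 1 for i, base in enumerate(ref.upper()) if base in AMBIGUOUS]


if __name__ == "__main__":
    ref = "ACGTNACRT"
    sites = ambiguous_reference_sites(ref)
    if sites:
        # Warning-level report rather than aborting placement.
        warnings.warn(f"reference has ambiguous bases at positions {sites}")
```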

AngieHinrichs commented 1 year ago

One way to sidestep a problem like that would be to mask/ignore those positions in all sequences, analogous to how we remove mutations at Problematic Sites from all inputs for SARS-CoV-2. That could keep it at warning level: you lose some sites, but only the ones where even the reference is not sure what the real value is.
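The masking idea can be sketched as follows. This assumes mutations are represented as `(position, ref_base, alt_base)` tuples per sample; that representation and the helper names `ambiguous_sites` and `mask_sites` are hypothetical and do not reflect usher's internal data structures. Every mutation at a site where the reference is ambiguous is dropped from every sample alike.

```python
from typing import Dict, List, Set, Tuple

# A mutation as (1-based position, reference base, alternate base).
# Assumed representation for illustration only.
Mutation = Tuple[int, str, str]

# IUPAC nucleotide codes that denote ambiguity.
AMBIGUOUS = set("NRYSWKMBDHV")


def ambiguous_sites(ref: str) -> Set[int]:
    """Collect 1-based positions where the reference base is ambiguous."""
    return {i + 1 for i, b in enumerate(ref.upper()) if b in AMBIGUOUS}


def mask_sites(
    samples: Dict[str, List[Mutation]], masked: Set[int]
) -> Dict[str, List[Mutation]]:
    """Drop every mutation at a masked position, in all samples alike."""
    return {
        name: [m for m in muts if m[0] not in masked]
        for name, muts in samples.items()
    }


if __name__ == "__main__":
    ref = "ACGTNACGT"
    samples = {"s1": [(5, "N", "A"), (2, "C", "T")], "s2": [(5, "N", "G")]}
    print(mask_sites(samples, ambiguous_sites(ref)))
```

Because the same mask is applied to every input, no spurious N->A change survives to be placed or optimized, at the cost of losing those sites entirely.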

yatisht commented 1 year ago

I agree; it might be better to disable optimization only at specific sites rather than entirely.

yceh commented 1 year ago

Maybe this would be better done with matUtils mask? I am afraid that automatically masking a position in usher or matOptimize may confuse users.