phyloacc / PhyloAcc

PhyloAcc: software to detect changes in the conservation of a genomic region
GNU General Public License v3.0

Encoding missing data #20

Closed: darencard closed this issue 4 years ago

darencard commented 4 years ago

I'm not quite to the stage of running PhyloAcc yet, but in looking at the documentation, I see no mention of missing data. For example, at a given locus, it is possible that one of the species does not have sequence information due to missing alignments, assembly artifacts, etc. Is there a way to encode this so a user could still run PhyloAcc on a concatenated alignment? Or is it necessary to perform separate PhyloAcc runs in instances where one or more species do not have orthologous sequence data? Any guidance is greatly appreciated!

All the best, Daren Card

xyz111131 commented 4 years ago

Hi Daren,

PhyloAcc can run with missing data. It treats any character other than acgtrykmsw in the input alignment file as missing data and assumes that a missing position can be any base (i.e. a, c, g, or t). If a species has a long stretch of missing sequence, this will increase the probability of acceleration inferred for that species.
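Since long missing stretches can inflate the acceleration probability, it may help to check how much of each species' sequence would count as missing before running. The following is only a rough sketch, not part of PhyloAcc: it assumes the alignment is in FASTA format and that matching is case-insensitive, and the helper names (parse_fasta, missing_fraction, flag_species) and the 0.5 threshold are hypothetical choices for illustration.

```python
# Characters described in this thread as real data; anything else counts as missing.
# Assumption: matching is case-insensitive (the thread only lists lowercase letters).
KNOWN = set("acgtrykmsw")


def parse_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dict."""
    chunks = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {n: "".join(parts) for n, parts in chunks.items()}


def missing_fraction(seq):
    """Fraction of positions that are not in KNOWN (an empty sequence counts as all missing)."""
    if not seq:
        return 1.0
    missing = sum(1 for ch in seq.lower() if ch not in KNOWN)
    return missing / len(seq)


def flag_species(seqs, threshold=0.5):
    """Return sorted names of species whose missing fraction exceeds the (hypothetical) threshold."""
    return sorted(n for n, s in seqs.items() if missing_fraction(s) > threshold)


if __name__ == "__main__":
    # Toy alignment: 'mouse' is padded with N and '-' where no sequence is available.
    example = ">human\nACGTACGT\n>mouse\nACNN--NN\n>chicken\nacgtryk-\n"
    seqs = parse_fasta(example)
    for name, seq in seqs.items():
        print(name, missing_fraction(seq))
    print("flagged:", flag_species(seqs))
```

In the toy alignment, mouse is padded with N and - at positions without sequence, which is one way a species lacking data at a locus could be kept in a concatenated alignment under the rule described above. Species that get flagged are the ones whose acceleration results may deserve extra caution.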

Best, zhirui

darencard commented 4 years ago

Thanks Zhirui, that's very helpful!