xflouris / libpll

Phylogenetic Likelihood Library
GNU Affero General Public License v3.0
26 stars 6 forks source link

Implement fast unweighted parsimony #117

Closed xflouris closed 7 years ago

xflouris commented 7 years ago

Compute unweighted parsimony score using SSE/AVX instruction (instruction-level parallelism across sites).

  1. Out of n sites, Identify (a) invariant sites (sites made up of a single character and thus contribute 0 to score) and (b) sites that are made up of more than one character but no two characters appear at least twice in the site (in which case the score is the number characters appearing only once, for all kind of topologies).

  2. Replace sites, such that the ones identified in (1) appear at the end of the matrix, and keep an array of original indices (in case it is needed to go back to the original alignment after parsimony computations are finished). This is necessary such that sites that will be computed in parallel are contiguous in memory.

  3. Compute parsimony score for all sites except the ones identified in (1), and add to it the scores of 1b.

xflouris commented 7 years ago

Updated description, started work on this.

stamatak commented 7 years ago

Hi Tomas,

Will this be mostly a re-implementation of my parsimony kernel in LLPLL or will you deploy a different strategy?

Alexis

On 07.12.2016 12:14, Tomas Flouri wrote:

Compute unweighted parsimony score using SSE/AVX instruction (instruction-level parallelism across sites).

1.

Out of n sites, Identify (a) non-informative sites (sites made up of
a single character and thus contribute 0 to score) and (b) sites
that are made up of only two characters and one of them appears only
once in the site (in which case the score is 1 for all kind of
topologies).

2.

Replace sites, such that the ones identified in (1) appear at the
end of the matrix, and keep an array of original indices (in case it
is needed to go back to the original alignment after parsimony
computations are finished). This is necessary such that sites that
will be computed in parallel are contiguous in memory.

3.

Compute parsimony score for all sites except the ones identified in
(1), and add to it the amount of 1b.

Started work on this.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/xflouris/libpll/issues/117, or mute the thread https://github.com/notifications/unsubscribe-auth/AA1w-nv7CAx_bN2s-KI7y-rmHi-m4OCXks5rFpUSgaJpZM4LGeS7.

-- Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University of Arizona at Tucson

www.exelixis-lab.org

xflouris commented 7 years ago

Hi Alexi, the principle will be the same, i.e. decompose the sequences into |states| bit-vectors, and then use the same bit operations to compute the score. The structures will be slightly different though.

stamatak commented 7 years ago

okay :-) don't forget the builtin popcount ;-)

alexis

On 08.12.2016 21:01, Tomas Flouri wrote:

Hi Alexi, the principle will be the same, i.e. decompose the sequences into |states| bit-vectors, and then use the same bit operations to compute the score. The structures will be slightly different though.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/xflouris/libpll/issues/117#issuecomment-265839951, or mute the thread https://github.com/notifications/unsubscribe-auth/AA1w-nUC8-2Dx9FLtgpoSSmdRU4uvoIEks5rGGIxgaJpZM4LGeS7.

-- Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University of Arizona at Tucson

www.exelixis-lab.org

xflouris commented 7 years ago

finished