maickrau / GraphAligner

MIT License
GraphAligner is not deterministic - Is this fixable #31

Open subwaystation opened 3 years ago

subwaystation commented 3 years ago

Hi @maickrau it seems that GraphAligner is non-deterministic. Small example:

GraphAligner -g cerevisiae.pan.fa.pggb-W-s50000-p90-n5-a0-K16-k8.seqwish-w30000-j5000-e5000-I0.7.smooth.consensus@10\:y.gfa -f cerevisiae.pan.fa -a cerevisiae.pan.fa.pggb-W-s50000-p90-n5-a0-K16-k8.seqwish-w30000-j5000-e5000-I0.7.smooth.consensus@10\:y.gfa_RUN1.gaf -t 16 -x vg
GraphAligner -g cerevisiae.pan.fa.pggb-W-s50000-p90-n5-a0-K16-k8.seqwish-w30000-j5000-e5000-I0.7.smooth.consensus@10\:y.gfa -f cerevisiae.pan.fa -a cerevisiae.pan.fa.pggb-W-s50000-p90-n5-a0-K16-k8.seqwish-w30000-j5000-e5000-I0.7.smooth.consensus@10\:y.gfa_RUN2.gaf -t 16 -x vg

cat cerevisiae.pan.fa.pggb-W-s50000-p90-n5-a0-K16-k8.seqwish-w30000-j5000-e5000-I0.7.smooth.consensus@10\:y.gfa_RUN1.gaf | wc -l
516
cat cerevisiae.pan.fa.pggb-W-s50000-p90-n5-a0-K16-k8.seqwish-w30000-j5000-e5000-I0.7.smooth.consensus@10\:y.gfa_RUN2.gaf | wc -l
518
sha256sum cerevisiae.pan.fa.pggb-W-s50000-p90-n5-a0-K16-k8.seqwish-w30000-j5000-e5000-I0.7.smooth.consensus@10\:y.gfa_RUN1.gaf
3dcf42f9cd399b0370e7e2dff366f63d7b13ced0d1c2cb7fa68d6725d28158aa  cerevisiae.pan.fa.pggb-W-s50000-p90-n5-a0-K16-k8.seqwish-w30000-j5000-e5000-I0.7.smooth.consensus@10:y.gfa_RUN1.gaf
sha256sum cerevisiae.pan.fa.pggb-W-s50000-p90-n5-a0-K16-k8.seqwish-w30000-j5000-e5000-I0.7.smooth.consensus@10\:y.gfa_RUN2.gaf
9e7d1d144a197c65355c126f590aca76c78c9dde782cfcf1591307e18a6721ab  cerevisiae.pan.fa.pggb-W-s50000-p90-n5-a0-K16-k8.seqwish-w30000-j5000-e5000-I0.7.smooth.consensus@10:y.gfa_RUN2.gaf

Link to the data for 10 days: http://fex.belwue.de/fop/vK5Z75vh/issue_GraphAligner_not_deterministic.zip

Is this on purpose? Or is there a way to prevent that? A deterministic algorithm would help to better compare results across runs, experiments, etc.

Thanks! Best, Simon
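The line counts and checksums above show that the two files differ, but not which records differ. A minimal sketch of an order-insensitive comparison follows, assuming each alignment record occupies one line of the GAF file. The function names `load_records` and `compare_gaf` are hypothetical helpers and not part of GraphAligner.

```python
from collections import Counter


def load_records(path):
    """Return a multiset (Counter) of the non-empty lines in a GAF file."""
    with open(path, encoding="utf-8") as handle:
        return Counter(line.rstrip("\n") for line in handle if line.strip())


def compare_gaf(path_a, path_b):
    """Return (only_in_a, only_in_b) as Counters, ignoring line order.

    Counter subtraction keeps only positive counts, so each result holds
    the records that appear more often in one file than in the other.
    """
    records_a = load_records(path_a)
    records_b = load_records(path_b)
    return records_a - records_b, records_b - records_a
```

Calling `compare_gaf` on the RUN1 and RUN2 outputs and printing the two Counters would list the records unique to each run. If both Counters are empty, the runs differ only in line order.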

AndreaGuarracino commented 3 years ago

[image]

jmonlong commented 3 years ago

I'd be interested to know about this too. Do you know if that happens with -t 1? I know that multi-threading can sometimes add nondeterminism. Not a solution, but that might help @maickrau understand the issue.

subwaystation commented 3 years ago

Great idea @jmonlong! Running with -t 1, GraphAligner behaves deterministically so far. It seems that the multi-threading is the bad guy here @maickrau.
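One common way for multi-threading to introduce run-to-run differences is that results get written in the order threads happen to finish. The sketch below illustrates that general pattern only; it is not GraphAligner's code, and `align` is a hypothetical stand-in for per-read work. Output order alone also cannot explain the differing line counts (516 vs 518) reported above, so more than ordering is likely involved here.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def align(read_id):
    """Hypothetical stand-in for aligning one read."""
    return f"{read_id}\taligned"


def run_in_completion_order(read_ids, threads):
    """Collect results as threads finish; the order may vary between runs."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(align, read_id) for read_id in read_ids]
        return [future.result() for future in as_completed(futures)]


def run_in_input_order(read_ids, threads):
    """Collect results in input order; Executor.map preserves that order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(align, read_ids))
```

With this pattern, sorting the output (or collecting it in input order) is enough to make runs comparable, provided each per-read result is itself deterministic.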

subwaystation commented 3 years ago

Updated link to data: http://fex.belwue.de/fop/njCHy3QX/issue_GraphAligner_not_deterministic.tar.gz