wrpearson / fasta36

Git repository for FASTA36 sequence comparison software
Apache License 2.0

ssearch36 multithreaded output not consistent #36

Closed kenietz closed 3 years ago

kenietz commented 3 years ago

Hi,

I would like to report that when I use more than 2 threads and the option '-m B' or '-m BB' with SSEARCH36 (36.3.8), I get a different number of hits every time. For example, sometimes I get 524 hits, sometimes 523.

I am using the following command line: 'ssearch36 -T 8 -s BL62 -E 10 -m B -O OUTPUT_FILE IN.fa LIBRARY.fa'

After some testing I think the problem comes from the '-m' option.

Is it possible to avoid this behavior? Or has it perhaps already been resolved in newer versions?

Best regards Dimitar

kenietz commented 3 years ago

Hi again,

I think I figured it out. It was my library.fa that was producing the strange variation in the output. After clustering the library and then running the search against the clustered library, the output is consistent.

Best regards Dimitar

wrpearson commented 3 years ago

This is expected behavior from the FASTA programs, including SSEARCH. Basically, most of the statistical estimation options sample a subset (60,000) of all the similarity scores calculated during the search, and use that sample to estimate the parameters used to calculate the statistical significance and bit score. This has nothing to do with the number of threads or output format.

This can be confusing when the results vary slightly from one run to the next, but it is the expected behavior and reinforces the fact that the statistical estimates are just that, estimates, and may produce slightly different values.

Bill Pearson

(In reply to kenietz's report of Sep 4, 2021, 11:07 PM.)

kenietz commented 3 years ago

Thank you for your reply.

Yes, I kind of figured out that it might be a sampling effect as well. After clustering, my LIBRARY.fa was below 60k sequences, and the search result was stable between runs.
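
For readers who want to see the sampling effect in isolation, here is a toy Python sketch. It is not the FASTA code and does not use its estimator. The Gumbel-distributed scores, the method-of-moments fit, the random subset drawn with a per-run seed, and all function names are assumptions made for illustration only. The only numbers taken from the thread are the 60,000-score sample and the '-E 10' cutoff.

```python
import math
import random

SAMPLE_SIZE = 60_000              # subset size quoted in the thread
EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def simulate_scores(n, seed, mu=30.0, beta=6.0):
    """Draw n toy 'similarity scores' from a Gumbel distribution (assumed)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n):
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        scores.append(mu - beta * math.log(-math.log(u)))
    return scores


def estimate_gumbel(scores, run_seed, sample_size=SAMPLE_SIZE):
    """Fit (mu, beta) by method of moments on at most sample_size scores."""
    if len(scores) <= sample_size:
        sample = scores  # whole library is used, so nothing varies per run
    else:
        # Hypothetical stand-in for "which scores end up in the sample".
        sample = random.Random(run_seed).sample(scores, sample_size)
    mean = math.fsum(sample) / len(sample)
    var = math.fsum((s - mean) ** 2 for s in sample) / len(sample)
    beta = math.sqrt(6.0 * var) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta


def count_hits(scores, mu, beta, e_cutoff=10.0):
    """Count scores whose toy E-value, n * P(S >= s), is within the cutoff."""
    n = len(scores)
    hits = 0
    for s in scores:
        p = -math.expm1(-math.exp(-(s - mu) / beta))  # Gumbel upper tail
        if n * p <= e_cutoff:
            hits += 1
    return hits


if __name__ == "__main__":
    for library_size in (200_000, 50_000):
        scores = simulate_scores(library_size, seed=1)
        for run_seed in range(3):
            mu, beta = estimate_gumbel(scores, run_seed)
            hits = count_hits(scores, mu, beta)
            print(library_size, run_seed, round(mu, 4), round(beta, 4), hits)
```

In this sketch, a library larger than the sample size gives slightly different fitted parameters on each run, so a score sitting right at the E-value cutoff can fall on either side of it. A library at or below the sample size is used in full, so every run yields the same estimates and the same hit count, which matches the stable results reported after clustering.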
