wrpearson / fasta36

Git repository for FASTA36 sequence comparison software
Apache License 2.0

Effective search space #59

Open rmhubley opened 9 months ago

rmhubley commented 9 months ago

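I'm curious why the effective search space for a DNA/DNA search doesn't change when switching from plus-strand only to both strands with "Altschul/Gish" statistics (`-z 3`): the E-value of the top hit is identical in both runs, whereas with the default MLE statistics it changes. Is this expected?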

# MLE Statistics
# BOTH strands
/usr/local/fasta/fasta36 -n L2a.fa bin43.fa
Top Hit:  chrXfrag-2545 [f] **E=0.016**
# PLUS strand only
/usr/local/fasta/fasta36 -3 -n L2a.fa bin43.fa
Top Hit (same alignment): chrXfrag-2545 [f] **E=0.00085**

# Altschul/Gish
# BOTH strands
/usr/local/fasta/fasta36 -z 3 -n L2a.fa bin43.fa
Top Hit (same alignment): chrXfrag-2545 [f] **E=5.9e-124**
# PLUS strand only
/usr/local/fasta/fasta36 -z 3 -3 -n L2a.fa bin43.fa
Top Hit (same alignment): chrXfrag-2545 [f] **E=5.9e-124**

wrpearson commented 9 months ago

This is likely to be a bug. The Altschul-Gish statistics strategy is quite different from the others (which are estimated from the data), and I have not tested it extensively.

Bill Pearson

rmhubley commented 9 months ago

Thank you for the quick response. I was motivated to try this when I noticed that NCBI blastn exhibited this behavior.
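
For context, here is a minimal sketch of why one might expect the E-value to change under Altschul/Gish (Karlin-Altschul) statistics. It uses the standard relation E = K·m·n·exp(-λS) and assumes, purely for illustration, that searching both strands is modeled as doubling the database length n. Whether fasta36 or blastn actually computes the search space this way is not established in this thread. The function name and all parameter values below are hypothetical and are not taken from FASTA or from the runs above.

```python
import math


def karlin_altschul_evalue(score, query_len, db_len, lam, k):
    """Expected number of chance hits: E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * score)


# Hypothetical values, chosen only to illustrate the scaling.
lam, k = 0.192, 0.176   # assumed Karlin-Altschul parameters
query_len = 3000        # assumed query length (m)
db_len = 1_000_000      # assumed database length, plus strand only (n)
score = 500             # assumed raw alignment score (S)

e_plus = karlin_altschul_evalue(score, query_len, db_len, lam, k)
# Assumption: both strands are treated as a database twice as long.
e_both = karlin_altschul_evalue(score, query_len, 2 * db_len, lam, k)

ratio = e_both / e_plus
print(f"plus strand only: E = {e_plus:.3g}")
print(f"both strands:     E = {e_both:.3g}")
print(f"ratio:            {ratio:.1f}")
```

Under that assumption E scales linearly with the search space, so the two `-z 3` runs would be expected to differ by a factor of about two rather than report the same value.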