ucscGenomeBrowser / kent

UCSC Genome Browser source tree. Stable branch: "beta".
http://genome.ucsc.edu/
Other
217 stars 84 forks source link

Floating point error resulting in incorrect liftover results #50

Closed simonbrent closed 3 years ago

simonbrent commented 3 years ago

Hi, I have noticed the following bug when running the liftover executable (and via https://genome.ucsc.edu/cgi-bin/hgLiftOver):

Using the chain file from ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/liftOver/hg19ToHg38.over.chain.gz, a bed file whose contents is a single line reading chr10 18122748 18122818, and -minMatch=0.4, I expect a successful liftover.

Instead, the result is an error with the message "Partially deleted in new".

This should be a success because:

The failure occurs because minMatchSize is incorrectly calculated as 28.000000417232513, so intersectSize >= minMatchSize is false. minMatchSize is incorrect because the input minMatch is actually 0.40000000596046448 instead of 0.4.

This can be fixed with by changing minMatch from a float to a double in src/hg/liftOver/liftOver.c. It also looks like minBlocks should have the same change applied. See below for a diff containing the fix:

diff --git a/src/hg/liftOver/liftOver.c b/src/hg/liftOver/liftOver.c
index 6f2a3a5330..2539fb9992 100644
--- a/src/hg/liftOver/liftOver.c
+++ b/src/hg/liftOver/liftOver.c
@@ -32,10 +32,10 @@ static struct optionSpec optionSpecs[] = {
     {"genePred", OPTION_BOOLEAN},
     {"gff", OPTION_BOOLEAN},
     {"hasBin", OPTION_BOOLEAN},
-    {"minBlocks", OPTION_FLOAT},
+    {"minBlocks", OPTION_DOUBLE},
     {"minChainQ", OPTION_INT},
     {"minChainT", OPTION_INT},
-    {"minMatch", OPTION_FLOAT},
+    {"minMatch", OPTION_DOUBLE},
     {"minSizeQ", OPTION_INT},
     {"minSizeT", OPTION_INT},
     {"multiple", OPTION_BOOLEAN},
@@ -188,8 +188,8 @@ double minMatch = LIFTOVER_MINMATCH;
 double minBlocks = LIFTOVER_MINBLOCKS;

 optionInit(&argc, argv, optionSpecs);
-minMatch = optionFloat("minMatch", minMatch);
-minBlocks = optionFloat("minBlocks", minBlocks);
+minMatch = optionDouble("minMatch", minMatch);
+minBlocks = optionDouble("minBlocks", minBlocks);
 fudgeThick = optionExists("fudgeThick");
 multiple = optionExists("multiple");
 noSerial = optionExists("noSerial");
galt commented 3 years ago

Thank you very much for the clean and easy to follow bug report and bug fix! I have incorporated the change in the upstream kent repo. It should appear in v409 release.

simonbrent commented 3 years ago

Great :-)

Can you please let me know when that release has happened? I'd like to be able to download the updated version of the tool

genome-www commented 3 years ago

That release is scheduled for 26 Jan 2021.

On Fri, Jan 8, 2021 at 12:51 AM 'Simon Brent' via UCSC Genome Browser Confidential Support genome-www@soe.ucsc.edu wrote:

Great :-)

Can you please let me know when that release has happened? I'd like to be able to download the updated version of the tool

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/ucscGenomeBrowser/kent/issues/50#issuecomment-756632311, or unsubscribe https://github.com/notifications/unsubscribe-auth/AQIUREESCQOO2LJ4I5BG5A3SY3BRVANCNFSM4VUUMK5Q .

-- To unsubscribe from this group and stop receiving emails from it, send an email to genome-www+unsubscribe@soe.ucsc.edu.

simonbrent commented 3 years ago

Thanks a lot