thackl / gggenomes

A grammar of graphics for comparative genomics
https://thackl.github.io/gggenomes/

Incorrect labels for scale with large sequences #182

Open YannDussert opened 2 months ago

YannDussert commented 2 months ago

Hi,

Thanks for developing gggenomes!

I came across a small bug when plotting "large" sequences with the package: the labels on the bottom scale are wrong (e.g., 2M instead of 1.5M or 2.5M). Here is a small example:

library(gggenomes)

s0 <- tibble::tibble(seq_id = c("a", "b", "c"), length = c(1000000, 1500000, 2500000))

gggenomes(seqs = s0) + geom_seq()

image

The labels are correct with sequence lengths of 100k/150k/250k or 10M/15M/25M, so I don't really know where the issue comes from.

Best regards, Yann

thackl commented 1 month ago

Ah, interesting catch. Agreed, not ideal. Internally, I use gggenomes::label_bp, which is derived from scales::label_bytes. It uses a fixed accuracy, which defaults to 1, so every label is rounded to a whole number and no decimal digits are shown. That is why 1.5M and 2.5M both end up as 2M.

One workaround is to set accuracy = 0.1, but then every label gets a decimal digit, including the whole numbers.

s0 <- tibble::tibble(seq_id = c("a", "b", "c"), length = c(1000000, 1500000, 2500000))
gggenomes(seqs = s0) +
  geom_seq() +
  scale_x_bp(accuracy = 0.1)

image
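
The rounding effect described above can be reproduced outside of a plot. The following is only a minimal sketch: it uses scales::number() with a hand-picked scale and suffix to mimic "M"-suffixed labels, which is an assumption for illustration and not the actual gggenomes::label_bp implementation. The variable names are hypothetical.

```r
# Sketch only: mimics "M"-suffixed labels with scales::number();
# this is not the actual gggenomes::label_bp() implementation.
lengths_bp <- c(1000000, 1500000, 2500000)

# accuracy = 1: values are rounded to whole units before formatting,
# so 1.5 and 2.5 both collapse onto 2 (R's round() sends halves to even)
labels_acc1 <- scales::number(lengths_bp, scale = 1e-6, suffix = "M", accuracy = 1)

# accuracy = 0.1: one decimal digit is kept for every label,
# including values that are whole numbers
labels_acc01 <- scales::number(lengths_bp, scale = 1e-6, suffix = "M", accuracy = 0.1)

print(labels_acc1)
print(labels_acc01)
```

This would also fit the observation that 10M/15M/25M look fine: the mislabelling only shows up where a value falls on a half unit of the chosen suffix.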

I need to ponder some more to find a good solution.