sharkfin plot too many data points and difficult to visualize

Hi, I was trying to make the sharkfin plot as you discussed in another issue. However, the shape of my plot doesn't match the one you showed in the paper; it looks like the plot shown below. There are about 35K data points. Would you recommend splitting the result dataframe by transcript ID (i.e. the ref_id column)? The data are from SARS-CoV-2.

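# Packages needed for plotting and for the %>% pipe (tidyr is called via ::)
library(ggplot2)
library(magrittr)

## Because ggplot doesn't like NAs
# `file` is assumed to be the nanocompore results table, already loaded as a data frame
df <- file[, c("ref_id", "ref_kmer", "GMM_logit_pvalue_context_2", "Logit_LOR")] %>% tidyr::drop_na()

# Use the absolute log odds ratio on the x axis
df$Logit_LOR <- abs(df$Logit_LOR)

df <- df[order(df$GMM_logit_pvalue_context_2, df$Logit_LOR), ]

# Flag sites with p-value < 0.05 and |LOR| > 0.5
df$color <- ifelse(df$GMM_logit_pvalue_context_2 < 0.05 & df$Logit_LOR > 0.5, "Significant", "Not-significant")

# Convert p-values to -log10 for the y axis
df$GMM_logit_pvalue_context_2 <- -log10(df$GMM_logit_pvalue_context_2)
ggplot(df, aes(x = Logit_LOR, y = GMM_logit_pvalue_context_2, color = color)) +
  geom_point() + theme_minimal() +
  xlab("Logistic regression odds ratio") + ylab("Nanocompore p-value (-log10)")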
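
To make the splitting idea concrete, here is a minimal sketch of what a per-transcript view could look like. It is not taken from the nanocompore documentation. The toy data frame, the column name `neg_log10_p`, the transcript names `tx_A`/`tx_B` and the `alpha`/`size` values are all assumptions for illustration; only `ref_id`, `Logit_LOR` and the two thresholds come from the code above.

```r
library(ggplot2)

# Toy stand-in for the processed data frame above (values are made up)
set.seed(1)
toy <- data.frame(
  ref_id      = rep(c("tx_A", "tx_B"), each = 50),  # hypothetical transcript IDs
  Logit_LOR   = abs(rnorm(100)),                    # absolute log odds ratio
  neg_log10_p = rexp(100)                           # stands in for -log10(p-value)
)

# Same significance rule as above, expressed on the -log10 scale
toy$color <- ifelse(toy$neg_log10_p > -log10(0.05) & toy$Logit_LOR > 0.5,
                    "Significant", "Not-significant")

# One panel per transcript; transparency and small points reduce overplotting
p <- ggplot(toy, aes(x = Logit_LOR, y = neg_log10_p, color = color)) +
  geom_point(alpha = 0.3, size = 0.5) +
  facet_wrap(~ ref_id) +
  theme_minimal() +
  xlab("Logistic regression odds ratio") +
  ylab("Nanocompore p-value (-log10)")

# Alternatively, split() returns one data frame per transcript,
# which could then be plotted individually
per_tx <- split(toy, toy$ref_id)
```

Printing `p` draws the faceted plot, and `per_tx[["tx_A"]]` is the subset for a single transcript. Whether splitting by `ref_id` is the right approach for this data is exactly what I would like to confirm.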

tleonardi / nanocompore
