myles-lewis / locuszoomr

A pure R implementation of locuszoom for plotting genetic data at genomic loci accompanied by gene annotations.
GNU General Public License v3.0
18 stars 5 forks source link

Some small fixes #8

Closed twillis209 closed 9 months ago

twillis209 commented 9 months ago

Hi Myles

I love locuszoomr, thanks for writing it. I've made several small changes to fix some errors I've encountered when using the package over the last couple of weeks, I hope they are of some help. I've written test cases where I could.

I encountered the following error when there were no SNPs in a locus:

Error: near ")": syntax error

I've added a check and stop call for this to locus:

if(nrow(data) == 0) {
    stop("Locus contains no SNPs/datapoints")
}

When two or more genes in the Ensembl data base match the gene argument for locus, it causes an error in mapRow:

Error in `$<-.data.frame`(`*tmp*`, "tmin", value = c(16830925.2661651,  : 
  replacement has 3 rows, data has 1
Calls: gg_genetracks -> mapRow -> $<- -> $<-.data.frame
Execution halted

To fix this I added some code to locus to default to the first gene and to warn the user:

if(length(locus) > 1) {
    warning(sprintf('Identified %d genes matching name \'%s\', taking first\n', length(locus), gene))
    locus <- locus[1]  
}

When plotting the HLA-A locus, Ensembl returned 8 sequences, one with the expected seqname 6 and others with odd ones which I assume represent the same gene on alternative haplotypes, e.g. CHR_HSCHR6_MHC_APD_CTG1. The problem of multiple hits should be fixed by the previous fix, but I also added an additional AnnotationFilter to exclude these sequences:

locus <- genes(edb, filter = AnnotationFilterList(
      GeneNameFilter(gene),
      SeqNameFilter(c(1:22, 'X', 'Y'))
      )
    )

This is an opinionated addition, though, and I could see why you might not want to merge it in. Perhaps an optional parameter to restrict the search for genes to certain sequences such as the canonical chromosomes?

When I searched in a region with no transcripts, I encountered this cryptic error again:

Error: near ")": syntax error

I added code create an empty exons object when there were no transcripts (I think this was necessary to avoid the error).

TX <- ensembldb::genes(edb, filter = AnnotationFilterList(
          SeqNameFilter(seqname),
          TxStartFilter(xrange[2], condition = "<"),
          TxEndFilter(xrange[1], condition = ">"),
          GeneIdFilter("ENSG", "startsWith")))
TX <- data.frame(TX)
TX <- TX[! is.na(TX$start), ]

TX <- TX[!duplicated(TX$gene_id), ]

if(nrow(TX) == 0) {
  # Creating empty exons object here in suitable format
 EX <- ensembldb::exons(edb, filter = AnnotationFilterList(
  SeqNameFilter(seqname),
  ExonStartFilter(xrange[2], condition = "<"),
  ExonEndFilter(xrange[1], condition = ">"),
  GeneIdFilter("ENSG", "startsWith")))
} else {
 EX <- ensembldb::exons(edb, filter = GeneIdFilter(TX$gene_id))
}

I also modified the logic to restrict the search for transcripts to seqname and removed duplicate transcripts in the TX object; perhaps there are circumstances in which you'd like to retain these?

There's also a test for gg_scatter relating to the factor levels issue you fixed today, I fixed it myself, too.

myles-lewis commented 9 months ago

Hi Tom,

Thanks for this and glad to have your input! These are really useful edge/bug cases. Currently I'm planning to:

All the best, Myles

twillis209 commented 9 months ago

All done, I think. Despite doing this for years I've never gone through the PR process on GitHub, so please let me know if I've missed something.

myles-lewis commented 9 months ago

Hi Tom, Thanks for your help. I merged your changes.

I have pushed some further modifications to the main branch to change behaviour so that locus_plot(), locus_ggplot() and locus_plotly() all produce complete locus plots just with blank gene tracks. They still produce a console message “no gene tracks”. I think this behaviour is reasonably intuitive, but happy to hear your thoughts.

All the best, Myles

From: Tom Willis @.> Date: Thursday, 14 December 2023 at 16:58 To: myles-lewis/locuszoomr @.> Cc: Myles Lewis @.>, Comment @.> Subject: Re: [myles-lewis/locuszoomr] Some small fixes (PR #8)

All done, I think. Despite doing this for years I've never gone through the PR process on GitHub, so please let me know if I've missed something.

— Reply to this email directly, view it on GitHubhttps://github.com/myles-lewis/locuszoomr/pull/8#issuecomment-1856210242, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AF6OE6EFD4GUBBSGAZUWSODYJMV4PAVCNFSM6AAAAABAUA2OY6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNJWGIYTAMRUGI. You are receiving this because you commented.Message ID: @.***>

twillis209 commented 9 months ago

Sounds good to me! The 'failure' will be obvious to the user from the end result.