sidbdri / cookiecutter-de_analysis_skeleton

Skeleton for new differential expression analysis project.

plot_expression_heatmap() broken by duplicate gene names #218

Closed lweasel closed 1 year ago

lweasel commented 1 year ago

Unfortunately, due to annotation errors, our genes.tsv file sometimes contains more than one gene with the same name. If they both appear in the set of genes to be plotted by plot_expression_heatmap(), then the code breaks because the heatmap doesn't allow duplicate gene names.

A quick and dirty fix would be something like:

# deal with potential duplicate gene names
  results_tb %<>% group_by(gene_name) %>% slice_head(n = 1) %>% ungroup() 

but there's probably a better way.
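One possible alternative, as a minimal sketch: dplyr's `distinct()` with `.keep_all = TRUE` keeps the first row seen for each `gene_name` and leaves the remaining rows in their original order, whereas a grouped `slice_head()` returns rows ordered by group. The toy `results_tb` below, including its column names and placeholder IDs, is made up for illustration and is not the real table used by plot_expression_heatmap().

```r
library(dplyr)

# Toy stand-in for results_tb; columns and IDs are hypothetical
results_tb <- tibble(
  gene      = c("ID_A", "ID_B", "ID_C", "ID_D"),
  gene_name = c("Zfp1", "Abc1", "Zfp1", "Mno2"),
  l2fc      = c(3.1, -2.4, 1.9, 0.7)
)

# Keep the first row for each gene_name without reordering the other rows
deduped <- results_tb %>% distinct(gene_name, .keep_all = TRUE)
```

Here the second "Zfp1" row (ID_C) is dropped and the other rows stay in the order they arrived in.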

hxin commented 1 year ago

So they are two different genes with the same gene name?

lweasel commented 1 year ago

Well... yes, in the annotations there are two different Ensembl IDs which have the same gene name. This is most likely a mistake in the annotations (some of which are produced computationally, not by hand): for the one case that I looked at, it had been fixed in the latest version of Ensembl, where there are now two different Ensembl IDs with two different gene names.

P.S. The code I gave above doesn't quite work because it messes up the ordering of the heatmap; it would need to be something like:

results_tb %<>% group_by(gene_name) %>% slice_head(n = 1) %>% ungroup()  %>% 
  arrange(desc(abs(!!sym(paste0(comparison_name, '.l2fc')))))

hxin commented 1 year ago

I see. I can either search by gene_name and then use gene_name_ensembl_id as the label in the plot, so that both genes get plotted in this case, or just take the first one at random?

lweasel commented 1 year ago

I think we can just take a random first one. The plot is giving an "overall impression" of the differential expression comparison, so I don't think it really matters to spend too much time worrying about which is the "best" of the two genes in the odd cases like this.

hxin commented 1 year ago

I can add a check to take the higher expression one if that helps?

lweasel commented 1 year ago

Yep, that's a good idea if it's not too difficult.
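A minimal sketch of what such a check might look like, assuming expression is summarised as the mean over per-sample columns. The toy table, the `_fpkm` column names, the placeholder IDs and the `cmp` comparison name are all hypothetical, not taken from the real pipeline.

```r
library(dplyr)

# Toy stand-in for results_tb; every name below is an assumption
results_tb <- tibble(
  gene         = c("ID_A", "ID_B", "ID_C", "ID_D"),
  gene_name    = c("Zfp1", "Abc1", "Zfp1", "Mno2"),
  sample1_fpkm = c(1.0, 20.0, 8.0, 5.0),
  sample2_fpkm = c(2.0, 22.0, 9.0, 6.0),
  cmp.l2fc     = c(3.1, -2.4, 1.9, 0.7)
)

expr_cols <- c("sample1_fpkm", "sample2_fpkm")
l2fc_col  <- paste0("cmp", ".l2fc")

# Mean expression per gene across the sample columns
results_tb$mean_expr <- rowMeans(results_tb[expr_cols])

deduped <- results_tb %>%
  group_by(gene_name) %>%
  # For duplicated names keep the more highly expressed gene
  slice_max(mean_expr, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(-mean_expr) %>%
  # Restore the heatmap ordering by absolute log2 fold change
  arrange(desc(abs(.data[[l2fc_col]])))
```

With this toy data the "Zfp1" entry that survives is ID_C, since its mean expression (8.5) is higher than that of ID_A (1.5). `with_ties = FALSE` guarantees a single row per name even when two duplicates are equally expressed.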

