FeatureHeatmap plot masking cells

ghoshal commented 6 years ago

Hi,

I am using FeatureHeatmap to visualize gene expression between two groups. When I plot, the cells with no expression are masking the cells that have an expression as you can see on the left side. Is there a way to not let this happen. FeatureHeatmap(object = crypts_hi_res, features.plot = c("Clu", "Lgr5", "Mex3a", "Yap1"), group.by = "class", sep.scale = F, pt.size = 1.5, cols.use = c("gray97", "blue"), pch.use = 1)

Also, when we are plotting gene expression in either FeaturePlot or FeatureHeatmap, are we plotting the scaled log normalized data?

Thanks,

Bibaswan

leonfodoulian commented 6 years ago

Hi,

You can decrease the size of your points, and use pch.use = 20 to get smaller dots.

Regarding the plotted data, FeaturePlot() plots the log-normalized data, whereas FeatureHeatmap() plots scaled data.

Best, Leon

ghoshal commented 6 years ago

Hi Leon,

Thanks for your response. I used the recommended parameters but the contrast is still very poor. I was thinking of putting a min.cutoff to remove the lowly expressed cells. I couldn't find the cutoff format in the documentation. Can you please help me with that?

I want to select cells with an expression of >2.

Thanks,

Biba

ghoshal commented 6 years ago

Also, is there a way to plot a group of genes belonging to one pathway in a single feature plot or heatmap based on their average/median expression?

leonfodoulian commented 6 years ago

Hi,

I wrote a script that tries to do two things at once. It allows either to add a transparency to all your points, which you can define by setting a value below 1 to alpha.use. This option will help you keep all your cells in the plots, but colours become difficult to read in this case. Or, you can set a gene expression cutoff by passing the desired cutoff value to expression.threshold. In this latter case, cells having expression values < the cutoff will be removed from the plots. (You can also combine both options.) Provided that log transformed (using log(x + 1)) normalized gene expression levels are plotted, the script assumes that the cutoff value passed to expression.threshold is not log transformed (is.log1p.transformed = F), and the value is log transformed using the default settings of the function. If you want the cutoff value to be applied on the log transformed data, you simply have to set is.log1p.transformed = TRUE.

# Function to pseudo customize FeaturePlots
customize_Seurat_FeaturePlot <- function(p, alpha.use = 1, gradient.use = c("yellow", "red"), expression.threshold = NULL, is.log1p.transformed = F) {

  #### Main function ####
  main_function <- function(p = p, alpha.use = alpha.use, gradient.use = gradient.use, expression.threshold = expression.threshold, is.log1p.transformed = is.log1p.transformed) {

    # Remove cells having an expression below a certain threshold
    if (!is.null(expression.threshold)) {
      if (isTRUE(is.log1p.transformed)) {
        p$data <- p$data[p$data$gene >= expression.threshold,]
      } else {
        p$data <- p$data[p$data$gene >= log1p(expression.threshold),]
      }
    }

    # Fill points using the gene expression levels
    p$layers[[1]]$mapping$fill <- p$layers[[1]]$mapping$colour

    # Define transparency of points
    p$layers[[1]]$mapping$alpha <- alpha.use

    # Change fill and colour gradient values
    p <- p + scale_colour_gradientn(colours = gradient.use, guide = F) +
      scale_fill_gradientn(colours = gradient.use, name = expression(atop(Expression, (log)))) +
      scale_alpha_continuous(range = alpha.use, guide = F)
  }

  #### Execution of main function ####
  # Apply main function on all features
  p <- lapply(X = p, alpha.use = alpha.use, gradient.use = gradient.use, 
              expression.threshold = expression.threshold, is.log1p.transformed = is.log1p.transformed,
              FUN = main_function)

  # Arrange all plots using cowplot
  # Adapted from Seurat
  # https://github.com/satijalab/seurat/blob/master/R/plotting.R#L1100
  # ncol argument from Josh O'Brien
  # https://stackoverflow.com/questions/10706753/how-do-i-arrange-a-variable-list-of-plots-using-grid-arrange
  cowplot::plot_grid(plotlist = p, ncol = ceiling(sqrt(length(p))))
}

# Plot features using FeaturePlot, and return individual plots using 'do.return = T'
# Store output in 'plot'
plot <- FeaturePlot(object = object, features.plot = c("Actb", "Rps9"), cols.use = c("yellow", "red"), 
                    pt.size = 2, pch.use = 16, no.legend = F, do.return = T)

# Remove from plots all cells having an expression level below 2 (cutoff not log1p transformed)
customize_Seurat_FeaturePlot(p = plot, expression.threshold = 2)

Hope that helps!

Best, Leon

leonfodoulian commented 6 years ago

Regarding your second question, I am not aware of a Seurat function that can do that for you. I don't think that it is also common and good practice to compute median expression values of multiple genes for a given cell. In this case, you might want to use the scaled data from Seurat, and compute the median scaled expression levels of multiple genes. However, this is a completely different topic than what this issue was about.

Best, Leon

leonfodoulian commented 6 years ago

Hi again,

Sorry but I forgot that you were using FeatureHeatmap(). The previous function works fine for FeaturePlot(), but not for FeatureHeatmap(). You can use the function below instead. There might be other ways of doing it too, or probably just integrate some of these functionalities in later releases of Seurat. But I agree that this is quite a particular case.

# Function to pseudo customize FeatureHeatmap
customize_Seurat_FeatureHeatmap <- function(p, alpha.use = 1, gradient.use = c("yellow", "red"), expression.threshold = 0, is.log1p.transformed = F) {

  # Remove cells having an expression below a certain threshold
  if (isTRUE(is.log1p.transformed)) {
    p$data <- p$data[p$data$expression >= expression.threshold,]
  } else {
    p$data <- p$data[p$data$expression >= log1p(expression.threshold),]
  }

  # Fill points using the gene expression levels
  p$layers[[1]]$mapping$fill <- p$layers[[1]]$mapping$colour

  # Define transparency of points
  p$layers[[1]]$mapping$alpha <- alpha.use

  # Change fill and colour gradient values
  p <- p + scale_colour_gradientn(colours = gradient.use, guide = F) +
    scale_fill_gradientn(colours = gradient.use, name = expression(atop(Scaled, expression))) +
    scale_alpha_continuous(range = alpha.use, guide = F)

  # Return plot
  p
}

# Plot features using FeatureHeatmap, and return plot using 'do.return = T'
# Store output in 'plot'
plot <- FeatureHeatmap(object = crypts_hi_res, features.plot = c("Clu", "Lgr5", "Mex3a", "Yap1"), group.by = "class", sep.scale = F, pt.size = 1.5, cols.use = c("gray97", "blue"), pch.use = 1)

# Remove from plot all cells having an expression level below 2 (cutoff not log1p transformed)
# Filter is applied on log normalized data, but scaled data is plotted
customize_Seurat_FeatureHeatmap(plot, gradient.use = c("gray97", "blue"), expression.threshold = 2)

Best, Leon

ghoshal commented 6 years ago

Thanks a lot Leon. You have been a great help.

ghoshal commented 6 years ago

Hi Leon,

Both the customized plots are working well. The only thing is that the background cells (ones without any expression of the specified genes) are disappearing from the plots. It is important for me to keep the tSNE plot whole to make the comparisons between the clusters. Can you please tell me what I can do to plot the background as well?

Thanks,

Biba

leonfodoulian commented 6 years ago

Hello,

Could you please share the code you are using? Providing also a plot before and after the operation you are doing using the script I wrote would be also helpful for me to troubleshoot your issue.

Best, Leon

leonfodoulian commented 6 years ago

I understand. My bad! I wrote you a script that allows you to filter out the cells that have an expression level of the gene below the threshold that you specify using the expression.threshold argument. In fact, what you want to do is far simpler than what I thought you want to do. In this case, you can simply reorder the data that is being plotted by the expression level of the genes, and plot the reordered data.

#### For FeatureHeatmap() ####
# Plot features using FeatureHeatmap, and return plot using 'do.return = T'
# Store output in 'plot'
plot <- FeatureHeatmap(object = crypts_hi_res, features.plot = c("Clu", "Lgr5", "Mex3a", "Yap1"), group.by = "class", sep.scale = F, pt.size = 1.5, cols.use = c("gray97", "blue"), pch.use = 1)

# Order data by gene expresion level
plot$data <- plot$data[order(plot$data$expression),]

# plot with reordered data
plot

#### For FeaturePlot() ####
# Plot features using FeaturePlot, and return individual plots using 'do.return = T'
# Store output in 'plot'
plot <- FeaturePlot(object = object, features.plot = c("Actb", "Rps9"), cols.use = c("yellow", "red"), 
                    pt.size = 2, pch.use = 16, no.legend = F, do.return = T)

# For each element of plot, reorder the data and return plot
plot <- lapply(X = plot, function(p) { p$data <- p$data[order(p$data$gene),]; p})

# Arrange all plots using cowplot
# Adapted from Seurat
# https://github.com/satijalab/seurat/blob/master/R/plotting.R#L1100
# ncol argument adapted from Josh O'Brien
# https://stackoverflow.com/questions/10706753/how-do-i-arrange-a-variable-list-of-plots-using-grid-arrange
cowplot::plot_grid(plotlist = plot, ncol = ceiling(sqrt(length(plot))))

Hope that helps!

Best, Leon

ghoshal commented 6 years ago

Hi Leon,

The plots were working as needed by using the customized scripts that you had sent. I was intending to use an expression cutoff that I could do using your script. The ordering of the data did not particularly improve on what I wanted. Here is an example.

This is the original command. x <- FeatureHeatmap(object = crypts, features.plot = c("Clu", "Lgr5", "Mex3a", "Yap1"), group.by = "class", sep.scale = T, pt.size = 0.5, cols.use = c("gray98", "red"), pch.use = 20, do.return = T)

x$data <- x$data[order(x$data$expression),] x plot_with_ordered_expression

Then, I run your expression.threshhold using the following command and get this.

customize_Seurat_FeatureHeatmap(x, alpha.use = 0.8,gradient.use = c("orangered", "red4"), expression.threshold = 1, is.log1p.transformed = T)

plot_with_expression_threshold

This is really and what I actually want to see at the gene level FeatureHeatmap. However, I would want the background tSNE plot to be there as well. So technically all cells with 0 expressions in the x$data will have a solid colour, so that the final plot looks something like this.

mki67 combined

Thanks,

Biba

leonfodoulian commented 6 years ago

Hello,

Here you go.

#### For FeaturePlot() ####
# Function to pseudo customize FeaturePlots
customize_Seurat_FeaturePlot <- function(p, alpha.use = 1, gradient.use = c("yellow", "red"), expression.threshold = 0, is.log1p.transformed = F) {

  #### Main function ####
  main_function <- function(p = p, alpha.use = alpha.use, gradient.use = gradient.use, expression.threshold = expression.threshold, is.log1p.transformed = is.log1p.transformed) {

    # Order data by gene expresion level
    p$data <- p$data[order(p$data$gene),]

    # Define lower limit of gene expression level
    if (isTRUE(is.log1p.transformed)) {
      expression.threshold <- expression.threshold
    } else {
      expression.threshold <- log1p(expression.threshold)
    }

    # Compute maximum value in gene expression
    max.exp <- max(p$data$gene)

    # Fill points using the gene expression levels
    p$layers[[1]]$mapping$fill <- p$layers[[1]]$mapping$colour

    # Define transparency of points
    p$layers[[1]]$mapping$alpha <- alpha.use

    # Change fill and colour gradient values
    p <- p + scale_colour_gradientn(colours = gradient.use, guide = F, limits = c(expression.threshold, max.exp), na.value = "grey") +
      scale_fill_gradientn(colours = gradient.use, name = expression(atop(Expression, (log))), limits = c(expression.threshold, max.exp), na.value = "grey") +
      scale_alpha_continuous(range = alpha.use, guide = F)
  }

  #### Execution of main function ####
  # Apply main function on all features
  p <- lapply(X = p, alpha.use = alpha.use, gradient.use = gradient.use, 
              expression.threshold = expression.threshold, is.log1p.transformed = is.log1p.transformed,
              FUN = main_function)

  # Arrange all plots using cowplot
  # Adapted from Seurat
  # https://github.com/satijalab/seurat/blob/master/R/plotting.R#L1100
  # ncol argument adapted from Josh O'Brien
  # https://stackoverflow.com/questions/10706753/how-do-i-arrange-a-variable-list-of-plots-using-grid-arrange
  cowplot::plot_grid(plotlist = p, ncol = ceiling(sqrt(length(p))))
}

#### For FeatureHeatmap() ####
# Function to pseudo customize FeatureHeatmap
customize_Seurat_FeatureHeatmap <- function(p, alpha.use = 1, gradient.use = c("yellow", "red"), scaled.expression.threshold = NULL, is.log1p.transformed = F) {

  # Order data by gene expresion level
  p$data <- p$data[order(p$data$scaled.expression),]

  # Define lower limit of scaled gene expression level
  if (!is.null(scaled.expression.threshold)) {
    if (isTRUE(is.log1p.transformed)) {
      scaled.expression.threshold <- scaled.expression.threshold
    } else {
      scaled.expression.threshold <- log1p(scaled.expression.threshold)
    }
  } else if (is.null(scaled.expression.threshold)) {
    scaled.expression.threshold <- min(p$data$scaled.expression)
  }

  # Compute maximum value in gene expression
  max.scaled.exp <- max(p$data$scaled.expression)

  # Fill points using the gene expression levels
  p$layers[[1]]$mapping$fill <- p$layers[[1]]$mapping$colour

  # Define transparency of points
  p$layers[[1]]$mapping$alpha <- alpha.use

  # Change fill and colour gradient values
  p <- p + scale_colour_gradientn(colours = gradient.use, guide = F, limits = c(scaled.expression.threshold, max.scaled.exp), na.value = "grey") +
    scale_fill_gradientn(colours = gradient.use, name = expression(atop(Scaled, expression)), limits = c(scaled.expression.threshold, max.scaled.exp), na.value = "grey") +
    scale_alpha_continuous(range = alpha.use, guide = F)

  # Return plot
  p
}

Note that for FeatureHeatmap() you filter for scaled expression. Also, scaled.expression.threshold can be NULL (in which case it will take the lowest scaled expression level), whereas expression.threshold requires a minimum value of 0.

Hope that helps!

Best, Leon

leonfodoulian commented 6 years ago

My bad! For FeatureHeatmap(), you no longer need the is.log1p.transformed argument.

#### For FeatureHeatmap() ####
# Function to pseudo customize FeatureHeatmap
customize_Seurat_FeatureHeatmap <- function(p, alpha.use = 1, gradient.use = c("yellow", "red"), scaled.expression.threshold = NULL) {

  # Order data by scaled gene expresion level
  p$data <- p$data[order(p$data$scaled.expression),]

  # Define lower limit of scaled gene expression level
  if (!is.null(scaled.expression.threshold)) {
    scaled.expression.threshold <- scaled.expression.threshold
  } else if (is.null(scaled.expression.threshold)) {
    scaled.expression.threshold <- min(p$data$scaled.expression)
  }

  # Compute maximum value in gene expression
  max.scaled.exp <- max(p$data$scaled.expression)

  # Fill points using the scaled gene expression levels
  p$layers[[1]]$mapping$fill <- p$layers[[1]]$mapping$colour

  # Define transparency of points
  p$layers[[1]]$mapping$alpha <- alpha.use

  # Change fill and colour gradient values
  p <- p + scale_colour_gradientn(colours = gradient.use, guide = F, limits = c(scaled.expression.threshold, max.scaled.exp), na.value = "grey") +
    scale_fill_gradientn(colours = gradient.use, name = expression(atop(Scaled, expression)), limits = c(scaled.expression.threshold, max.scaled.exp), na.value = "grey") +
    scale_alpha_continuous(range = alpha.use, guide = F)

  # Return plot
  p
}

ghoshal commented 6 years ago

Thanks a lot Leon. They work like dream.

CodeInTheSkies commented 5 years ago

Hi @ghoshal,

I had a similar situation where I needed to bring the points that correspond to high expression to the front in a feature plot. I tried the solutions above in this and the other thread (issue #757), but ultimately, what worked best for me is to directly re-order the "$data" data frame in increasing order of gene expressions.

Here is an example code snippet:

# First, the usual command to make the feature plots and return the ggplot2 objects
fp <- Seurat::FeaturePlot(object=p, features.plot=GeneList, cols.use = c("gray75", "red"), 
                          min.cutoff="q25",max.cutoff="q75", reduction.use = "tsne", 
                          do.return=TRUE)

# Here, fp is a list of objects, one per feature plot

# Then, re-order the "data" dataframe in increasing order of gene expressions
for (i in 1:length(fp)){
  fp[[i]]$data <- fp[[i]]$data[order(fp[[i]]$data$gene),]
}

This did the trick for me! That is, I got all the gray points relegated to the back. Among the non-gray ones, the brightest red points were at the very front, and other shades of red at intermediate positions. If a dull red point and a bright red point overlap, then the bright red one would be at the front, which is exactly what I wanted.

Hope this helps. Sample plots attached below.

BEFORE

AFTER

leonfodoulian commented 5 years ago

Hi @CodeInTheSkies,

The principle is exactly the same in both cases.

Best, Leon

samuel-marsh commented 5 years ago

Hi @leonfodoulian, Wondering about whether it's possible to tweak the function to continue even if gene isn't found in the dataset. I'm working with dataset that is just single cell type so many genes aren't expressed. Would like to query my dataset against gene list from another publication but function stops when gene in the list isn't found in my Seurat object.

Thanks! Sam

crazyhottommy commented 5 years ago

Hello,

Here you go.

#### For FeaturePlot() ####
# Function to pseudo customize FeaturePlots
customize_Seurat_FeaturePlot <- function(p, alpha.use = 1, gradient.use = c("yellow", "red"), expression.threshold = 0, is.log1p.transformed = F) {

  #### Main function ####
  main_function <- function(p = p, alpha.use = alpha.use, gradient.use = gradient.use, expression.threshold = expression.threshold, is.log1p.transformed = is.log1p.transformed) {

    # Order data by gene expresion level
    p$data <- p$data[order(p$data$gene),]

    # Define lower limit of gene expression level
    if (isTRUE(is.log1p.transformed)) {
      expression.threshold <- expression.threshold
    } else {
      expression.threshold <- log1p(expression.threshold)
    }

    # Compute maximum value in gene expression
    max.exp <- max(p$data$gene)

    # Fill points using the gene expression levels
    p$layers[[1]]$mapping$fill <- p$layers[[1]]$mapping$colour

    # Define transparency of points
    p$layers[[1]]$mapping$alpha <- alpha.use

    # Change fill and colour gradient values
    p <- p + scale_colour_gradientn(colours = gradient.use, guide = F, limits = c(expression.threshold, max.exp), na.value = "grey") +
      scale_fill_gradientn(colours = gradient.use, name = expression(atop(Expression, (log))), limits = c(expression.threshold, max.exp), na.value = "grey") +
      scale_alpha_continuous(range = alpha.use, guide = F)
  }

  #### Execution of main function ####
  # Apply main function on all features
  p <- lapply(X = p, alpha.use = alpha.use, gradient.use = gradient.use, 
              expression.threshold = expression.threshold, is.log1p.transformed = is.log1p.transformed,
              FUN = main_function)

  # Arrange all plots using cowplot
  # Adapted from Seurat
  # https://github.com/satijalab/seurat/blob/master/R/plotting.R#L1100
  # ncol argument adapted from Josh O'Brien
  # https://stackoverflow.com/questions/10706753/how-do-i-arrange-a-variable-list-of-plots-using-grid-arrange
  cowplot::plot_grid(plotlist = p, ncol = ceiling(sqrt(length(p))))
}

#### For FeatureHeatmap() ####
# Function to pseudo customize FeatureHeatmap
customize_Seurat_FeatureHeatmap <- function(p, alpha.use = 1, gradient.use = c("yellow", "red"), scaled.expression.threshold = NULL, is.log1p.transformed = F) {

  # Order data by gene expresion level
  p$data <- p$data[order(p$data$scaled.expression),]

  # Define lower limit of scaled gene expression level
  if (!is.null(scaled.expression.threshold)) {
    if (isTRUE(is.log1p.transformed)) {
      scaled.expression.threshold <- scaled.expression.threshold
    } else {
      scaled.expression.threshold <- log1p(scaled.expression.threshold)
    }
  } else if (is.null(scaled.expression.threshold)) {
    scaled.expression.threshold <- min(p$data$scaled.expression)
  }

  # Compute maximum value in gene expression
  max.scaled.exp <- max(p$data$scaled.expression)

  # Fill points using the gene expression levels
  p$layers[[1]]$mapping$fill <- p$layers[[1]]$mapping$colour

  # Define transparency of points
  p$layers[[1]]$mapping$alpha <- alpha.use

  # Change fill and colour gradient values
  p <- p + scale_colour_gradientn(colours = gradient.use, guide = F, limits = c(scaled.expression.threshold, max.scaled.exp), na.value = "grey") +
    scale_fill_gradientn(colours = gradient.use, name = expression(atop(Scaled, expression)), limits = c(scaled.expression.threshold, max.scaled.exp), na.value = "grey") +
    scale_alpha_continuous(range = alpha.use, guide = F)

  # Return plot
  p
}

Note that for FeatureHeatmap() you filter for scaled expression. Also, scaled.expression.threshold can be NULL (in which case it will take the lowest scaled expression level), whereas expression.threshold requires a minimum value of 0.

Hope that helps!

Best, Leon

with Seurat V3, this is not working.

where the data slot in the ggplot object now? p<- FeaturePlot(seurat, features = c("Vgf", "Tac1"))

there is no p$data$gene

p$data$

Thank you!

kmshort commented 4 years ago

with Seurat V3, this is not working.

where the data slot in the ggplot object now? p<- FeaturePlot(seurat, features = c("Vgf", "Tac1"))

there is no p$data$gene

p$data$

Thank you!

In Seurat v3, FeaturePlot now has baked in tools for ordering and filtering (so you don't need these functions for these plotting features): order = FALSE (default) set as TRUE will plot higher values on top of lower values.. doing the ordering for you. min.cutoff = NA (default), set as a number to remove cells below a certain expression level. max.cutoff = NA (default), set as a number to remove cells above a certain expression level.

The only thing it doesn't implement now is ALPHA. Which would be awesome.

satijalab / seurat

FeatureHeatmap plot masking cells #235

there is no p$data$gene

there is no p$data$gene