stuart-lab / signac

R toolkit for the analysis of single-cell chromatin data
https://stuartlab.org/signac/
Other
325 stars 87 forks source link

CoveragePlot: can not visualize the gene #656

Closed pengxin2019 closed 3 years ago

pengxin2019 commented 3 years ago
sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.9 (Maipo)

Matrix products: default
BLAS/LAPACK: /opt/rit/spack-app/linux-rhel7-x86_64/gcc-7.3.0/openblas-0.3.10-cbsvgq24lsaf6xz65lbdlb6jh4b7qsaa/lib/libopenblasp-r0.3.10.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] AnnotationHub_2.22.1 BiocFileCache_1.14.0 dbplyr_2.1.0        
 [4] patchwork_1.1.1      ggplot2_3.3.3        GenomeInfoDb_1.26.4 
 [7] IRanges_2.24.1       S4Vectors_0.28.1     BiocGenerics_0.36.0 
[10] SeuratObject_4.0.0   Seurat_4.0.2         Signac_1.2.1        

loaded via a namespace (and not attached):
  [1] fastmatch_1.1-0               plyr_1.8.6                   
  [3] igraph_1.2.6                  lazyeval_0.2.2               
  [5] splines_4.0.3                 BiocParallel_1.24.1          
  [7] listenv_0.8.0                 scattermore_0.7              
  [9] SnowballC_0.7.0               digest_0.6.27                
 [11] htmltools_0.5.1.1             fansi_0.4.2                  
 [13] magrittr_2.0.1                memoise_2.0.0                
 [15] tensor_1.5                    cluster_2.1.1                
 [17] ROCR_1.0-11                   globals_0.14.0               
 [19] Biostrings_2.58.0             matrixStats_0.58.0           
 [21] docopt_0.7.1                  spatstat.sparse_2.0-0        
 [23] colorspace_2.0-0              blob_1.2.1                   
 [25] rappdirs_0.3.3                ggrepel_0.9.1                
 [27] dplyr_1.0.5                   sparsesvd_0.2                
 [29] crayon_1.4.1                  RCurl_1.98-1.3               
 [31] jsonlite_1.7.2                spatstat.data_2.0-0          
 [33] survival_3.2-10               zoo_1.8-9                    
 [35] glue_1.4.2                    polyclip_1.10-0              
 [37] gtable_0.3.0                  zlibbioc_1.36.0              
 [39] XVector_0.30.0                leiden_0.3.7                 
 [41] future.apply_1.7.0            abind_1.4-5                  
 [43] scales_1.1.1                  DBI_1.1.1                    
 [45] miniUI_0.1.1.1                Rcpp_1.0.6                   
 [47] viridisLite_0.3.0             xtable_1.8-4                 
 [49] reticulate_1.18               spatstat.core_1.65-5         
 [51] bit_4.0.4                     htmlwidgets_1.5.3            
 [53] httr_1.4.2                    RColorBrewer_1.1-2           
 [55] ellipsis_0.3.1                ica_1.0-2                    
 [57] pkgconfig_2.0.3               farver_2.1.0                 
 [59] ggseqlogo_0.1                 uwot_0.1.10                  
 [61] deldir_0.2-10                 utf8_1.2.1                   
 [63] AnnotationDbi_1.52.0          tidyselect_1.1.0             
 [65] rlang_0.4.10                  reshape2_1.4.4               
 [67] later_1.1.0.1                 BiocVersion_3.12.0           
 [69] munsell_0.5.0                 tools_4.0.3                  
 [71] cachem_1.0.4                  generics_0.1.0               
 [73] RSQLite_2.2.4                 ggridges_0.5.3               
 [75] stringr_1.4.0                 fastmap_1.1.0                
 [77] yaml_2.2.1                    goftest_1.2-2                
 [79] bit64_4.0.5                   fitdistrplus_1.1-3           
 [81] purrr_0.3.4                   RANN_2.6.1                   
 [83] pbapply_1.4-3                 future_1.21.0                
 [85] nlme_3.1-152                  mime_0.10                    
 [87] slam_0.1-48                   RcppRoll_0.3.0               
 [89] compiler_4.0.3                rstudioapi_0.13              
 [91] interactiveDisplayBase_1.28.0 curl_4.3                     
 [93] plotly_4.9.3                  png_0.1-7                    
 [95] spatstat.utils_2.1-0          tibble_3.1.0                 
 [97] tweenr_1.0.1                  stringi_1.5.3                
 [99] lattice_0.20-41               Matrix_1.3-2                 
[101] vctrs_0.3.6                   pillar_1.5.1                 
[103] lifecycle_1.0.0               BiocManager_1.30.10          
[105] spatstat.geom_1.65-5          lmtest_0.9-38                
[107] RcppAnnoy_0.0.18              data.table_1.14.0            
[109] cowplot_1.1.1                 bitops_1.0-6                 
[111] irlba_2.3.3                   httpuv_1.5.5                 
[113] GenomicRanges_1.42.0          R6_2.5.0                     
[115] promises_1.2.0.1              KernSmooth_2.23-18           
[117] gridExtra_2.3                 lsa_0.73.2                   
[119] parallelly_1.24.0             codetools_0.2-18             
[121] MASS_7.3-53.1                 assertthat_0.2.1             
[123] withr_2.4.1                   qlcMatrix_0.9.7              
[125] sctransform_0.3.2             Rsamtools_2.6.0              
[127] GenomeInfoDbData_1.2.4        mgcv_1.8-34                  
[129] grid_4.0.3                    rpart_4.1-15                 
[131] tidyr_1.1.3                   Rtsne_0.15                   
[133] ggforce_0.3.3                 Biobase_2.50.0               
[135] shiny_1.6.0                  
# I have two scATAC datasets, I did QC, scATAC datasets integration, scATAC integration with scRNA following pipeline here:
#https://satijalab.org/signac/articles/pbmc_vignette.html

You can quickly search "I HAVE AN ERROR WITH THE CODE BELOW" to get to the error part in the code

#6800
counts.6800 <- Read10X_h5(filename = "PATH/PBMC.6800.cellranger.count.result.outs/filtered_peak_bc_matrix.h5")
# dim(counts.6800)
# [1] 102154   1760

metadata.6800 <- read.csv(
  file = "PATH/PBMC.6800.cellranger.count.result.outs/singlecell.csv",
  header = TRUE,
  row.names = 1
)

chrom_assay.6800 <- CreateChromatinAssay(
  counts = counts.6800,
  sep = c(":", "-"),
  fragments = 'PATH/PBMC.6800.cellranger.count.result.outs/fragments.tsv.gz',
  min.cells = 10,
  min.features = 200
)
pbmc.6800 <- CreateSeuratObject(
  counts = chrom_assay.6800,
  assay = "peaks",
  meta.data = metadata.6800
)

Annotation(pbmc.6800) <- annotations

pbmc.6800 <- NucleosomeSignal(object = pbmc.6800)

# compute TSS enrichment score per cell
pbmc.6800 <- TSSEnrichment(object = pbmc.6800, fast = FALSE)

# add blacklist ratio and fraction of reads in peaks
pbmc.6800$pct_reads_in_peaks <- pbmc.6800$peak_region_fragments / pbmc.6800$passed_filters * 100
pbmc.6800$blacklist_ratio <- pbmc.6800$blacklist_region_fragments / pbmc.6800$peak_region_fragments

pbmc.6800$high.tss <- ifelse(pbmc.6800$TSS.enrichment > 2, 'High', 'Low')
#6800 pbmc$TSS.enrichment > 2 most of them are true

TSSPlot(pbmc.6800, group.by = 'high.tss') + NoLegend()

pbmc.6800$nucleosome_group <- ifelse(pbmc.6800$nucleosome_signal > 4, 'NS > 4', 'NS < 4')
#
FragmentHistogram(object = pbmc.6800, group.by = 'nucleosome_group')

pbmc.6800 <- subset(
  x = pbmc.6800,
  subset = peak_region_fragments > 3000 &
    peak_region_fragments < 20000 &
    pct_reads_in_peaks > 15 &
    blacklist_ratio < 0.05 &
    nucleosome_signal < 4 &
    TSS.enrichment > 2
)

pbmc.6800 <- RunTFIDF(pbmc.6800)
pbmc.6800 <- FindTopFeatures(pbmc.6800, min.cutoff = 'q0')
pbmc.6800 <- RunSVD(pbmc.6800)

fragpath.6798 <- "PATH/PBMC.6798.cellranger.count.result.outs/fragments.tsv.gz"
fragcounts.6798 <- CountFragments(fragments = fragpath.6798)

atac.cells.6798 <- fragcounts.6798[fragcounts.6798$frequency_count > 2000, "CB"]

# create the fragment object
atac.frags.6798 <- CreateFragmentObject(path = fragpath.6798, cells = atac.cells.6798)

# quantify 6800 peaks in the 6798 scATAC-seq dataset

counts <- FeatureMatrix(
  fragments = atac.frags.6798,
  features = granges(pbmc.6800),  
  cells = atac.cells.6798
)

# create new object for 6798
atac.assay <- CreateChromatinAssay(
  counts = counts,
  min.features = 1000, #Include cells where at least this many features are detected.
  fragments = atac.frags.6798
)
pbmc.atac.6798 <- CreateSeuratObject(counts = atac.assay, assay = "peaks")
pbmc.atac.6798 <- subset(pbmc.atac.6798, nCount_peaks > 2000 & nCount_peaks < 30000)

# compute LSI
pbmc.atac.6798 <- FindTopFeatures(pbmc.atac.6798, min.cutoff = 10)
pbmc.atac.6798 <- RunTFIDF(pbmc.atac.6798)
pbmc.atac.6798 <- RunSVD(pbmc.atac.6798)

pbmc.atac.6798$dataset <- "6798"
pbmc.6800$dataset <- "6800"

integration.anchors <- FindIntegrationAnchors(
  object.list = list(pbmc.atac.6798, pbmc.6800),
  anchor.features = rownames(pbmc.6800),
  reduction = "rlsi",
  dims = 2:30
)
integrated <- IntegrateEmbeddings(
  anchorset = integration.anchors,
  reductions = pbmc.combined[["lsi"]],
  new.reduction.name = "integrated_lsi",
  dims.to.integrate = 1:30
)
integrated <- RunUMAP(integrated, reduction = "integrated_lsi", dims = 2:30)

###clusters
#Normalization and linear dimensional reduction
integrated <- RunTFIDF(integrated)
integrated <- FindTopFeatures(integrated, min.cutoff = 'q0')
integrated <- RunSVD(integrated)
DepthCor(integrated)

integrated <- RunUMAP(object = integrated, reduction = 'lsi', dims = 2:30)
integrated <- FindNeighbors(object = integrated, reduction = 'lsi', dims = 2:30)

integrated <- FindClusters(object = integrated, verbose = FALSE, algorithm = 3)

gene.activities <- GeneActivity(integrated, extend.upstream = 2000, extend.downstream = 2000)

# add the gene activity matrix to the Seurat object as a new assay and normalize it
integrated[['RNA']] <- CreateAssayObject(counts = gene.activities)
integrated <- NormalizeData(
  object = integrated,
  assay = 'RNA',
  normalization.method = 'LogNormalize',
  scale.factor = median(integrated$nCount_RNA)
)
DefaultAssay(integrated) <- 'RNA'

# Load the pre-processed scRNA-seq data for PBMCs
pbmc_sc_rna <- readRDS("/work/abg/pyang19/opt/scRNA/scRNA.rds")

 > pbmc_sc_rna
An object of class Seurat
30796 features across 28810 samples within 3 assays
Active assay: RNA (16140 features, 0 variable features)
2 other assays present: SCT, integrated
3 dimensional reductions calculated: pca, umap, tsne

 integrated[["RNA"]]  #+-2000
Assay data with 20939 features for 2323 cells
First 10 features:
  TBP, PSMB1, FAM120B, DLL1, ERMARD, TCTE3, PHF10, C6orf120, .3, THBS2 

DefaultAssay(pbmc_sc_rna) <- 'integrated'

transfer.anchors.integration.RNA <- FindTransferAnchors(
  reference = pbmc_sc_rna,
  query = integrated,
  reduction = 'cca'
)

predicted.labels <- TransferData(
  anchorset = transfer.anchors.integration.RNA,
  refdata = pbmc_sc_rna$celltypes,
  weight.reduction = integrated[['lsi']],
  dims = 2:30
)

integrated <- AddMetaData(object = integrated, metadata = predicted.labels)

#annotate our scATAC-seq-derived clusters
#pbmc <- subset(integrated, idents = 1, invert = TRUE)
integrated <- RenameIdents(
  object = integrated,
  '0' = 'B cells',
  '1' = 'Monocytes',
  '2' = 'Monocytes',
  '3' = 'CD2- GD T cells',
  '4' = 'CD4+ ab T cells',
  '5' = 'CD4+ ab T cells',
  '6' = 'B cells',
  '7' = 'NK cells',
  '8' = 'CD8ab+ ab T cells',
  '9' = 'B cells',
  '10' = 'cDCs',
  '11' = 'pDCs'
)

#Find differentially accessible peaks between clusters

# change back to working with peaks instead of gene activities
DefaultAssay(integrated) <- 'peaks'

da_peaks <- FindMarkers(
  object = integrated,
  ident.1 = "Monocytes",
  ident.2 = "B cells",
  min.pct = 0.2,
  test.use = 'LR',
  latent.vars = 'peak_region_fragments'
)

#I HAVE AN ERROR WITH THE CODE BELOW:
CoveragePlot(
  object = integrated,
  region = da_peaks$region[167], #CD79A
  extend.upstream = 40000,
  extend.downstream = 20000
)

Error in annotation[annotation$type == "body", ] : 
  incorrect number of dimensions

da_peaks$region[167]
> da_peaks$region[167]
[1] "49998118-50000003"

head(Annotation(integrated))

# GRanges object with 6 ranges and 8 metadata columns:
#   GRanges object with 6 ranges and 8 metadata columns:
#   seqnames        ranges strand |            gene_id
# <Rle>     <IRanges>  <Rle> |        <character>
#   ENSSSCG00000037372        1    3472-18696      - | ENSSSCG00000037372
# ENSSSCG00000027257        1   23368-40113      + | ENSSSCG00000027257
# ENSSSCG00000029697        1  96218-186785      - | ENSSSCG00000029697
# ENSSSCG00000027274        1 112763-113499      - | ENSSSCG00000027274
# ENSSSCG00000027726        1 198992-211342      + | ENSSSCG00000027726
# ENSSSCG00000033475        1 352196-356072      + | ENSSSCG00000033475
# gene_name   gene_biotype seq_coord_system
# <character>    <character>      <character>
#   ENSSSCG00000037372         TBP protein_coding       chromosome
# ENSSSCG00000027257       PSMB1 protein_coding       chromosome
# ENSSSCG00000029697     FAM120B protein_coding       chromosome
# ENSSSCG00000027274             protein_coding       chromosome
# ENSSSCG00000027726        DLL1 protein_coding       chromosome
# ENSSSCG00000033475                    lincRNA       chromosome
# description      gene_id_version      symbol
# <character>          <character> <character>
#   ENSSSCG00000037372 TATA-box binding pro.. ENSSSCG00000037372.1         TBP
# ENSSSCG00000027257 proteasome subunit b.. ENSSSCG00000027257.2       PSMB1
# ENSSSCG00000029697 family with sequence.. ENSSSCG00000029697.2     FAM120B
# ENSSSCG00000027274                   NULL ENSSSCG00000027274.2            
# ENSSSCG00000027726 delta like canonical.. ENSSSCG00000027726.2        DLL1
# ENSSSCG00000033475                   NULL ENSSSCG00000033475.1            
# entrezid
# <list>
#   ENSSSCG00000037372 110259740
# ENSSSCG00000027257 100621969
# ENSSSCG00000029697 100620583
# ENSSSCG00000027274      <NA>
#   ENSSSCG00000027726 100620481
# ENSSSCG00000033475      <NA>
#   -------
#   seqinfo: 352 sequences 

CoveragePlot(
  object = integrated,
  region = da_peaks$region[167], #CD79A
  extend.upstream = 40000,
  extend.downstream = 20000
  annotation = FALSE
)
#if I set up the annotation as FALSE, the plot would look like below:

Screen Shot 2021-06-07 at 12 32 51 PM As you can see from picture above, I do not have the gene at the bottom of plot. But I do want it. Does this error result from that there is no information on whether the region on each row correspond to an exon or intron? For example, in the "seq_coord_system" of the head(Annotation(integrated)). it only says "chromosome" rather saying "intron" or "exon"

Thanks

timoast commented 3 years ago

How did you generate the gene annotation granges object?

pengxin2019 commented 3 years ago

What exactly type of information from the annotation are used to plot the genes?

pengxin2019 commented 3 years ago

Hi Tim, just a follow up. Did you have an opportunity to check with this bug?

Thanks Best

timoast commented 3 years ago

The annotation information must contain a "type" column (cds, exon, gap, utr). I would suggest creating the annotations as shown in the Signac vignettes, and then re-creating the same format (with all the same columns) with your custom genome annotation

zrcjessica commented 3 years ago

Hi, I am using the Ensembl annotations and have made sure that I have a type column. However, I've noticed that my coverage plots don't show the gene name, even though there is a gene_name column in my genome annotation. Any thoughts on why this is? Thanks!

This is what my annotation looks like:

GRanges object with 6 ranges and 20 metadata columns:
      seqnames        ranges strand |         source       type     score
         <Rle>     <IRanges>  <Rle> |       <factor>   <factor> <numeric>
  [1]        1 396700-409750      + | ensembl_havana gene              NA
  [2]        1 396700-409676      + | ensembl        transcript        NA
  [3]        1 396700-396905      + | ensembl        exon              NA
  [4]        1 397780-397788      + | ensembl        exon              NA
  [5]        1 399062-399070      + | ensembl        exon              NA
  [6]        1 399557-399827      + | ensembl        exon              NA
          phase            gene_id gene_version      gene_name    gene_source
      <integer>        <character>  <character>    <character>    <character>
  [1]      <NA> ENSRNOG00000046319            4 AABR07000046.1 ensembl_havana
  [2]      <NA> ENSRNOG00000046319            4 AABR07000046.1 ensembl_havana
  [3]      <NA> ENSRNOG00000046319            4 AABR07000046.1 ensembl_havana
  [4]      <NA> ENSRNOG00000046319            4 AABR07000046.1 ensembl_havana
  [5]      <NA> ENSRNOG00000046319            4 AABR07000046.1 ensembl_havana
  [6]      <NA> ENSRNOG00000046319            4 AABR07000046.1 ensembl_havana
              gene_biotype      transcript_id transcript_version
               <character>        <character>        <character>
  [1] processed_transcript               <NA>               <NA>
  [2] processed_transcript ENSRNOT00000044187                  4
  [3] processed_transcript ENSRNOT00000044187                  4
  [4] processed_transcript ENSRNOT00000044187                  4
  [5] processed_transcript ENSRNOT00000044187                  4
  [6] processed_transcript ENSRNOT00000044187                  4
         transcript_name transcript_source   transcript_biotype exon_number
             <character>       <character>          <character> <character>
  [1]               <NA>              <NA>                 <NA>        <NA>
  [2] AABR07000046.1-202           ensembl processed_transcript        <NA>
  [3] AABR07000046.1-202           ensembl processed_transcript           1
  [4] AABR07000046.1-202           ensembl processed_transcript           2
  [5] AABR07000046.1-202           ensembl processed_transcript           3
  [6] AABR07000046.1-202           ensembl processed_transcript           4
                 exon_id exon_version  protein_id protein_version         tag
             <character>  <character> <character>     <character> <character>
  [1]               <NA>         <NA>        <NA>            <NA>        <NA>
  [2]               <NA>         <NA>        <NA>            <NA>        <NA>
  [3] ENSRNOE00000493937            1        <NA>            <NA>        <NA>
  [4] ENSRNOE00000544646            1        <NA>            <NA>        <NA>
  [5] ENSRNOE00000574883            1        <NA>            <NA>        <NA>
  [6] ENSRNOE00000481734            2        <NA>            <NA>        <NA>
  -------
  seqinfo: 162 sequences from Rnor_6.0 genome; no seqlengths

UPDATE: I noticed that it does sometimes show the gene name, but not always. I would guess that the latter happens when the entire gene isn't in the plotting frame. Is there any way to make this show the gene name for genes that are even partially located in the plotting frame? Thanks!

pengxin2019 commented 3 years ago

Hi Jessica, I hope this finds you well. Would you mind if I ask whether you can give me some hint on how to get the type column of the annotation? How did you get the annotation object? I got it from AnnotationHub and my annotation looks like this:

head(Annotation(integrated))

# GRanges object with 6 ranges and 8 metadata columns:
#   GRanges object with 6 ranges and 8 metadata columns:
#   seqnames        ranges strand |            gene_id
# <Rle>     <IRanges>  <Rle> |        <character>
#   ENSSSCG00000037372        1    3472-18696      - | ENSSSCG00000037372
# ENSSSCG00000027257        1   23368-40113      + | ENSSSCG00000027257
# ENSSSCG00000029697        1  96218-186785      - | ENSSSCG00000029697
# ENSSSCG00000027274        1 112763-113499      - | ENSSSCG00000027274
# ENSSSCG00000027726        1 198992-211342      + | ENSSSCG00000027726
# ENSSSCG00000033475        1 352196-356072      + | ENSSSCG00000033475
# gene_name   gene_biotype seq_coord_system
# <character>    <character>      <character>
#   ENSSSCG00000037372         TBP protein_coding       chromosome
# ENSSSCG00000027257       PSMB1 protein_coding       chromosome
# ENSSSCG00000029697     FAM120B protein_coding       chromosome
# ENSSSCG00000027274             protein_coding       chromosome
# ENSSSCG00000027726        DLL1 protein_coding       chromosome
# ENSSSCG00000033475                    lincRNA       chromosome
# description      gene_id_version      symbol
# <character>          <character> <character>
#   ENSSSCG00000037372 TATA-box binding pro.. ENSSSCG00000037372.1         TBP
# ENSSSCG00000027257 proteasome subunit b.. ENSSSCG00000027257.2       PSMB1
# ENSSSCG00000029697 family with sequence.. ENSSSCG00000029697.2     FAM120B
# ENSSSCG00000027274                   NULL ENSSSCG00000027274.2            
# ENSSSCG00000027726 delta like canonical.. ENSSSCG00000027726.2        DLL1
# ENSSSCG00000033475                   NULL ENSSSCG00000033475.1            
# entrezid
# <list>
#   ENSSSCG00000037372 110259740
# ENSSSCG00000027257 100621969
# ENSSSCG00000029697 100620583
# ENSSSCG00000027274      <NA>
#   ENSSSCG00000027726 100620481
# ENSSSCG00000033475      <NA>
#   -------
#   seqinfo: 352 sequences 

Thanks Best penny

zrcjessica commented 3 years ago

Hi Penny,

I ended up downloading the GTF file from Ensembl corresponding to the assembly that I aligned my reads to and that GTF file had a gene type column. I loaded the file into R as a GRanges object (you can use the import function from the rtracklayer package to directly load in the GTF file as a GRanges object or you can read it in as a data frame and convert to a GRanges object yourself). I then set that GRanges object as the annotation for my Signac object. For example, something like:

Annotation(atac) <- gtf.gr

Hope that helps!

pengxin2019 commented 3 years ago

Hi Jessica, This is sweet! Thanks for your suggestions!! Good luck with your analysis.

Best Penny

14zac2 commented 3 years ago

Hi - I'm having the same issue where I am getting the error

Error in annotation[annotation$type == "body", ] : 
  incorrect number of dimensions

when using CoveragePlot() with annotation = TRUE and I can't figure out why. I am reading in a gtf file using rtracklayer and there is a type column. Has anyone figured out why this is happening?

MoritzTh commented 2 years ago

@14zac2 have you found a solution to your problem? I am having the exact same issue.

14zac2 commented 2 years ago

@MoritzTh I did! My GTF file was missing the column gene_biotype so I simply modified my GTF file to have that column, just putting in "protein_coding" for every row. Not sure if the value in the column makes any difference, but not having it there was causing the bug:

gtf$gene_biotype <- "protein_coding"

MoritzTh commented 2 years ago

Thanks for answering! The gene_biotypecolumn is present in my GTF file, so not sure what the issue might be here.

14zac2 commented 2 years ago

@MoritzTh no problem! I found that the missing column was an issue with mine by going through the code of the graphing function line by line (you can look at the function by going View(CoveragePlot) and noticing that the column was used in the function. Maybe this process would help you, as well! I would just try to make sure that all of your custom files are formatted exactly as expected.

SidG13 commented 2 years ago

I'm having the exact same issues as well. I've implemented the fix putting "protein_coding" in the gene_biotype field, and also making sure a regular type field is present and has CDS, exon, gene, transcript etc.

The output of my granges annotation object is below. Am I missing anything? Any ideas what could be causing these errors?

> head(gene.coords)
GRanges object with 6 ranges and 22 metadata columns:
      seqnames        ranges strand |   source     type     score     phase                gene_id
         <Rle>     <IRanges>  <Rle> | <factor> <factor> <numeric> <integer>            <character>
  [1]     myo1        1-2076      + |    NA        gene        NA      <NA>              myotoxin1
  [2]        Z     5240-9278      - |    maker     gene        NA      <NA> maker-Z-augustus-gen..
  [3]        Z   43556-53413      + |    maker     gene        NA      <NA> maker-Z-augustus-gen..
  [4]        Z   74320-75666      + |    maker     gene        NA      <NA> augustus_masked-Z-pr..
  [5]        Z  98879-137683      + |    maker     gene        NA      <NA> augustus_masked-Z-pr..
  [6]        Z 139942-146170      + |    maker     gene        NA      <NA> maker-Z-augustus-gen..
         transcript_id              gene_name      Parent   gene_biotype Anolis_Blast_Type
           <character>            <character> <character>    <character>       <character>
  [1] myotoxin_model_1              myotoxin1        <NA> protein_coding              <NA>
  [2]             <NA> maker-Z-augustus-gen..        <NA> protein_coding            ONEWAY
  [3]             <NA> maker-Z-augustus-gen..        <NA> protein_coding               RBB
  [4]             <NA> augustus_masked-Z-pr..        <NA> protein_coding               RBB
  [5]             <NA> augustus_masked-Z-pr..        <NA> protein_coding               RBB
  [6]             <NA> maker-Z-augustus-gen..        <NA> protein_coding            ONEWAY
              Anolis_Homolog   Crovir_Transcript_ID                   Name Python_Blast_Type
                 <character>            <character>            <character>       <character>
  [1]                   <NA>                   <NA>                   <NA>              <NA>
  [2] XP_016852035.1_10055.. crovir-transcript-1688 maker-Z-augustus-gen..            ONEWAY
  [3] XP_003225965.3_10055.. crovir-transcript-1686 maker-Z-augustus-gen..               RBB
  [4] XP_008116758.2_10328.. crovir-transcript-1684 augustus_masked-Z-pr..               RBB
  [5] XP_008121683.1_10055.. crovir-transcript-1685 augustus_masked-Z-pr..               RBB
  [6] XP_008121683.1_10055.. crovir-transcript-1687 maker-Z-augustus-gen..            ONEWAY
              Python_Homolog Thamnophis_Blast_Type     Thamnophis_Homolog       X_AED        X_QI
                 <character>           <character>            <character> <character> <character>
  [1]                   <NA>                  <NA>                   <NA>        <NA>        <NA>
  [2] XP_015744721.1_10306..                   RBB XP_013928481.1_10655..        <NA>        <NA>
  [3] XP_007435060.1_10306..                   RBB XP_013923894.1_10655..        <NA>        <NA>
  [4] XP_007443267.1_10305..                ONEWAY XP_013927888.1_10655..        <NA>        <NA>
  [5] XP_007434210.1_10305..                   RBB XP_013913927.1_10654..        <NA>        <NA>
  [6] XP_015744560.1_10305..                ONEWAY XP_013913930.1_10654..        <NA>        <NA>
           X_eAED Crovir_Protein_ID previous_transcript_id
      <character>       <character>            <character>
  [1]        <NA>              <NA>                   <NA>
  [2]        <NA>              <NA>                   <NA>
  [3]        <NA>              <NA>                   <NA>
  [4]        <NA>              <NA>                   <NA>
  [5]        <NA>              <NA>                   <NA>
  [6]        <NA>              <NA>                   <NA>
  -------
  seqinfo: 21 sequences from an unspecified genome; no seqlengths
kbattenb commented 2 years ago

So am I. SidG13 could not have shown it better. I am stuck at the same spot. How should the GRanges object be modified to have CoveragePlot show gene models?

amdamatac commented 2 years ago

@SidG13 Hi! I was having the same problem. I think CoveragePlot needs the "tx_id" column. It finally worked when I changed the "gene_id" to "tx_id" in the annotation from ensembl. Hope it helps!

savytskanatalia commented 11 months ago

I know, it's a very belated reply, but maybe it will help somebody else. For me the run failed if BOTH type and biotype info were present, as the script subsetting the object for features ends up subsetting extra type column and dropping the last needed other column instead, so the table used for plotting the gene or feature becomes invalid.

I'm having the exact same issues as well. I've implemented the fix putting "protein_coding" in the gene_biotype field, and also making sure a regular type field is present and has CDS, exon, gene, transcript etc.

The output of my granges annotation object is below. Am I missing anything? Any ideas what could be causing these errors?

> head(gene.coords)
GRanges object with 6 ranges and 22 metadata columns:
      seqnames        ranges strand |   source     type     score     phase                gene_id
         <Rle>     <IRanges>  <Rle> | <factor> <factor> <numeric> <integer>            <character>
  [1]     myo1        1-2076      + |    NA        gene        NA      <NA>              myotoxin1
  [2]        Z     5240-9278      - |    maker     gene        NA      <NA> maker-Z-augustus-gen..
  [3]        Z   43556-53413      + |    maker     gene        NA      <NA> maker-Z-augustus-gen..
  [4]        Z   74320-75666      + |    maker     gene        NA      <NA> augustus_masked-Z-pr..
  [5]        Z  98879-137683      + |    maker     gene        NA      <NA> augustus_masked-Z-pr..
  [6]        Z 139942-146170      + |    maker     gene        NA      <NA> maker-Z-augustus-gen..
         transcript_id              gene_name      Parent   gene_biotype Anolis_Blast_Type
           <character>            <character> <character>    <character>       <character>
  [1] myotoxin_model_1              myotoxin1        <NA> protein_coding              <NA>
  [2]             <NA> maker-Z-augustus-gen..        <NA> protein_coding            ONEWAY
  [3]             <NA> maker-Z-augustus-gen..        <NA> protein_coding               RBB
  [4]             <NA> augustus_masked-Z-pr..        <NA> protein_coding               RBB
  [5]             <NA> augustus_masked-Z-pr..        <NA> protein_coding               RBB
  [6]             <NA> maker-Z-augustus-gen..        <NA> protein_coding            ONEWAY
              Anolis_Homolog   Crovir_Transcript_ID                   Name Python_Blast_Type
                 <character>            <character>            <character>       <character>
  [1]                   <NA>                   <NA>                   <NA>              <NA>
  [2] XP_016852035.1_10055.. crovir-transcript-1688 maker-Z-augustus-gen..            ONEWAY
  [3] XP_003225965.3_10055.. crovir-transcript-1686 maker-Z-augustus-gen..               RBB
  [4] XP_008116758.2_10328.. crovir-transcript-1684 augustus_masked-Z-pr..               RBB
  [5] XP_008121683.1_10055.. crovir-transcript-1685 augustus_masked-Z-pr..               RBB
  [6] XP_008121683.1_10055.. crovir-transcript-1687 maker-Z-augustus-gen..            ONEWAY
              Python_Homolog Thamnophis_Blast_Type     Thamnophis_Homolog       X_AED        X_QI
                 <character>           <character>            <character> <character> <character>
  [1]                   <NA>                  <NA>                   <NA>        <NA>        <NA>
  [2] XP_015744721.1_10306..                   RBB XP_013928481.1_10655..        <NA>        <NA>
  [3] XP_007435060.1_10306..                   RBB XP_013923894.1_10655..        <NA>        <NA>
  [4] XP_007443267.1_10305..                ONEWAY XP_013927888.1_10655..        <NA>        <NA>
  [5] XP_007434210.1_10305..                   RBB XP_013913927.1_10654..        <NA>        <NA>
  [6] XP_015744560.1_10305..                ONEWAY XP_013913930.1_10654..        <NA>        <NA>
           X_eAED Crovir_Protein_ID previous_transcript_id
      <character>       <character>            <character>
  [1]        <NA>              <NA>                   <NA>
  [2]        <NA>              <NA>                   <NA>
  [3]        <NA>              <NA>                   <NA>
  [4]        <NA>              <NA>                   <NA>
  [5]        <NA>              <NA>                   <NA>
  [6]        <NA>              <NA>                   <NA>
  -------
  seqinfo: 21 sequences from an unspecified genome; no seqlengths