Closed mliiv closed 4 years ago
Hi Maria, thanks for your questions, I will try to answer them below:
Integration I think of as finding peaks that intersect at least x bp, taking the coordinates of these peaks, and introducing a new peak coordinate based on the intersecting peaks (after e.g. centering)
I think this is a description of one way dataset merging can work for scATAC-seq data. This is similar to what MergeWithRegions
does, only that there is no centering of peaks in that function (for convenience we keep the peak coordinates of one of the input datasets). The integration workflow is slightly different in that it will adjust the data to remove dataset-specific differences in chromatin accessibility. However the integration needs common features across datasets, which is why we have the MergeWithRegions
step in the integration vignette on the Signac website. You can read about how the integration algorithm works here: https://doi.org/10.1016/j.cell.2019.05.031
MergeWithRegions - merge datasets in a coordinate-aware manner. Does this mean that no shifts in peak coordinates were introduced (meaning no integration)?
Answered above
When running FindTopFeatures (min.cutoff = "q75") on this "unintegrated" object, I do get some 24K peaks - can I expect these peaks to contribute to the technical variance then (since UMAP generates 2 different clusters)?
This will return peaks that were accessible in more than 75% of the cells, I suppose if you expect any differences between the two groups to be purely technical, and there is differential accessibility between the groups for these peaks (which there must be if they're separating on the UMAP), then they would contribute to the technical variance.
So I continued with GetIntersectingFeatures which gives me some 97K peaks out of a total of 267K across 2 samples. Peaks are with original coordinates. Here I have a uniform peak set that I could also use for my objective 1) - see how well they agree with DNase footprints. Correct?
Yes, you could compare the intersecting peaks across objects with DHS peaks. You might also try taking the union of peaks, rather than the intersection.
I can now continue with subsampling some peaks and running FeatureMatrix, and add a new assay with these subsampled peaks to one of my samples (seurat objects) "# create a new assay and add it to the 10x dataset". Will that seurat object subset be used by FindIntegrationAnchors or IntegrateData, in any other way than the peaks.use subset directly that I specify for the FindIntegrationAnchors? After making the integrated object, I have there 2 samples with subsampled as well as all peaks across both samples.
The subset of peaks will only be used for the integration. After integration, you will have the integrated assay, which will only contain the subset of peaks that were corrected, and the original data for both datasets as a merged "peaks" assay. Be aware that any peaks that were not shared across the datasets will be given zero counts in the dataset where it was not detected, but this does not necessarily mean that there were no fragments in that region of the genome for those cells. For this reason, it might be a good idea to take the union of peaks across the datasets and quantify the new peak set in both datasets using FeatureMatrix
.
Can I now use FindTopFeatures to compare the top peaks between the two samples in the integrated object (objective 2)? If so, is this correct: integrated <- RunTFIDF(integrated, assay = 'peaks') integrated <- FindTopFeatures(integrated, assay = 'peaks', min.cutoff = "q75")
A better way to compare peaks across datasets would be to perform a differential accessibility test, as this can take the total number of reads sequenced in each cell into account as a latent variable. For example, you can use the LR test: FindMarkers(object, ident.1 = dataset1, ident.2 = dataset2, test.use = 'LR', latent.vars = 'nCount_peaks')
. However the approach you suggest of comparing the percentage of cells with the peak accessible across datasets is not wrong, and will be faster, but the LR test could give slightly better results and doesn't depend on setting a cutoff (eg, 75% cells with the peak).
Alternatively, I guess I could use FeatureMatrix -> CreateGeneActivityMatrix for each sample separately using the same set of peaks and compare the peak-associated genes (quantify gene expression ‘activity’ from ATAC-seq reads)?
Yes you could also do this if you are particularly interested in gene accessibilty.
Thank you for the clarifications!
You are talking about the union of peaks - do you mean using bedtools or such? Since I haven't seen a function in signac for that yet.
However, I am still getting an error when running the integration. Can't seem to find any documentation about it yet:
anchors <- FindIntegrationAnchors( object.list = list(m27a.D3.atac, siC.D3.atac), anchor.features = peaks.use, assay = c('siC.D3.peaks', 'peaks'), k.filter = NA )
Scaling features for provided objects | | 0 % ~calculating Warning: All object keys must be alphanumeric characters, followed by an underscore (''), setting key to '' |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=01m 21s Finding all pairwise anchors | | 0 % ~calculating Error in DietSeurat(object = object.list[[i]], assays = assay[i], features = anchor.features, : The default assay is slated to be removed, please change the default assay
I have the original sample (m27a.D3.atac) with assay "peaks" and then I added another assay with all (not 10K sub-sampled) peaks, called siC.D3.peaks (from sample siC.D3.atac). I tried changing the DefaultAssay of m27a.D3.atac into either "peaks" or "siC.D3.peaks", but same error with both.
You can find the union of peaks using the union
function from the GenomicRanges
package. For example, you could do something like this to find the union of peaks between two objects, count reads in the peaks, and create a new assay:
library(Signac)
library(Seurat)
library(GenomicRanges)
peaks1 <- StringToGRanges(regions = rownames(obj1))
peaks2 <- StringToGRanges(regions = rownames(obj2))
all.peaks <- union(peaks1, peaks2)
newcounts1 <- FeatureMatrix(fragments = GetFragments(obj1), features = all.peaks)
newcounts2 <- FeatureMatrix(fragments = GetFragments(obj2), features = all.peaks)
obj1[['peakunion']] <- CreateAssayObject(counts = newcounts1)
obj2[['peakunion']] <- CreateAssayObject(counts = newcounts2)
What version of Seurat are you using? Can you post the output of sessionInfo()
?
Thanks, I totally forgot about that package! It's Seurat_3.1.1 : sessionInfo() R version 3.6.0 (2019-04-26) Platform: x86_64-redhat-linux-gnu (64-bit) Running under: CentOS Linux 7 (Core)
Matrix products: default BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
Random number generation: RNG: Mersenne-Twister Normal: Inversion Sample: Rounding
attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages: [1] loomR_0.2.1.9000 R6_2.4.0 data.table_1.12.6 Signac_0.1.5 hdf5r_1.3.0 ggplot2_3.2.1 Seurat_3.1.1
Are you able to share the Seurat objects or a downsampled version that reproduces the error?
Yes, I get same error when downsampling to 5000 genes, 500 cells in the beginning of the analysis. How to share?
You can email a dropbox / google drive / box link to tstuart@nygenome.org
It looks like the issue with the integration is that the peak names are different in your two assays, one uses ":" for the chromosome-start separator and the other uses "-", and so the features don't match. If you add sep = c(":", "-")
to your FeatureMatrix
call this should fix the problem
Hi,
Thank you for the help. That’s strange, when I make the objects, they are with same separators:
head(rownames(siC.D3.atac)) [1] "chr1:4748191-4748370" "chr1:4768389-4768799" "chr1:4769832-4770654" "chr1:4780043-4780522" "chr1:4784191-4786451" "chr1:4793275-4793662" head(rownames(m27a.D3.atac)) [1] "chr1:3989349-3989479" "chr1:4492165-4492454" "chr1:4591574-4591791" "chr1:4768401-4768790" "chr1:4769833-4770557" "chr1:4778274-4778377"
I am getting a lot of these warnings: Warning: All object keys must be alphanumeric characters, followed by an underscore (''), setting key to ''
Is there any modification I could do to my peak IDs to prevent these? But no more error with the FindIntegrationAnchors, very nice. Instead, now I get an error with IntegrateData:
integrated <- IntegrateData( anchorset = anchors, weight.reduction = m27a.D3.atac[['lsi']], preserve.order = TRUE, k.weight = 100 ) Warning: All object keys must be alphanumeric characters, followed by an underscore (''), setting key to '' Warning: All object keys must be alphanumeric characters, followed by an underscore (''), setting key to '' Merging dataset 2 into 1 Extracting anchors for merged samples Finding integration vectors Finding integration vector weights Error in Embeddings(reduction)[nn.cells2, dims] : subscript out of bounds
Same with k.weight = 20/50.
I ran FindIntegrationAnchors before with 10K peaks.use and k.filter = NA: anchors <- FindIntegrationAnchors( object.list = list(m27a.D3.atac, siC.D3.atac), anchor.features = peaks.use, assay = c('siC.D3.peaks', 'peaks'), k.filter = NA )
and the reduction was run (before adding the second assay) m27a.D3.atac <- RunSVD(m27a.D3.atac, n = 30, reduction.name = 'lsi', reduction.key = 'LSI_')
Thanks again, Maria
You can ignore those warnings, it's just something we need to update in Seurat to prevent them but it doesn't affect the results.
The reason you get different separators is that when you quantify the new set of peaks using FeatureMatrix
, by default this uses sep = c("-", "-")
as the coordinate separators, and so when you create a new assay in the m27a.D3.atac
using that count matrix it then has different feature names.
The problem above is with the weight.reduction
, you actually need to pass a reduction for both objects and they need to be in the same order as was used in FindIntegrationAnchors
, ie:
integrated <- IntegrateData(
anchorset = anchors,
weight.reduction = c(m27a.D3.atac[['lsi']], siC.D3.atac[['lsi']]),
preserve.order = TRUE,
k.weight = 100
)
Alternatively you can just pass the name of the dimension reduction to use in the object, for example this will also work:
integrated <- IntegrateData(
anchorset = anchors,
weight.reduction = "lsi",
preserve.order = TRUE,
k.weight = 100
)
We will try to add some more informative error checking to Seurat, because right now it's not at all obvious that this was the problem
Great, that fixed it. Thank you, this package is very necessary currently!
You can find the union of peaks using the
union
function from theGenomicRanges
package. For example, you could do something like this to find the union of peaks between two objects, count reads in the peaks, and create a new assay:library(Signac) library(Seurat) library(GenomicRanges) peaks1 <- StringToGRanges(regions = rownames(obj1)) peaks2 <- StringToGRanges(regions = rownames(obj2)) all.peaks <- union(peaks1, peaks2) newcounts1 <- FeatureMatrix(fragments = GetFragments(obj1), features = all.peaks) newcounts2 <- FeatureMatrix(fragments = GetFragments(obj2), features = all.peaks) obj1[['peakunion']] <- CreateAssayObject(counts = newcounts1) obj2[['peakunion']] <- CreateAssayObject(counts = newcounts2)
What version of Seurat are you using? Can you post the output of
sessionInfo()
?
Hi @timoast im basically running into issues with integration. So I am trying out all of your suggestions first before asking for your assistance/guidance. So running your above-quoted code I get an error:
> newcounts1 <- FeatureMatrix(fragments = GetFragments(obj1), features = all.peaks)
Error in GetFragments(obj1) : could not find function "GetFragments"
This is my session info:
> session_info()
- Session info ---------------------------------------------------------------------------------------------------------------------------
setting value
version R version 4.0.2 (2020-06-22)
os Windows 10 x64
system x86_64, mingw32
ui RStudio
language (EN)
collate English_United States.1252
ctype English_United States.1252
tz America/Chicago
date 2020-10-06
- Packages -------------------------------------------------------------------------------------------------------------------------------
package * version date lib source
abind 1.4-5 2016-07-21 [1] CRAN (R 4.0.0)
AnnotationDbi * 1.50.3 2020-07-25 [1] Bioconductor
AnnotationFilter * 1.12.0 2020-04-27 [1] Bioconductor
askpass 1.1 2019-01-13 [1] CRAN (R 4.0.2)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.2)
backports 1.1.10 2020-09-15 [1] CRAN (R 4.0.2)
base64enc 0.1-3 2015-07-28 [1] CRAN (R 4.0.0)
Biobase * 2.48.0 2020-04-27 [1] Bioconductor
BiocFileCache 1.12.1 2020-08-04 [1] Bioconductor
BiocGenerics * 0.34.0 2020-04-27 [1] Bioconductor
BiocManager 1.30.10 2019-11-16 [1] CRAN (R 4.0.2)
BiocParallel 1.22.0 2020-04-27 [1] Bioconductor
biomaRt 2.44.1 2020-06-17 [1] Bioconductor
Biostrings 2.56.0 2020-04-27 [1] Bioconductor
biovizBase 1.36.0 2020-04-27 [1] Bioconductor
bit 4.0.4 2020-08-04 [1] CRAN (R 4.0.2)
bit64 4.0.5 2020-08-30 [1] CRAN (R 4.0.2)
bitops 1.0-6 2013-08-17 [1] CRAN (R 4.0.0)
blob 1.2.1 2020-01-20 [1] CRAN (R 4.0.2)
BSgenome 1.56.0 2020-04-27 [1] Bioconductor
callr 3.4.4 2020-09-07 [1] CRAN (R 4.0.2)
checkmate 2.0.0 2020-02-06 [1] CRAN (R 4.0.2)
cli 2.0.2 2020-02-28 [1] CRAN (R 4.0.2)
cluster 2.1.0 2019-06-19 [1] CRAN (R 4.0.2)
clustree * 0.4.3 2020-06-14 [1] CRAN (R 4.0.2)
codetools 0.2-16 2018-12-24 [1] CRAN (R 4.0.2)
colorspace 1.4-1 2019-03-18 [1] CRAN (R 4.0.2)
combinat 0.0-8 2012-10-29 [1] CRAN (R 4.0.0)
cowplot * 1.1.0 2020-09-08 [1] CRAN (R 4.0.2)
crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.2)
curl 4.3 2019-12-02 [1] CRAN (R 4.0.2)
data.table 1.13.0 2020-07-24 [1] CRAN (R 4.0.2)
DBI 1.1.0 2019-12-15 [1] CRAN (R 4.0.2)
dbplyr 1.4.4 2020-05-27 [1] CRAN (R 4.0.2)
DDRTree * 0.1.5 2017-04-30 [1] CRAN (R 4.0.2)
DelayedArray 0.14.1 2020-07-15 [1] Bioconductor
deldir 0.1-29 2020-09-13 [1] CRAN (R 4.0.2)
densityClust 0.3 2017-10-24 [1] CRAN (R 4.0.2)
desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.2)
devtools * 2.3.2 2020-09-18 [1] CRAN (R 4.0.2)
dichromat 2.0-0 2013-01-24 [1] CRAN (R 4.0.0)
digest 0.6.25 2020-02-23 [1] CRAN (R 4.0.2)
docopt 0.7.1 2020-06-24 [1] CRAN (R 4.0.2)
dplyr * 1.0.2 2020-08-18 [1] CRAN (R 4.0.2)
ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.2)
EnsDb.Hsapiens.v75 * 2.99.0 2020-10-06 [1] Bioconductor
EnsDb.Hsapiens.v86 * 2.99.0 2020-10-06 [1] Bioconductor
ensembldb * 2.12.1 2020-05-06 [1] Bioconductor
fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.2)
farver 2.0.3 2020-01-16 [1] CRAN (R 4.0.2)
fastICA 1.2-2 2019-07-08 [1] CRAN (R 4.0.2)
fastmap 1.0.1 2019-10-08 [1] CRAN (R 4.0.2)
fastmatch 1.1-0 2017-01-28 [1] CRAN (R 4.0.0)
fitdistrplus 1.1-1 2020-05-19 [1] CRAN (R 4.0.2)
FNN 1.1.3 2019-02-15 [1] CRAN (R 4.0.2)
foreach 1.5.0 2020-03-30 [1] CRAN (R 4.0.2)
foreign 0.8-80 2020-05-24 [1] CRAN (R 4.0.2)
Formula 1.2-3 2018-05-03 [1] CRAN (R 4.0.0)
fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
future 1.19.1 2020-09-22 [1] CRAN (R 4.0.2)
future.apply 1.6.0 2020-07-01 [1] CRAN (R 4.0.2)
generics 0.0.2 2018-11-29 [1] CRAN (R 4.0.2)
GenomeInfoDb * 1.24.2 2020-06-15 [1] Bioconductor
GenomeInfoDbData * 1.2.3 2020-10-06 [1] Bioconductor
GenomicAlignments 1.24.0 2020-04-27 [1] Bioconductor
GenomicFeatures * 1.40.1 2020-07-08 [1] Bioconductor
GenomicRanges * 1.40.0 2020-04-27 [1] Bioconductor
GGally 2.0.0 2020-06-06 [1] CRAN (R 4.0.2)
ggbio 1.36.0 2020-04-27 [1] Bioconductor
ggforce 0.3.2 2020-06-23 [1] CRAN (R 4.0.2)
ggplot2 * 3.3.2 2020-06-19 [1] CRAN (R 4.0.2)
ggraph * 2.0.3 2020-05-20 [1] CRAN (R 4.0.2)
ggrepel * 0.8.2 2020-03-08 [1] CRAN (R 4.0.2)
ggridges * 0.5.2 2020-01-12 [1] CRAN (R 4.0.2)
ggseqlogo 0.1 2017-07-25 [1] CRAN (R 4.0.2)
glmnet 4.0-2 2020-06-16 [1] CRAN (R 4.0.2)
globals 0.13.0 2020-09-17 [1] CRAN (R 4.0.2)
glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
goftest 1.2-2 2019-12-02 [1] CRAN (R 4.0.0)
graph 1.66.0 2020-04-27 [1] Bioconductor
graphlayouts 0.7.0 2020-04-25 [1] CRAN (R 4.0.2)
gridExtra 2.3 2017-09-09 [1] CRAN (R 4.0.2)
gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.2)
harmony * 1.0 2020-10-06 [1] Github (immunogenomics/harmony@88b1e2a)
hdf5r * 1.3.3 2020-08-18 [1] CRAN (R 4.0.2)
Hmisc 4.4-1 2020-08-10 [1] CRAN (R 4.0.2)
hms 0.5.3 2020-01-08 [1] CRAN (R 4.0.2)
HSMMSingleCell 1.8.0 2020-05-07 [1] Bioconductor
htmlTable 2.1.0 2020-09-16 [1] CRAN (R 4.0.2)
htmltools 0.5.0 2020-06-16 [1] CRAN (R 4.0.2)
htmlwidgets 1.5.2 2020-10-03 [1] CRAN (R 4.0.2)
httpuv 1.5.4 2020-06-06 [1] CRAN (R 4.0.2)
httr 1.4.2 2020-07-20 [1] CRAN (R 4.0.2)
ica 1.0-2 2018-05-24 [1] CRAN (R 4.0.0)
igraph 1.2.5 2020-03-19 [1] CRAN (R 4.0.2)
IRanges * 2.22.2 2020-05-21 [1] Bioconductor
irlba * 2.3.3 2019-02-05 [1] CRAN (R 4.0.2)
iterators 1.0.12 2019-07-26 [1] CRAN (R 4.0.2)
jpeg 0.1-8.1 2019-10-24 [1] CRAN (R 4.0.0)
jsonlite 1.7.1 2020-09-07 [1] CRAN (R 4.0.2)
KernSmooth 2.23-17 2020-04-26 [1] CRAN (R 4.0.2)
knitr 1.30 2020-09-22 [1] CRAN (R 4.0.2)
labeling 0.3 2014-08-23 [1] CRAN (R 4.0.0)
later 1.1.0.1 2020-06-05 [1] CRAN (R 4.0.2)
lattice 0.20-41 2020-04-02 [1] CRAN (R 4.0.2)
latticeExtra 0.6-29 2019-12-19 [1] CRAN (R 4.0.2)
lazyeval 0.2.2 2019-03-15 [1] CRAN (R 4.0.2)
leiden 0.3.3 2020-02-04 [1] CRAN (R 4.0.2)
lifecycle 0.2.0 2020-03-06 [1] CRAN (R 4.0.2)
limma 3.44.3 2020-06-12 [1] Bioconductor
listenv 0.8.0 2019-12-05 [1] CRAN (R 4.0.2)
lmtest 0.9-38 2020-09-09 [1] CRAN (R 4.0.2)
lsa 0.73.2 2020-05-04 [1] CRAN (R 4.0.2)
magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.2)
MASS 7.3-53 2020-09-09 [1] CRAN (R 4.0.2)
Matrix * 1.2-18 2019-11-27 [1] CRAN (R 4.0.2)
matrixStats 0.57.0 2020-09-25 [1] CRAN (R 4.0.2)
memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.2)
mgcv 1.8-33 2020-08-27 [1] CRAN (R 4.0.2)
mime 0.9 2020-02-04 [1] CRAN (R 4.0.0)
miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.0.2)
monocle * 2.16.0 2020-04-27 [1] Bioconductor
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.2)
nlme 3.1-149 2020-08-23 [1] CRAN (R 4.0.2)
nnet 7.3-14 2020-04-26 [1] CRAN (R 4.0.2)
nzilbb.labbcat * 0.6-1 2020-08-27 [1] CRAN (R 4.0.2)
openssl 1.4.3 2020-09-18 [1] CRAN (R 4.0.2)
OrganismDbi 1.30.0 2020-04-28 [1] Bioconductor
patchwork 1.0.1 2020-06-22 [1] CRAN (R 4.0.2)
pbapply 1.4-3 2020-08-18 [1] CRAN (R 4.0.2)
pheatmap 1.0.12 2019-01-04 [1] CRAN (R 4.0.2)
pillar 1.4.6 2020-07-10 [1] CRAN (R 4.0.2)
pkgbuild * 1.1.0 2020-07-13 [1] CRAN (R 4.0.2)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.2)
pkgload 1.1.0 2020-05-29 [1] CRAN (R 4.0.2)
plotly * 4.9.2.1 2020-04-04 [1] CRAN (R 4.0.2)
plyr 1.8.6 2020-03-03 [1] CRAN (R 4.0.2)
png 0.1-7 2013-12-03 [1] CRAN (R 4.0.0)
polyclip 1.10-0 2019-03-14 [1] CRAN (R 4.0.0)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.2)
processx 3.4.4 2020-09-03 [1] CRAN (R 4.0.2)
progress 1.2.2 2019-05-16 [1] CRAN (R 4.0.2)
promises 1.1.1 2020-06-09 [1] CRAN (R 4.0.2)
ProtGenerics 1.20.0 2020-04-27 [1] Bioconductor
ps 1.3.4 2020-08-11 [1] CRAN (R 4.0.2)
purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.2)
qlcMatrix 0.9.7 2018-04-20 [1] CRAN (R 4.0.2)
R6 2.4.1 2019-11-12 [1] CRAN (R 4.0.2)
RANN 2.6.1 2019-01-08 [1] CRAN (R 4.0.2)
rappdirs 0.3.1 2016-03-28 [1] CRAN (R 4.0.2)
RBGL 1.64.0 2020-04-27 [1] Bioconductor
RColorBrewer 1.1-2 2014-12-07 [1] CRAN (R 4.0.0)
Rcpp * 1.0.5 2020-07-06 [1] CRAN (R 4.0.2)
RcppAnnoy 0.0.16 2020-03-08 [1] CRAN (R 4.0.2)
RcppRoll 0.3.0 2018-06-05 [1] CRAN (R 4.0.2)
RCurl 1.98-1.2 2020-04-18 [1] CRAN (R 4.0.2)
remotes 2.2.0 2020-07-21 [1] CRAN (R 4.0.2)
reshape 0.8.8 2018-10-23 [1] CRAN (R 4.0.2)
reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.0.2)
reticulate 1.16 2020-05-27 [1] CRAN (R 4.0.2)
rlang 0.4.7 2020-07-09 [1] CRAN (R 4.0.2)
ROCR 1.0-11 2020-05-02 [1] CRAN (R 4.0.2)
rpart 4.1-15 2019-04-12 [1] CRAN (R 4.0.2)
rprojroot 1.3-2 2018-01-03 [1] CRAN (R 4.0.2)
Rsamtools 2.4.0 2020-04-27 [1] Bioconductor
RSpectra 0.16-0 2019-12-01 [1] CRAN (R 4.0.2)
RSQLite * 2.2.1 2020-09-30 [1] CRAN (R 4.0.2)
rstudioapi 0.11 2020-02-07 [1] CRAN (R 4.0.2)
rsvd 1.0.3 2020-02-17 [1] CRAN (R 4.0.2)
rtracklayer 1.48.0 2020-04-27 [1] Bioconductor
Rtsne 0.15 2018-11-10 [1] CRAN (R 4.0.2)
S4Vectors * 0.26.1 2020-05-16 [1] Bioconductor
scales 1.1.1 2020-05-11 [1] CRAN (R 4.0.2)
sctransform 0.3 2020-09-20 [1] CRAN (R 4.0.2)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.2)
Seurat * 3.2.2.9002 2020-10-06 [1] Github (satijalab/seurat@cbfd7f9)
shape 1.4.5 2020-09-13 [1] CRAN (R 4.0.2)
shiny 1.5.0 2020-06-23 [1] CRAN (R 4.0.2)
Signac * 1.0.9001 2020-10-06 [1] Github (timoast/signac@b922f93)
slam 0.1-47 2019-12-21 [1] CRAN (R 4.0.0)
SnowballC 0.7.0 2020-04-01 [1] CRAN (R 4.0.0)
sparsesvd 0.2 2019-07-15 [1] CRAN (R 4.0.2)
spatstat 1.64-1 2020-05-12 [1] CRAN (R 4.0.2)
spatstat.data 1.4-3 2020-01-26 [1] CRAN (R 4.0.2)
spatstat.utils 1.17-0 2020-02-07 [1] CRAN (R 4.0.2)
stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2)
stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.2)
SummarizedExperiment 1.18.2 2020-07-09 [1] Bioconductor
survival 3.2-7 2020-09-28 [1] CRAN (R 4.0.2)
tensor 1.5 2012-05-05 [1] CRAN (R 4.0.0)
testthat 2.3.2 2020-03-02 [1] CRAN (R 4.0.2)
tibble 3.0.3 2020-07-10 [1] CRAN (R 4.0.2)
tidygraph 1.2.0 2020-05-12 [1] CRAN (R 4.0.2)
tidyr 1.1.2 2020-08-27 [1] CRAN (R 4.0.2)
tidyselect 1.1.0 2020-05-11 [1] CRAN (R 4.0.2)
tweenr 1.0.1 2018-12-14 [1] CRAN (R 4.0.2)
usethis * 1.6.3 2020-09-17 [1] CRAN (R 4.0.2)
uwot 0.1.8 2020-03-16 [1] CRAN (R 4.0.2)
VariantAnnotation 1.34.0 2020-04-27 [1] Bioconductor
vctrs 0.3.4 2020-08-29 [1] CRAN (R 4.0.2)
VGAM * 1.1-3 2020-04-28 [1] CRAN (R 4.0.2)
viridis 0.5.1 2018-03-29 [1] CRAN (R 4.0.2)
viridisLite 0.3.0 2018-02-01 [1] CRAN (R 4.0.2)
withr 2.3.0 2020-09-22 [1] CRAN (R 4.0.2)
xfun 0.18 2020-09-29 [1] CRAN (R 4.0.2)
XML 3.99-0.5 2020-07-23 [1] CRAN (R 4.0.2)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.0.2)
XVector 0.28.0 2020-04-27 [1] Bioconductor
zlibbioc 1.34.0 2020-04-27 [1] Bioconductor
zoo 1.8-8 2020-05-02 [1] CRAN (R 4.0.2)
[1] C:/Program Files/R/R-4.0.2/library
@Dragonmasterx87 This is an old issue before version 1.0.0, the GetFragments
function has since been replaced by Fragments
Hello,
I would like to understand the concepts and functions for scATAC-seq integration vs merging in signac a bit better. My data is very low-dimensional, same cell type, couple of treatments, run on same 10X Chromium run on different lanes (one lane per treatment), sequenced separately and then run through cellranger v3 separately. I would like to 1) Verify that my top peaks (peaks represented in e.g. 50% cells) overlap some existing DNase footprints (little quality control),
2) Compare top peaks between treatment and control. For this I would need to first identify peak regions that were discovered across the treatment and control samples, so let's say I want to get peaks intersect between 2 samples for simplicity here. I have followed the tutorial "scATAC-seq data integration". Integration I think of as finding peaks that intersect at least x bp, taking the coordinates of these peaks, and introducing a new peak coordinate based on the intersecting peaks (after e.g. centering).
MergeWithRegions - merge datasets in a coordinate-aware manner. Does this mean that no shifts in peak coordinates were introduced (meaning no integration)? I first tried this but when looking at the UMAP the two samples clearly separated in 2 clusters, which I do not expect from the biology. I was a bit unexpected, though (since I should have very low tech variation). When running FindTopFeatures (min.cutoff = "q75") on this "unintegrated" object, I do get some 24K peaks - can I expect these peaks to contribute to the technical variance then (since UMAP generates 2 different clusters)?
So I continued with GetIntersectingFeatures which gives me some 97K peaks out of a total of 267K across 2 samples. Peaks are with original coordinates. Here I have a uniform peak set that I could also use for my objective 1) - see how well they agree with DNase footprints. Correct?
I can now continue with subsampling some peaks and running FeatureMatrix, and add a new assay with these subsampled peaks to one of my samples (seurat objects) "# create a new assay and add it to the 10x dataset". Will that seurat object subset be used by FindIntegrationAnchors or IntegrateData, in any other way than the peaks.use subset directly that I specify for the FindIntegrationAnchors? After making the integrated object, I have there 2 samples with subsampled as well as all peaks across both samples.
Can I now use FindTopFeatures to compare the top peaks between the two samples in the integrated object (objective 2)? If so, is this correct: integrated <- RunTFIDF(integrated, assay = 'peaks') integrated <- FindTopFeatures(integrated, assay = 'peaks', min.cutoff = "q75")
Since I am not sure it's comparing the peaks between my two samples (two sets of cells with different treatments) and rather across all the cells.
Alternatively, I guess I could use FeatureMatrix -> CreateGeneActivityMatrix for each sample separately using the same set of peaks and compare the peak-associated genes (quantify gene expression ‘activity’ from ATAC-seq reads)?
I hope you can understand my questions and thank you in advance, Maria