Open kerenzhou062 opened 3 months ago
I am also having this same issue. It appears to be an ArchR issue -- for some reason the tile.mtx@elementMetadata is NULL when it should return the tile coordinates.
Using the code in addTileMatrix() and .addTileMat() from MatrixTiles.R in the ArchR source code, you can reconstruct the dataframe storing metadata for all genomic tiles as follows:
input <- archr.obj
ArrowFiles <- getArrowFiles(input)
chromSizes = getChromSizes(input)
blacklist = getBlacklist(input)
chromLengths <- end(chromSizes)
names(chromLengths) <- paste0(seqnames(chromSizes))
blacklist <- split(blacklist, seqnames(blacklist))
# chromosomes to exclude; adjust according to your preferences
excludeChr <- c("chrM", "chrY", "chrX")
chromLengths <- chromLengths[names(chromLengths) %ni% excludeChr]
tileSize <- 500
# one row per tile: chromosome name plus a 1-based tile index within that chromosome
featureDF <- lapply(seq_along(chromLengths), function(x){
  DataFrame(seqnames = names(chromLengths)[x], idx = seq_len(trunc(chromLengths[x])/tileSize + 1))
}) %>% Reduce("rbind", .)
featureDF$start <- (featureDF$idx - 1) * tileSize
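To see what this reconstruction produces without loading an ArchR project, here is a minimal base-R sketch on made-up chromosome lengths. The `build_tiles` helper and the `toyLengths` values are hypothetical, it uses a plain data.frame rather than a DataFrame, and `floor(len / tileSize) + 1` is my assumption about the intended tile count per chromosome.

```r
# Hypothetical helper: cut each chromosome into fixed-width tiles.
build_tiles <- function(chromLengths, tileSize = 500) {
  tiles <- lapply(names(chromLengths), function(chr) {
    # Assumed tile count: all full tiles plus one trailing partial tile.
    nTiles <- floor(chromLengths[[chr]] / tileSize) + 1
    data.frame(seqnames = chr, idx = seq_len(nTiles), stringsAsFactors = FALSE)
  })
  tiles <- do.call(rbind, tiles)
  # 0-based start coordinate of each tile, as in the snippet above.
  tiles$start <- (tiles$idx - 1) * tileSize
  tiles
}

toyLengths <- c(chr1 = 1200, chr2 = 700)  # made-up lengths, not real chromosome sizes
toyTiles <- build_tiles(toyLengths)
print(toyTiles)
```

With these toy lengths, chr1 gets three tiles and chr2 gets two, each starting at a multiple of `tileSize`.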
Then you can filter for variable tiles by substituting featureDF for tile.mtx@elementMetadata:
var.tile.mtx <- tile.mtx[match(var.tiles, featureDF), match(archr.obj$cellNames, colnames(tile.mtx))]
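The subsetting step depends on `match()` locating each variable tile among the rows of the tile metadata. Here is a minimal sketch of that lookup on toy data, assuming (hypothetically) that the selected tiles come with `seqnames` and `start` columns. It builds one string key per tile and matches on the keys. All object names below are made up and the real `var.tiles` may be structured differently.

```r
# Toy stand-ins (hypothetical): tile metadata, a tile-by-cell matrix, selected tiles.
toyFeatures <- data.frame(
  seqnames = c("chr1", "chr1", "chr1", "chr2", "chr2"),
  start    = c(0, 500, 1000, 0, 500)
)
toyMat <- matrix(1:10, nrow = 5, dimnames = list(NULL, c("cellA", "cellB")))
toyVar <- data.frame(seqnames = c("chr2", "chr1"), start = c(500, 1000))

# One key per tile, e.g. "chr1:500"; format() avoids scientific notation
# for large coordinates such as 100000.
tile_key <- function(df) {
  paste(df$seqnames, format(df$start, scientific = FALSE, trim = TRUE), sep = ":")
}

rowIdx <- match(tile_key(toyVar), tile_key(toyFeatures))
stopifnot(!anyNA(rowIdx))  # every selected tile must exist in the metadata

# Rows follow the order of the selected tiles; columns follow the requested cell order.
toySub <- toyMat[rowIdx, c("cellB", "cellA"), drop = FALSE]
print(toySub)
```

Checking for `NA` in the row index before subsetting is worth keeping in real code, since an `NA` from `match()` would otherwise produce an error or `NA` rows, depending on the matrix class.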
This probably isn't the cleanest way to get around this issue but it worked for me!
Hi @jcurrie7, your code worked for me. Thanks for your help!
Hi @jcurrie7, were you able to get the downstream CellSpace command line to work? Mine is segfaulting.
This is my output:
CellSpace -output data/CellSpace_embedding-var_tiles -cpMat data/CellSpace_cell_by_tile-counts.mtx -peaks data/CellSpace_var_tiles.fa
CellSpace Arguments:
dim: 30
ngrams: 3
k: 8
sampleLen: 150
exmpPerPeak: 20
epoch: 50
margin: 0.05
bucket: 2000000
label: '__label__'
lr: 0.01
maxTrainTime: 8640000
negSearchLimit: 50
maxNegSamples: 10
p: 0.5
initRandSd: 0.001
batchSize: 5
saveIntermediates: 'final'
thread: 10.
Start to initialize starspace model.
Number of words (8-mers) in dictionary: 32896
Number of labels in dictionary: 77141
Reading 9964 peak sequences from 'data/CellSpace_var_tiles.fa'
Segmentation fault (core dumped)
> var.tile.mtx <- tile.mtx[match(var.tiles, tile.mtx@elementMetadata), match(archr.obj$cellNames, colnames(tile.mtx))]
Error in h(simpleError(msg, call)) :
  error in evaluating the argument 'i' in selecting a method for function '[': error in evaluating the argument 'table' in selecting a method for function 'match': no slot of name "elementMetadata" for this object of class "dgCMatrix"
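The error message reports that the object is a plain `dgCMatrix`, a class with no `elementMetadata` slot, so the `@elementMetadata` access fails before `match()` runs. A small sketch with a made-up matrix, using only the Matrix package, shows one way to check which slots an object has before reaching into it:

```r
library(Matrix)

# Made-up 3 x 2 sparse matrix; numeric values give a "dgCMatrix".
m <- sparseMatrix(i = c(1, 3), j = c(1, 2), x = c(1.5, 2), dims = c(3, 2))

print(class(m))      # the concrete S4 class of the object
print(slotNames(m))  # the slots that class actually defines

# TRUE only if the object carries the slot; guards against the error above.
hasMeta <- methods::.hasSlot(m, "elementMetadata")
print(hasMeta)
```

If the check comes back `FALSE` for your `tile.mtx`, the tile coordinates have to come from somewhere else, such as a separately reconstructed metadata table.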