zakieh-tayyebi / CellSpace

Scalable sequence-informed embedding of single-cell ATAC-seq data with CellSpace
MIT License

Error found when preparing "var.tile.mtx" in the tutorial #6

Open kerenzhou062 opened 3 months ago

kerenzhou062 commented 3 months ago

    > var.tile.mtx <- tile.mtx[match(var.tiles, tile.mtx@elementMetadata), match(archr.obj$cellNames, colnames(tile.mtx))]
    Error in h(simpleError(msg, call)) :
      error in evaluating the argument 'i' in selecting a method for function '[': error in evaluating the argument 'table' in selecting a method for function 'match': no slot of name "elementMetadata" for this object of class "dgCMatrix"

wgao688 commented 2 months ago

I am also having this same issue. It appears to be an ArchR issue -- for some reason the tile.mtx@elementMetadata is NULL when it should return the tile coordinates.
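The error text itself suggests a quick check: it reports the class of `tile.mtx` as `dgCMatrix`, a plain sparse matrix from the Matrix package, and that class has no `elementMetadata` slot. A minimal sketch of the check on a toy matrix follows; `m` is a hypothetical stand-in, not the tutorial's object.

```r
library(Matrix)

# Hypothetical stand-in for tile.mtx: a bare 2 x 2 sparse matrix.
m <- sparseMatrix(i = c(1, 2), j = c(1, 2), x = 1, dims = c(2, 2))

# Inspect the class and list the slots the object really has.
obj_class <- class(m)
print(obj_class)
print(slotNames(m))

# TRUE only if the object carries an elementMetadata slot.
has_slot <- "elementMetadata" %in% slotNames(m)
print(has_slot)
```

Running the same two calls (`class()` and `slotNames()`) on the real `tile.mtx` shows whether it is a bare matrix or a container that carries row metadata.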

jcurrie7 commented 2 months ago

Using the code in addTileMatrix() and .addTileMat() from MatrixTiles.R in the ArchR source code, you can reconstruct the dataframe storing metadata for all genomic tiles as follows:

    input <- archr.obj
    ArrowFiles <- getArrowFiles(input)
    chromSizes = getChromSizes(input)
    blacklist = getBlacklist(input)
    chromLengths <- end(chromSizes)
    names(chromLengths) <- paste0(seqnames(chromSizes))
    blacklist <- split(blacklist, seqnames(blacklist))
    excludeChr = c("chrM", "chrY", "chrX") # adjust according to your preferences
    chromLengths <- chromLengths[names(chromLengths) %ni% excludeChr]
    tileSize <- 500
    featureDF <- lapply(seq_along(chromLengths), function(x){
      DataFrame(seqnames = names(chromLengths)[x], idx = seq_len(trunc(chromLengths[x])/tileSize + 1))
    }) %>% Reduce("rbind", .)
    featureDF$start <- (featureDF$idx - 1) * tileSize

Then you can filter for variable tiles by substituting featureDF for tile.mtx@elementMetadata:

    var.tile.mtx <- tile.mtx[match(var.tiles, featureDF), match(archr.obj$cellNames, colnames(tile.mtx))]

This probably isn't the cleanest way to get around this issue but it worked for me!
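The row-matching step can be sketched on toy data to make its assumptions explicit. This is only an illustration, not the tutorial's code: it uses a base `data.frame` instead of a `DataFrame`, assumes the rows of the tile matrix are in the same order as the rows of the reconstructed feature table, and assumes each variable tile is identified by `seqnames` and `start`. All object names and values below are hypothetical.

```r
library(Matrix)

# Toy genome (hypothetical): two chromosomes cut into 500-bp tiles.
tile_size <- 500
chrom_lengths <- c(chr1 = 1800, chr2 = 900)

# One row per tile, in chromosome order, as in the reconstruction above.
feature_df <- do.call(rbind, lapply(names(chrom_lengths), function(chr) {
  n_tiles <- floor(chrom_lengths[[chr]] / tile_size) + 1
  data.frame(seqnames = chr, idx = seq_len(n_tiles), stringsAsFactors = FALSE)
}))
feature_df$start <- (feature_df$idx - 1) * tile_size

# Toy tile-by-cell matrix; rows are assumed to follow feature_df order.
tile_mtx <- sparseMatrix(
  i = c(1, 3, 6, 6), j = c(1, 2, 2, 3), x = 1,
  dims = c(nrow(feature_df), 3),
  dimnames = list(NULL, c("cellA", "cellB", "cellC"))
)

# Build an explicit "chr:start" key; %d avoids "1e+05"-style formatting.
tile_key <- function(df) sprintf("%s:%d", df$seqnames, as.integer(df$start))

var_tiles <- data.frame(seqnames = c("chr2", "chr1"), start = c(500, 1000))
cell_names <- c("cellC", "cellA")

row_idx <- match(tile_key(var_tiles), tile_key(feature_df))
col_idx <- match(cell_names, colnames(tile_mtx))

# Fail early instead of silently indexing with NA.
stopifnot(!anyNA(row_idx), !anyNA(col_idx))

var_tile_mtx <- tile_mtx[row_idx, col_idx, drop = FALSE]
print(dim(var_tile_mtx))
```

Checking for `NA` before subsetting is the main point: `match()` returns `NA` for any tile or cell it cannot find, and indexing with those values produces an error or a misleading result further downstream.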

wgao688 commented 2 months ago

Hi @jcurrie7, your code worked for me. Thanks for your help!

wgao688 commented 2 months ago

Hi @jcurrie7, were you able to get the downstream CellSpace command line to work? Mine is segfaulting.

This is my output:

    CellSpace -output data/CellSpace_embedding-var_tiles -cpMat data/CellSpace_cell_by_tile-counts.mtx -peaks data/CellSpace_var_tiles.fa
    CellSpace Arguments:
    dim: 30
    ngrams: 3
    k: 8
    sampleLen: 150
    exmpPerPeak: 20
    epoch: 50
    margin: 0.05
    bucket: 2000000
    label: '__label__'
    lr: 0.01
    maxTrainTime: 8640000
    negSearchLimit: 50
    maxNegSamples: 10
    p: 0.5
    initRandSd: 0.001
    batchSize: 5
    saveIntermediates: 'final'
    thread: 10.
    Start to initialize starspace model.
    Number of words (8-mers) in dictionary: 32896
    Number of labels in dictionary: 77141
    Reading 9964 peak sequences from 'data/CellSpace_var_tiles.fa'
    Segmentation fault (core dumped)
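The cause of the crash is not established in this thread. One thing that may be worth ruling out is a mismatch between the two input files, for example whether the number of sequences in the FASTA file equals one of the dimensions of the MatrixMarket file. The sketch below is a generic consistency check written for this purpose; the helper names are hypothetical and the files are small temporary examples, not the poster's data.

```r
# Read the "rows cols nnz" size line of a MatrixMarket coordinate file
# and return c(rows, cols). Comment lines start with "%".
read_mtx_dims <- function(path) {
  lines <- readLines(path, n = 100)
  size_line <- lines[!startsWith(lines, "%")][1]
  as.numeric(strsplit(trimws(size_line), "\\s+")[[1]])[1:2]
}

# Count FASTA records by counting header lines.
count_fasta_records <- function(path) {
  sum(startsWith(readLines(path), ">"))
}

# Toy inputs (hypothetical): a 3 x 2 matrix and a FASTA with 2 records.
mtx_path <- tempfile(fileext = ".mtx")
writeLines(c("%%MatrixMarket matrix coordinate integer general",
             "3 2 2", "1 1 1", "3 2 1"), mtx_path)
fa_path <- tempfile(fileext = ".fa")
writeLines(c(">tile1", "ACGT", ">tile2", "GGCC"), fa_path)

mtx_dims <- read_mtx_dims(mtx_path)
n_seq <- count_fasta_records(fa_path)

# Report whether the sequence count matches either matrix dimension.
cat("matrix dims:", mtx_dims, "| FASTA records:", n_seq, "\n")
cat("matches rows:", n_seq == mtx_dims[1],
    "| matches cols:", n_seq == mtx_dims[2], "\n")
```

If the sequence count matches neither dimension, the matrix and the FASTA file were probably built from different tile sets, which would be a reasonable thing to fix before digging further into the segmentation fault.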