quadbio / VoxHunt

VoxHunt: Resolving human brain organoid heterogeneity through single-cell genomic comparison to spatial brain maps
https://quadbio.github.io/VoxHunt/
MIT License
41 stars 5 forks source link

Error using voxel_map function #7

Closed finjen closed 3 years ago

finjen commented 3 years ago

Hi, I am trying your code to run voxel_map as such:

vox_map <- voxel_map( mydata, stage='E13', group_name = 'leiden_cluster', genes_use = regional_markers )

mydata is my Seurat object, in which I added the leiden clustering information into the metadata.

However, I am receiving the following error: Error: Must subset rows with a valid subscript vector. ℹ Logical subscripts must match the size of the indexed input. x Input has size 41650 but subscript !duplicated(x, fromLast = fromLast, ...) has size 0.

To be frank, I do not see where this error is coming from. Would it be possible to receive some input on this?

Thanks.

joschif commented 3 years ago

Hi @finjen,

I'm also not entirely sure what could be the problem here. Could you provide with the cluster cell counts counts (table(mydata$leiden_cluster), and the cell names(head(colnames(mydata)))? Also, you can try do get the expression matrix from the seurat object with GetAssayData(), transpose it (t()) and then run voxel_map() on that, providing the vector of cluster assignments in the groups argument.

Cheers, Jonas

finjen commented 3 years ago

Hi Jonas, thank you for the quick reply. I tried running it with the transposed matrix using the vector of cluster assignments in the groups, but the output is the very same error. These are the outputs for leidencluster and the colnames: table(mydata$leiden_cluster): 0 1 2 3 4 5 6 7 8 9 10 11 12 13 153 153 148 116 82 67 57 45 43 41 37 32 22 9

head(colnames(mydata)): [1] "AAACGCTTCGGACTGC-1" "AAAGGATTCGGATACT-1" "AACGAAATCATAAGGA-1" "AACTTCTAGATCGCCC-1" "AAGCATCGTAGCCCTG-1" [6] "AAGCATCTCTACACTT-1"

Thank you.

joschif commented 3 years ago

Ok, and just to make sure, are you using the most recent version of VoxHunt? You can find out by calling sessionInfo(). Also, maybe provide the gene names as well (head(rownames(mydata)).

finjen commented 3 years ago

The version I am using is voxhunt_0.9.2 (I think that is the newest?) head(rownames(mydata)) gives me the gene names: [1] "Xkr4" "Gm1992" "Gm37381" "Rp1" "Mrpl15" "Lypla1"

finjen commented 3 years ago

I just ran the code again, without change, and receive now the following error:

Error in voxel_map(mydata = Matrix::Matrix(expr_mat, sparse = T), stage = stage, : argument "object" is missing, with no default

The code is still: vox_map <- voxel_map( mydata, stage="E13", group_name= 'leiden_cluster', genes_use = regional_markers )

object = mydata results in "unused argument (object = mydata) error.

joschif commented 3 years ago

Ah ok, so VoxHunt expects human gene symbols (all caps, like XKR4), because I wrote it for human organoids. You can find an ortholog mapping here.

finjen commented 3 years ago

Ah, of course.. I changed the gene names now to human gene symbols. Thank you. However, the error now is still:

Error in voxel_map(mydata = Matrix::Matrix(expr_mat, sparse = T), stage = stage, : argument "object" is missing, with no default

joschif commented 3 years ago

This is because the first argument of voxel_map() is object, so I belive the code voxel_map(object = Matrix::Matrix(expr_mat, sparse = T)) shoudl work.

finjen commented 3 years ago

I tried that, also as a second option with the transposed matrix as such: vox_map <- voxel_map(Matrix::Matrix(mydata_t, sparse=T), stage="E13", groups=new_leiden, genes_use = regional_markers)

But I still keep receiving this error: Error: Must subset rows with a valid subscript vector. ℹ Logical subscripts must match the size of the indexed input. x Input has size 49073 but subscript !duplicated(x, fromLast = fromLast, ...) has size 0.

When I run rlang::last_error() the output is this:

<error/vctrs_error_subscript_size> Must subset rows with a valid subscript vector. ℹ Logical subscripts must match the size of the indexed input. x Input has size 49073 but subscript !duplicated(x, fromLast = fromLast, ...) has size 0. Backtrace:

  1. voxhunt::voxel_map(...)
  2. voxhunt:::voxel_map.default(...)
  3. generics:::intersect.default(inter_genes, genes_use)
  4. base::intersect(x, y, ...)
  5. base::unique.data.frame(y[match(as.vector(x), y, 0L)])
  6. tibble:::[.tbl_df(...)
    1. tibble:::tbl_subset_row(xo, i = i, i_arg)
    2. tibble:::vectbl_as_row_index(i, x, i_arg)
    3. tibble:::vectbl_as_row_location(i, nr, i_arg, assign)
    4. vctrs::vec_as_location(i, n)
    5. vctrs:::stop_indicator_size(...)
joschif commented 3 years ago

It looks like some of the row- or colnames of the input matrix don't match the expected input. Can you maybe post the row and colnames of mydata_t? Also you can have a look at the ABA data that is loaded by VoxHunt to check if the gene names match the ones expected by VoxHunt: DATA_LIST[['E13']]$matrix

joschif commented 3 years ago

Since the Allen brain atlas only provides the expression measurements for ~2000 genes, it's also possible that the gene set you selected is not measured in the ABA at all. I would therefore recommend to do feature selection through VoxHunt (structure_markers()) or intersect your feature set with the genes measures in the timepoint you want to look at (colnames(DATA_LIST[['E13']]$matrix))

finjen commented 3 years ago

head(colnames(mydata_t), 20) [1] "XKR4" "GM1992" "GM37381" "RP1" "MRPL15" "LYPLA1" "TCEA1"
[8] "RGS20" "GM16041" "ATP6V1H" "OPRK1" "RB1CC1" "4732440D04RIK" "ST18"
[15] "PCMTD1" "GM26901" "SNTG1" "RRS1" "AC009879.3" "2610203C22RIK"

head(rownames(mydata_t)) [1] "AAACGCTTCGGACTGC-1" "AAAGGATTCGGATACT-1" "AACGAAATCATAAGGA-1" "AACTTCTAGATCGCCC-1" "AAGCATCGTAGCCCTG-1" [6] "AAGCATCTCTACACTT-1"

Also, among my total 20615 genes almost 1400 are also found in the ABA gene list (88%).

joschif commented 3 years ago

And the regional_markers you are using, are they also among the genes detected in the ABA?

finjen commented 3 years ago

My understanding was that structure_markers() selects markers from the ABA dataset. I wouldn't know right now how to apply it to my dataset based on the arguments it takes.. Or am I misunderstanding your comment?

joschif commented 3 years ago

Yes, your understanding is right. I was talking about the genes passed to the voxel_map() function with the genes_use argument. Those are the genes that are used for correlation and selecting structure-specific genes here results in better contrast in the spatial maps. Also, the genes used for correlation need to be measured in the ABA so if you select correlation features differently, you need to make sure that that is the case. I was just asking because this would be a possible reason for the error.

joschif commented 3 years ago

How does regional_markers look like?

finjen commented 3 years ago

regional_markers looks like this:

regional_markers

A tibble: 51,000 x 9

gene group avg_exp fc auc pval padj prcex_self prcex_other

1 SLC6A11 caudal hypothalamus 2.86 1.57 0.835 1.14e-254 1.39e-252 98.9 60.6 2 GPX3 caudal hypothalamus 1.59 1.45 0.824 0. 0. 70.3 8.65 3 BAIAP3 caudal hypothalamus 2.17 1.75 0.813 0. 0. 76.0 19.3 4 DLK1 caudal hypothalamus 2.40 1.78 0.796 2.58e-292 3.98e-290 76.2 28.0 5 SPARC caudal hypothalamus 2.92 0.770 0.783 4.14e-171 1.72e-169 100 95.1 6 ACHE caudal hypothalamus 3.43 1.23 0.782 1.65e-170 6.66e-169 98.7 84.9 7 ECE2 caudal hypothalamus 1.68 1.12 0.781 3.10e-234 3.10e-232 78.8 33.8 8 VAT1L caudal hypothalamus 3.11 1.27 0.778 1.67e-166 6.62e-165 96.8 79.2 9 ECEL1 caudal hypothalamus 2.04 1.36 0.773 4.17e-209 3.38e-207 76.4 37.5 10 ARL10 caudal hypothalamus 1.89 1.22 0.765 4.77e-192 2.79e-190 74.9 39.0 # … with 50,990 more rows
joschif commented 3 years ago

Ah ok, so it should be a vector with genes to use for correlation, not a data frame.

joschif commented 3 years ago

Did this solve your issue?

finjen commented 3 years ago

So, I reran this code: regional_markers <- structure_markers('E18') %>% group_by(group) %>% top_n(10, auc) %>% {unique(.$gene)} head(regional_markers)

And sometimes, I get the dataframe as an output, sometimes a list of genes. I am not sure why there is this variability, but as I can rerun until I get the vector and then can continue, it is fine for now. Thank you for your help with this.

joschif commented 3 years ago

As far as I know, there should not be any variability here, if you run the full block it should output a vector with genes. If you happen to get a dataframe, it for some reason probably did not run the last line. In that case you can obtain the vector of genes by getting the unique entries of the gene column (which is what the last line does).