satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

Error when making seurat object: No cell names (colnames) names present in the input matrix #6568

Closed aclynch160 closed 2 years ago

aclynch160 commented 2 years ago

Hello!

I am new to Seurat and am running into some problems while working on the GSE109125 dataset from GEO. Unlike most datasets, it does not contain separate matrix, barcode, and features files; instead, everything is incorporated into one CSV.

When attempting to create a Seurat object, I was using the following code:

Gcount <- CreateSeuratObject(counts = "gcount.rna", assay = "RNA", colnames = "CNs", names.delim = ",", meta.data = NULL)
Gcount

I received the following error:

Error in CreateAssayObject(counts = counts, min.cells = min.cells, min.features = min.features, : No cell names (colnames) names present in the input matrix

However, I can take a look at the matrix, and it seems to already have these column names. When I run the following code:

gcount.rna <- read_csv(file = "G_count.csv", col_names = TRUE )
CNs <- colnames(gcount.rna)
view(CNs)

I can see a clear output, with the correct column labels:

2 B.Fem.Sp#1
3 B.Fo.Sp#1
4 B.Fo.Sp#2
5 B.Fo.Sp#3

and so on.

Has anyone dealt with this before? I noted other similar issues in past submissions, but nothing completely identical.

Trying to force the listed CNs using the method in my initial code does not seem to work, even though they are recognized as column names; neither does using Header=true or names.field=1. I'm an absolute beginner, though; if anyone would share their wisdom, I'd greatly appreciate it!

samuel-marsh commented 2 years ago

Hi,

First, as an FYI, this doesn't appear to be a single-cell dataset; it appears to be ULI bulk RNA-seq. That said, I don't get the same error. I will add that creating a Seurat object from this data does require moving the gene symbols out of the first column and into the row names.

What version of Seurat are you using?

# both of these work for me
library(Seurat)
library(tidyverse)  # provides %>%, read_csv(), and column_to_rownames()

test <- data.table::fread("~/Downloads/GSE109125_Gene_count_table.csv.gz", data.table = FALSE)

test <- test %>%
     column_to_rownames("Gene_Symbol")

test2 <- read_csv("~/Downloads/GSE109125_Gene_count_table.csv.gz", col_names = TRUE)

test2 <- test2 %>%
     column_to_rownames("Gene_Symbol")

test_seu <- CreateSeuratObject(counts = test)
test_seu2 <- CreateSeuratObject(counts = test2)

Though again, be aware that this is not single-cell data, so I don't think Seurat is really the pipeline you want to run for this analysis.

Best, Sam
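
A possible contributing factor, offered as a guess rather than a confirmed diagnosis: the original call passes `counts = "gcount.rna"`, a quoted string, rather than the object `gcount.rna` itself. A character string has no column names, which would be consistent with the error message. The base R sketch below uses a made-up toy table (`toy_counts` and its sample and gene labels are hypothetical) to show the difference.

```r
# Toy table standing in for the real data (hypothetical values and labels).
toy_counts <- data.frame(
  SampleA = c(5, 0, 2),
  SampleB = c(1, 3, 0),
  row.names = c("GeneX", "GeneY", "GeneZ")
)

# A quoted name is just a length-one character vector: it has no dimnames,
# so asking for its column names returns NULL.
quoted <- "toy_counts"
print(is.null(colnames(quoted)))

# The unquoted object carries its column names.
print(colnames(toy_counts))
```

If this is what happened, passing the object without quotes (after moving the gene symbols to the row names) should be enough; no separate column-name argument is needed.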

aclynch160 commented 2 years ago

Sam,

Thanks for the quick reply! Your fix worked for me. I can't intuitively understand why the column names need to be swapped, but I'll remember it for the future, once I properly come back to Seurat with actual single-cell data. That was a big oversight on my part; thanks for pointing it out! I'll look into other packages to use.

I am currently using Seurat 4.20!

samuel-marsh commented 2 years ago

Hi,

So it's not swapping the column names but moving one of the columns to the row names. Seurat expects the input data to have cells as column names and features as row names. In this dataset, the features aren't in the row names but in the first column of the matrix. So for Seurat to create the object properly, we need to remove that column and make its values the row names.

Best, Sam
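
The layout described in this reply can be illustrated with a toy table. This is a minimal sketch in base R, assuming a data frame whose first column holds gene symbols; the object names `raw` and `counts`, the values, and the sample and gene labels are all made up for illustration, and the commented `CreateSeuratObject()` call at the end mirrors the one used earlier in this thread.

```r
# Hypothetical table in the same shape as the CSV: features in the first column.
raw <- data.frame(
  Gene_Symbol = c("GeneX", "GeneY", "GeneZ"),
  SampleA = c(5, 0, 2),
  SampleB = c(1, 3, 0)
)

# Move the gene symbols out of the first column and into the row names
# (a base R equivalent of tibble::column_to_rownames("Gene_Symbol")).
counts <- raw[, setdiff(names(raw), "Gene_Symbol"), drop = FALSE]
rownames(counts) <- raw$Gene_Symbol

print(rownames(counts))  # features (genes)
print(colnames(counts))  # cells / samples

# With Seurat installed, the object could then be built as earlier in the thread:
# seu <- Seurat::CreateSeuratObject(counts = counts)
```

After this step the table holds only numeric counts, with genes as row names and samples as column names.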

aclynch160 commented 2 years ago

Got it!

That makes much more sense.

Thank you for the help!


aclynch160 commented 2 years ago

Hi @samuel-marsh !

I'm running into a similar problem with another dataset. I'm adding to this thread because the problems are so similar, though the solutions will be different.

This time, I am using the dataset GSE176293, which is actually the correct kind of data. Here the table has gene IDs as the column names and cell barcodes as the row names, i.e.:

          GeneA GeneB GeneC GeneD GeneE
Barcode A     0     1     2     3     4
Barcode B     4     3     2     1     0
Barcode C     1     2     3     4     0

Unfortunately, the file runs into the same problem as before:

Error in CreateAssayObject(counts = counts, min.cells = min.cells, min.features = min.features, : No cell names (colnames) names present in the input matrix

When I attempt to use the fix you suggested before:

test <- data.table::fread("~/Downloads/GSE109125_Gene_count_table.csv.gz", data.table = F)

test <- test %>%
     column_to_rownames("Gene_Symbol")

I am able to make a Seurat object. Unfortunately, the table is not oriented as before, so everything in downstream analysis is labelled by barcode rather than by gene name. I attempted to use the basic transpose function,

t(data)

to reformat things, but it appears to also transpose the row IDs in a way that makes the file unusable as a Seurat object. Manually transposing in Excel doesn't work either, as the file is too large.

Do you have any suggestions as to a fix?

samuel-marsh commented 2 years ago

What you need to do for that new sample is:

# load the same way
test <- data.table::fread("...", data.table = FALSE) %>%
     column_to_rownames("V1")

# transpose and convert to a dgCMatrix
mat_trans <- as.sparse(t(test))

# make the Seurat object:
test_seu <- CreateSeuratObject(mat_trans)

Best, Sam
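
To see what the transpose step does, here is a small base R sketch on a made-up barcodes-by-genes table (`bc_by_gene`, `gene_by_bc`, and all labels are hypothetical). `t()` applied to a data frame returns a plain matrix with the row and column names swapped, which gives the genes-as-rows, cells-as-columns layout Seurat expects. The sparse conversion is left as a comment because it needs Seurat installed.

```r
# Hypothetical table with barcodes as row names and genes as columns.
bc_by_gene <- data.frame(
  GeneA = c(0, 4, 1),
  GeneB = c(1, 3, 2),
  GeneC = c(2, 2, 3),
  row.names = c("BarcodeA", "BarcodeB", "BarcodeC")
)

# Transpose: the result is a numeric matrix, genes x barcodes.
gene_by_bc <- t(bc_by_gene)

print(dim(gene_by_bc))
print(rownames(gene_by_bc))  # genes
print(colnames(gene_by_bc))  # barcodes

# Each value keeps its gene/barcode pairing, only the axes are swapped.
print(gene_by_bc["GeneB", "BarcodeA"] == bc_by_gene["BarcodeA", "GeneB"])

# With Seurat installed, the remaining steps from the reply would be:
# mat_trans <- Seurat::as.sparse(gene_by_bc)
# seu <- Seurat::CreateSeuratObject(mat_trans)
```

Note that the barcode column has to be moved into the row names before transposing; otherwise `t()` would produce a character matrix that includes the barcodes as data.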

aclynch160 commented 2 years ago

That works perfectly, thanks again for all the help!