Closed Dazcam closed 3 years ago
Thanks for making us aware of the issue, investigating...
Hi @Dazcam,
Looking at the changes in Seurat, the two things I can think of are: 1) we find clustifyr performs worse with sctransform than with other normalization methods; 2) by default Seurat 4 might be returning 3000 variable genes instead of the previous 2000, and 2000 was probably already too large a number (setting it to NULL would use all genes, which would produce even worse results). Can you try setting query_genes to something like query_genes = VariableFeatures(seurat.batch1)[1:1000]?
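A minimal sketch of that suggestion, under a few assumptions: seurat.batch1 is an already normalised Seurat object with variable features computed, ref_cortex_dev has been loaded as a reference matrix, and "seurat_clusters" stands in for whichever metadata column holds your cluster IDs.

```r
library(Seurat)
library(clustifyr)

# Keep only the top 1000 variable genes; head() also copes with shorter lists.
top_genes <- head(VariableFeatures(seurat.batch1), 1000)

# Correlate query clusters against the reference using the restricted gene set.
# "seurat_clusters" is an assumed column name, adjust it to your object.
res <- clustify(
  input = seurat.batch1,
  ref_mat = ref_cortex_dev,
  cluster_col = "seurat_clusters",
  query_genes = top_genes,
  obj_out = FALSE  # return the correlation matrix rather than the object
)

# Convert the correlation matrix into one best-matching cell type per cluster.
calls <- cor_to_call(res)
```

Comparing calls$type across a few gene-list sizes (for example 500, 1000 and 2000) should show whether the number of genes is what drives the odd assignments.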
If none of the above explains the errors, any chance you can share the 2 different versions of objects with us?
Many thanks, Rui
Hi Rui,
Many thanks for looking into this. Off the top of my head I have a feeling there were 3000 genes when running with Seurat 4 but I will need to check this. It’s interesting that including more genes hampers clustifyr’s performance.
Unfortunately I’m off on holiday until Jan 4th so won’t be able to test your suggestions (or send test data) until then.
I will look into this as soon as I return and report back.
Thanks again,
Darren
Hello again Rui,
I have managed to have a look at this again and the issue was caused by the number of variable genes fed into clustifyr. When I lowered the number of variable genes to 1000, the cluster assignments made much more sense.
I see you have updated the README and mention that scTransform may not be ideal with clustifyr; I will keep this in mind. We haven't settled on the final normalisation method that we're going to use for our analysis yet. I ran this most recent analysis using the Seurat default normalisation method.
Thanks again for your help with this.
Best,
Darren
Hi there,
I would like to ask if clustifyr is compatible with data generated using Seurat 4?
The reason I ask is that I have run clustifyr with one of the suggested reference datasets that you provide (ref_cortex_dev), and when I try to visualise my Seurat 4 generated clusters by grouping on res$type, only 5 or 7 of the 47 cell types identified in the reference dataset are mapped onto my cells (depending on whether I set query_genes to NULL or VariableFeatures(seurat.batch1)), and they don't really make any sense.
Whilst I understand that the degree to which cell ID correlates between datasets depends on the similarity of their cell types, tissue of origin, data quality etc., I think it's unlikely that only 5-7 of the 47 cell types would map over.
There have been some minor, but perhaps important, changes made to the underlying code of some of the key Seurat 4 functions. In particular, the FindMarkers function now reports log fold differential expression values to base 2 instead of to the natural log. So it would be useful to clarify whether the changes made to Seurat directly impact the clustifyr output.
Any advice you could offer on this matter would be greatly appreciated.
Many Thanks,
Darren
UPDATE: When I ran the same analysis as above using Seurat 3 parameters, 14 cell types mapped over in a manner that makes more sense biologically, suggesting that clustifyr is not compatible with Seurat 4 output.