smorabit / hdWGCNA

High dimensional weighted gene co-expression network analysis
https://smorabit.github.io/hdWGCNA/

MetacellsByGroups group.by argument #151

Closed behyag closed 8 months ago

behyag commented 8 months ago

Hi Sam, thanks a lot for this very useful package! A question about the group.by argument of MetacellsByGroups(): is it OK not to include the sample as a grouping factor? I have already run Harmony on my data and used reduction = "harmony" in MetacellsByGroups(). When I add the sample as a grouping factor, many of the groups fail the minimum-cell cutoff (min_cells) because they are too small. Since I already corrected for sample differences, and the metacells were constructed with KNN on the Harmony embedding, I wonder whether it makes sense to leave the sample ID out of group.by and use only other factors of interest?

Many thanks for your help and advice.

best, B

smorabit commented 8 months ago

Thanks for the question. I think it is best practice to include the sample ID in group.by for MetacellsByGroups when possible, but it is not strictly required. As you said, some groups can easily be skipped, especially for small cell populations. For example, in Figure 6 of the hdWGCNA paper we showed a consensus network analysis of microglia across three datasets. Microglia are usually not highly abundant in brain snRNA-seq datasets, so there we grouped them by sequencing batch and dataset of origin rather than by sample ID. Every dataset is different, so I am not sure exactly what to recommend, but ultimately providing the sample ID is not strictly required. Hope this helps!
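
One way to see which groupings would be skipped is to count cells per group before building metacells. The following is a minimal base-R sketch on toy metadata, not part of hdWGCNA. The column names (cell_type, Sample, condition), the cell counts, the helper count_groups, and the cutoff of 100 are all illustrative assumptions, and it assumes a group is kept when it has at least min_cells cells. With a real object, the per-cell metadata would come from seurat_obj@meta.data.

```r
# Toy per-cell metadata standing in for seurat_obj@meta.data.
# All names and counts here are hypothetical.
samples <- c("S1", "S2", "S3", "S4")
meta <- data.frame(
  cell_type = rep(c("EX", "MG"), times = c(600, 240)),
  Sample    = c(rep(samples, each = 150), rep(samples, each = 60)),
  stringsAsFactors = FALSE
)
meta$condition <- ifelse(meta$Sample %in% c("S1", "S2"), "control", "disease")

# Illustrative cutoff; check the min_cells value used in your own call.
min_cells <- 100

# Hypothetical helper: count cells in every combination of the grouping
# columns and flag the combinations that reach the cutoff.
count_groups <- function(meta, group_by, min_cells) {
  sizes <- as.data.frame(
    table(meta[, group_by, drop = FALSE]),
    responseName = "n_cells"
  )
  sizes <- sizes[sizes$n_cells > 0, ]  # drop combinations with no cells
  sizes$passes <- sizes$n_cells >= min_cells
  sizes
}

by_sample    <- count_groups(meta, c("cell_type", "Sample"), min_cells)
by_condition <- count_groups(meta, c("cell_type", "condition"), min_cells)

print(by_sample)     # the rare cell type falls below the cutoff in every sample
print(by_condition)  # pooling samples within a condition lifts it above the cutoff
```

In this toy data the rare population has 60 cells per sample but 120 per condition, so it only reaches the cutoff when sample ID is left out of the grouping.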

behyag commented 8 months ago

Hi Sam, thanks a lot for the response! Just to double check: I first ran Harmony with vars_use = "sample.ID" and then used reduction = "harmony" in MetacellsByGroups(), so that the metacells are built by KNN on the harmonized PCA. Might that help justify leaving sample.ID out of the group.by argument of MetacellsByGroups()?
At least that's how I interpret it! Also, I don't have different sequencing batches or datasets of origin. It's one dataset with samples multiplexed and sequenced together, so there is little technical variability to begin with. Thanks for your advice! Best, B

smorabit commented 8 months ago

> so the fact that first I ran harmony with vars_use = "sample.ID" and then used reduction=harmony in the function MetacellsByGroups() to ensure that metacells are made based on knn using harmonized PCA, may perhaps help justify not using sample.ID for the "group.by" argument in the MetacellsByGroups() ??

Running harmony is unrelated to the MetacellsByGroups group.by parameter.

> it's one dataset with samples multiplexed and sequenced together so there's little technical variability to begin with.

Do you have some other condition of interest like disease or genotype that you could group by?

behyag commented 8 months ago

Yes, in fact I'd rather group by condition x cell type. Thanks for the clarification regarding the MetacellsByGroups group.by parameter!

best, /B
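
For readers landing on this thread, grouping by condition and cell type might look roughly like the sketch below. It is a sketch under assumptions, not the maintainer's recommended settings: the metadata column names cell_type and condition and the wrapper build_metacells are hypothetical, the values for k, max_shared, and min_cells are placeholders to tune per dataset, and the Seurat object is assumed to have already been prepared for hdWGCNA (for example with SetupForWGCNA) and to contain a reduction named "harmony".

```r
# Hypothetical metadata columns to group by: cell type and condition,
# with sample ID deliberately left out.
group_cols <- c("cell_type", "condition")

# Hypothetical wrapper around hdWGCNA::MetacellsByGroups. Defining it does
# not require hdWGCNA; calling it does.
build_metacells <- function(seurat_obj) {
  hdWGCNA::MetacellsByGroups(
    seurat_obj  = seurat_obj,
    group.by    = group_cols,    # metacells are built within each combination
    ident.group = "cell_type",   # identity assigned to the resulting metacells
    reduction   = "harmony",     # KNN is computed on this embedding
    k           = 25,            # placeholder neighborhood size
    max_shared  = 10,            # placeholder cap on cells shared between metacells
    min_cells   = 100            # placeholder cutoff for skipping small groups
  )
}

# Usage, assuming `seurat_obj` exists in your session:
# seurat_obj <- build_metacells(seurat_obj)
```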