scDRS CLI scdrs munge-gs

martinjzhang / scDRS

Single-cell disease relevance score (scDRS)

MIT License

Hi,

I have only one set of genes for one trait. When I create a .gs file from a pval_file/zscore_file using the scDRS CLI command scdrs munge-gs, I found that, for the same group of genes with both p-values and z-scores, the results from the z-score file (format 1) and the p-value file (format 2) are quite different. With format 1, only genes with a positive z-score are returned (weight = z-score), while with format 2, the weights are recalculated and the genes with the smallest p-values are returned. Which one should I choose?

format1:
```
scdrs munge-gs \
  --out-file gene1.gs \
  --zscore-file muco_z.tsv \
  --weight zscore \
  --n-max 1000
```
format2:
```
scdrs munge-gs \
  --out-file gene2.gs \
  --pval-file muco_p.tsv \
  --weight zscore \
  --n-max 1000
```
If I obtain a set of genes by taking the intersection of the results from multiple methods, can I simply give every gene a weight of 1, or leave the weights out?

Hi,

Thank you for the question. scdrs munge-gs treats z-scores as one-sided because that is also what MAGMA does. As a result, genes with large negative z-scores are considered highly non-significant (p-values close to 1). I suggest using the p-value file as input. Alternatively, if you create a new z-score file containing the absolute values of the original z-scores, you should get a .gs file similar to the one produced from the p-value file.
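To make the one-sided behaviour concrete, here is a minimal Python sketch. It assumes the one-sided p-value is the upper tail of a standard normal, P(Z >= z); whether munge-gs computes it in exactly this way internally is an assumption. The gene names and z-scores are made up, and `one_sided_p` and `abs_zscore_rows` are hypothetical helpers, not part of scDRS.

```python
import math


def one_sided_p(z: float) -> float:
    """Upper-tail p-value P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def abs_zscore_rows(rows):
    """Return copies of (gene, z) rows with z replaced by |z|."""
    return [(gene, abs(z)) for gene, z in rows]


# Made-up example values, for illustration only.
rows = [("GENE_A", 3.0), ("GENE_B", -3.0), ("GENE_C", 0.5)]

for gene, z in rows:
    # A large negative z gives a one-sided p-value close to 1.
    print(gene, z, one_sided_p(z))

for gene, z in abs_zscore_rows(rows):
    # After taking |z|, GENE_A and GENE_B get the same p-value.
    print(gene, z, one_sided_p(z))
```

Under this assumption, a gene with z = -3 is ranked as non-significant, while after the absolute-value transform it ranks the same as a gene with z = 3.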

For gene sets built from multiple methods, scDRS supports both formats (all weights equal to 1, or no weights; see file formats). In this case, it is probably easier to skip munge-gs and create the .gs file from scratch: the first row should be TRAIT\tGENESET and the second row should be <trait_name>\t<gene1>,<gene2>,<gene3>, where <xx> denotes a variable.
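As a rough sketch of writing such a file by hand, the Python below builds the two tab-separated rows described above for an unweighted gene set. The helper names `format_gs` and `write_gs`, the trait name, and the gene names are all hypothetical; only the row layout comes from the description above.

```python
import os
import tempfile


def format_gs(trait, genes):
    """Build the text of an unweighted .gs file: a header row, then one trait row."""
    unique_genes = list(dict.fromkeys(genes))  # drop duplicates, keep order
    return "TRAIT\tGENESET\n" + f"{trait}\t{','.join(unique_genes)}\n"


def write_gs(path, trait, genes):
    """Write the unweighted gene set for one trait to `path`."""
    with open(path, "w") as fh:
        fh.write(format_gs(trait, genes))


# Made-up trait and gene names, for illustration only.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_B"]

with tempfile.TemporaryDirectory() as tmpdir:
    out_path = os.path.join(tmpdir, "my_trait.gs")
    write_gs(out_path, "my_trait", genes)
    with open(out_path) as fh:
        print(fh.read(), end="")
```

Writing to a temporary directory keeps the example free of side effects; in practice you would pass the path of the .gs file you want to keep.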

Please let us know if you have further questions.

scDRS CLI scdrs munge-gs #31