martinjzhang / scDRS

Single-cell disease relevance score (scDRS)
https://martinjzhang.github.io/scDRS/
MIT License

scDRS CLI scdrs munge-gs #31

Closed: Alina-Song closed this issue 1 year ago.

Alina-Song commented 1 year ago

Hi,

  1. I have only one gene set for one trait. When I create a .gs file from a p-value file or a z-score file with the scDRS CLI (scdrs munge-gs), the results for the same genes (each of which has both a p-value and a z-score) differ substantially depending on whether I pass the z-score file (format 1) or the p-value file (format 2). With format 1, only genes with positive z-scores are returned (weight = z-score), whereas with format 2 the weights are recomputed and the genes with the smallest p-values are returned. Which one should I use?

    format1:

    scdrs munge-gs \
      --out-file gene1.gs \
      --zscore-file muco_z.tsv \
      --weight zscore \
      --n-max 1000

    format2:

    scdrs munge-gs \
      --out-file gene2.gs \
      --pval-file muco_p.tsv \
      --weight zscore \
      --n-max 1000
  2. If I obtain a gene set by intersecting the results of multiple methods, can I simply assign every gene a weight of 1, or leave the weights out entirely?

martinjzhang commented 1 year ago

Hi,

Thank you for the question. The scDRS munge-gs function uses one-sided z-scores, because that is also what MAGMA does. As a result, genes with large negative z-scores are treated as highly non-significant (p-values close to 1). I suggest using the p-value file as input. Alternatively, if you create a new z-score file containing the absolute values of the original z-scores, you should get a .gs file similar to the one produced from the p-value file.
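A minimal shell sketch of the absolute-value workaround follows. Under a one-sided convention p = 1 − Φ(z), so a gene with z = −3 gets p ≈ 0.9987 and is effectively dropped; replacing z with |z| lets strongly negative genes rank as significant again. The sketch assumes the z-score file is tab-separated with a header row, gene names in the first column and one z-score column per trait (our reading of the layout of muco_z.tsv; please check the scDRS file-format docs). The helper name abs_zscore_file and the toy file contents are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: copy a z-score table, replacing every z-score with |z|.
# Assumes: tab-separated, header in row 1, gene names in column 1,
# one z-score column per trait in columns 2..NF.
abs_zscore_file() {
  # $1 = input z-score file, $2 = output file
  awk 'BEGIN { FS = OFS = "\t" }
       NR == 1 { print; next }                          # header row: copy unchanged
       { for (i = 2; i <= NF; i++) sub(/^-/, "", $i)    # strip a leading minus sign
         print }' "$1" > "$2"
}

# Hypothetical toy input, for illustration only
printf 'GENE\tmuco\nGENE_A\t3.1\nGENE_B\t-4.2\nGENE_C\t0.5\n' > toy_z.tsv
abs_zscore_file toy_z.tsv toy_absz.tsv
cat toy_absz.tsv
```

The resulting file can then be passed to munge-gs in place of the original, e.g. scdrs munge-gs --out-file gene1_abs.gs --zscore-file muco_absz.tsv --weight zscore --n-max 1000 (muco_absz.tsv being a hypothetical output name). Stripping the minus sign as text, rather than negating the number, copies each value exactly instead of letting awk reformat it with its default numeric precision.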

For gene sets built from multiple methods, scDRS supports both formats (weights of 1, or no weights; see the file formats documentation). In this case, it is probably easier to skip munge-gs and create the .gs file from scratch: the first row should be TRAIT\tGENESET and the second row should be <trait_name>\t<gene1>,<gene2>,<gene3>, where the placeholders in angle brackets stand for your own values.
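A minimal shell sketch of building such a file by hand: it intersects two gene lists (one gene per line) and writes the unweighted two-row .gs layout described above. The file names method1_genes.txt and method2_genes.txt, the trait name my_trait and the gene names are hypothetical placeholders.

```shell
#!/bin/sh
# Hypothetical gene lists from two methods, one gene per line
printf 'GENE1\nGENE2\nGENE3\nGENE4\n' > method1_genes.txt
printf 'GENE3\nGENE2\nGENE5\n' > method2_genes.txt

# comm requires sorted input; -12 keeps only genes present in both lists
sort -u method1_genes.txt > m1.sorted
sort -u method2_genes.txt > m2.sorted
genes=$(comm -12 m1.sorted m2.sorted | paste -s -d ',' -)

# Row 1: header; row 2: <trait_name><TAB><gene1>,<gene2>,...
printf 'TRAIT\tGENESET\n' > my_trait.gs
printf '%s\t%s\n' "my_trait" "$genes" >> my_trait.gs
cat my_trait.gs
```

For more than two methods, apply comm -12 repeatedly to the running intersection. If you prefer explicit weights of 1 instead, check the scDRS file-format documentation for the exact weighted syntax before appending weights to each gene.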

Please let us know if you have further questions.