martinjzhang / scDRS

Single-cell disease relevance score (scDRS)
https://martinjzhang.github.io/scDRS/
MIT License

munge-gs type error #40

Closed: parkjooyoung99 closed this issue 1 year ago

parkjooyoung99 commented 1 year ago

Hello, I am trying to generate a .gs file from MAGMA output and am having trouble.

I sorted the MAGMA output by ZSTAT with the following commands:
    cut -f 1 prostate_GCST90011808_zscore_file.tsv > gene_symbol.txt
    paste gene_symbol.txt prostate_GCST90011808.genes.out > test.tsv
    awk '{print $1, $9}' test.tsv | sort -rn -k 2 | sed 's/ /\t/g' > prostate_GCST90011808_zscorefile.tsv
which gives me this output: [image]

With this 'prostate_GCST90011808_zscorefile.tsv', munge-gs always gives me this error: [image]

What could be the problem?

I have attached my zscore file, converted to 'txt'. Thank you!

prostate_GCST90011808_zscorefile.txt

KangchengHou commented 1 year ago

Thanks for your clear description. There are multiple formatting issues in prostate_GCST90011808_zscorefile.txt.

  1. The header should contain GENE, TRAIT, for example:
    GENE    BMI    HEIGHT
    OR4F5   0.001  0.01
    DAZ3    0.01   0.001
  2. There are NA entries in the GENE column (I recommend removing them beforehand)
  3. There is a row (around row 11093) containing GENE, ZSTAT (perhaps a concatenation issue)

We have added more checks for the zscore / pval files in the master branch. But perhaps it is easier for you to fix these formatting issues in the zscore file before feeding it into munge-gs.
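For readers who hit the same error, here is a minimal Python sketch of the three fixes listed above. It is not part of scDRS: the function names `clean_zscore_lines` and `to_tsv`, the trait name `PROSTATE`, and the toy z-score values are all assumptions made for illustration. A plausible cause of the stray GENE, ZSTAT row is that the original header line was sorted together with the data (`sort -rn` treats a non-numeric key as 0), so the sketch drops any such row and writes a fresh header instead.

```python
import math


def clean_zscore_lines(lines):
    """Keep only well-formed (gene, z-score) rows, sorted by z-score descending.

    Hypothetical helper: drops NA genes, stray header rows, and
    rows whose second field is not a finite number.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # blank or malformed line
        gene, z_text = fields
        if gene in ("NA", "GENE"):
            continue  # missing gene symbol or a header row mixed into the data
        try:
            z = float(z_text)
        except ValueError:
            continue  # e.g. the literal string "ZSTAT" or "NA"
        if math.isnan(z):
            continue
        rows.append((gene, z))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


def to_tsv(rows, trait):
    """Format rows as tab-separated text with a 'GENE<TAB>trait' header."""
    body = [f"{gene}\t{z}" for gene, z in rows]
    return "\n".join([f"GENE\t{trait}"] + body) + "\n"


# Toy input (values are made up) showing the three problems from the list above.
raw = ["OR4F5\t2.5", "NA\t1.2", "GENE\tZSTAT", "DAZ3\t-0.7"]
rows = clean_zscore_lines(raw)
tsv_text = to_tsv(rows, "PROSTATE")  # "PROSTATE" is a hypothetical trait name
print(tsv_text, end="")
```

To apply this to a real file, read its lines with `open(path).read().splitlines()` and write `tsv_text` back out before running munge-gs.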

parkjooyoung99 commented 1 year ago

Thank you so much. Following your instructions 1 and 2 helped.