martinjzhang / scDRS

Single-cell disease relevance score (scDRS)
https://martinjzhang.github.io/scDRS/
MIT License

Please help with scDRS for the plant Arabidopsis thaliana #56

Closed: bitcometz closed this issue 1 year ago

bitcometz commented 1 year ago

Hi, thanks for your great software, scDRS.

I have scRNA-seq data from Arabidopsis thaliana and want to run an scDRS analysis.

I found GWAS data from https://aragwas.1001genomes.org/#/studies

But I cannot find the summary statistics (sumstats) files; I can only find these results: [image]

My question: to run scDRS, do I first need to prepare a sumstats file? Could you give some suggestions? Thanks!

martinjzhang commented 1 year ago

Hi,

Thank you for your interest in our software. scDRS needs a set of disease-relevant genes to score cells in the scRNA-seq data. This list of genes is formatted as a .gs file and supplied as input to scdrs compute-score.

We recommend using MAGMA with GWAS summary statistics (sumstats) to create this .gs file. If you don't have GWAS sumstats but only significant associations, you can probably also create a .gs file from this list of trait-associated genes. In the .gs file you can specify gene weights, which in our recommendation are the MAGMA z-scores. If you have z-scores for the genes computed in some other way, they can also be used as gene weights; otherwise, it is fine to use uniform weights of 1. The gene weights need to be positive.
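As a minimal sketch of what such a .gs file could look like, the hypothetical helper below writes a single trait with either uniform weights of 1 or user-supplied positive weights (for example MAGMA z-scores). It assumes the tab-separated layout with a TRAIT/GENESET header and comma-separated gene:weight pairs; please confirm the format against the current scDRS documentation. The function name write_gs_file and any trait names are illustrative only.

```python
def write_gs_file(path, trait, gene_weights=None, genes=None):
    """Write a one-trait .gs file (hypothetical helper, not part of scDRS).

    Assumed layout: tab-separated, header "TRAIT<TAB>GENESET", and GENESET is a
    comma-separated list of "gene:weight" pairs.
    """
    if gene_weights is None:
        if not genes:
            raise ValueError("provide gene_weights or a non-empty genes list")
        # No per-gene z-scores available: use uniform weights of 1.
        gene_weights = {gene: 1.0 for gene in genes}
    for gene, weight in gene_weights.items():
        if not weight > 0:  # scDRS gene weights need to be positive
            raise ValueError(f"weight for {gene} must be positive, got {weight}")
    # ':g' prints 1.0 as "1" and keeps up to 6 significant digits otherwise.
    geneset = ",".join(f"{gene}:{weight:g}" for gene, weight in gene_weights.items())
    with open(path, "w") as fh:
        fh.write("TRAIT\tGENESET\n")
        fh.write(f"{trait}\t{geneset}\n")
```

For a list of AraGWAS-associated genes without z-scores, calling write_gs_file("trait.gs", "study727", genes=[...]) would give every gene a weight of 1.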

bitcometz commented 1 year ago

Thanks for your reply! I will give it a try!

bitcometz commented 1 year ago

@martinjzhang, thanks for your help! We used about 6M SNPs from 2,000 samples to compute the LD reference with PLINK (the bfile). Then we used the p-values from this study: https://aragwas.1001genomes.org/#/study/727 (this study provides ~5M SNPs, including non-significant ones; do I need to remove the non-significant SNPs?) to get the MAGMA gene-level results with this command:

magma --bfile step2_ref/aragwas \
    --gene-annot step1_genes_annot/727.genes.annot  \
    --pval step3_study_data/study_727_snp.tsv  \
    'use=rs_id,pvalue' ncol=macs \
    --out step4_study_magma_genes_out/727

However, I got only a few genes: [image]

I saw in your documentation that a typical gene set has between 100 and a few hundred genes. Do you have any suggestions for our current situation?

Thanks!

martinjzhang commented 1 year ago

Hi @bitcometz, if your GWAS is severely underpowered, I am not sure you can create a set of high-quality disease genes.

Hi @KangchengHou , any suggestions regarding running MAGMA?

KangchengHou commented 1 year ago

@bitcometz Not sure I completely understand the issue. When running MAGMA, it is recommended to provide p-values for all SNPs (whether significant or not). You should then be able to obtain gene-level p-values for all genes genome-wide. Can you clarify what you mean by only getting a few genes?

A sample size of 2,000 does seem underpowered for detecting high-quality GWAS genes, but power can be examined empirically by looking at the distribution of MAGMA p-values.
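One way to examine that distribution, sketched here under stated assumptions: read the gene p-values from MAGMA's .genes.out file and compute the genomic-control inflation factor λ, the median observed 1-df chi-square statistic divided by its null median (about 0.455). A λ near 1 with no excess of small p-values suggests little detectable signal; a QQ plot of the same p-values is the usual visual check. The column names GENE, ZSTAT and P are assumed from MAGMA's gene-analysis output, and the helper names are hypothetical.

```python
import statistics


def read_magma_genes_out(path):
    """Parse a MAGMA .genes.out file into a list of dicts (one per gene).

    Assumes a whitespace-delimited file whose header row includes GENE, ZSTAT
    and P; other columns are kept as strings.
    """
    with open(path) as fh:
        header = fh.readline().split()
        rows = [dict(zip(header, line.split())) for line in fh if line.strip()]
    for row in rows:
        row["ZSTAT"] = float(row["ZSTAT"])
        row["P"] = float(row["P"])
    return rows


def genomic_inflation(pvalues):
    """Genomic-control lambda: median 1-df chi-square statistic / null median."""
    normal = statistics.NormalDist()
    # Map each p-value to a 1-df chi-square statistic (z squared);
    # clamp to avoid inv_cdf(0), which is undefined.
    chisq = [normal.inv_cdf(max(p, 1e-300) / 2) ** 2 for p in pvalues]
    null_median = normal.inv_cdf(0.25) ** 2  # median of chi2(1), about 0.455
    return statistics.median(chisq) / null_median
```

Usage: genomic_inflation([row["P"] for row in read_magma_genes_out("727.genes.out")]).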

bitcometz commented 1 year ago

@martinjzhang @KangchengHou, thanks for your reply!

Can you clarify what you mean by you only get a few genes?

magma --bfile step2_ref/aragwas \
    --gene-annot step1_genes_annot/727.genes.annot  \
    --pval step3_study_data/study_727_snp.tsv  \
    'use=rs_id,pvalue' ncol=macs \
    --out step4_study_magma_genes_out/727

I ran this command and got only a few genes in the result file (--out step4_study_magma_genes_out/727).

To clarify: I used a different set of about 2,000 Arabidopsis samples to compute the LD reference, i.e. the linkage disequilibrium between SNPs.

I rechecked MAGMA and reset the sample size: this study has only 200 samples, so I set the sample size to 200. (Previously I used ncol=macs, which took the number of times each SNP appeared in the sample as the sample size; that was incorrect.) After changing this parameter, I obtained more than 900 genes from MAGMA, compared with only 10 genes before. I think this is now relatively normal. I also noticed that scDRS has a maximum limit of 1,000 genes.

Am I right? Thanks again.

martinjzhang commented 1 year ago

Hi @bitcometz

scDRS can work with any number of genes.

Can you clarify "I obtained more than 900 genes from MAGMA"? Is this the total number of genes or the number of MAGMA-significant genes? If that is all the genes you got from MAGMA, I suggest using a set of top genes that you believe are truly relevant to the trait.
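To answer the total-versus-significant question, a small hypothetical helper can count both from the MAGMA gene p-values. It uses a Bonferroni threshold of alpha divided by the number of tested genes; that cutoff is an illustrative choice, not an scDRS requirement.

```python
def summarize_magma_genes(gene_pvalues, alpha=0.05):
    """Count all tested genes and the Bonferroni-significant ones.

    gene_pvalues: dict mapping gene ID -> MAGMA gene p-value (for example the
    P column of a .genes.out file). Hypothetical helper for illustration.
    """
    n_total = len(gene_pvalues)
    threshold = alpha / n_total if n_total else 0.0
    significant = sorted(g for g, p in gene_pvalues.items() if p < threshold)
    return n_total, threshold, significant
```

If n_total is close to the roughly 27,000 Arabidopsis genes, the MAGMA run covered the genome; if it is only a few hundred, the annotation step is the first place to look.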

KangchengHou commented 1 year ago

How many genes do you have in total? MAGMA should give you a p-value for each gene you provide that contains SNPs. Looking at step1_genes_annot/727.genes.annot should give an additional clue. Can you check the number of lines in that file? Can you also attach the MAGMA log?
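A quick way to check how many genes the annotation step produced is sketched below. It assumes MAGMA's .genes.annot layout (header lines beginning with '#', then one line per gene with the gene ID, a CHR:START:STOP location and the mapped SNP IDs); running wc -l on the file gives a similar count plus the header lines. The helper name is hypothetical.

```python
def summarize_genes_annot(path):
    """Map each gene ID in a MAGMA .genes.annot file to its number of SNPs.

    Hypothetical helper. Assumes header lines start with '#', followed by one
    line per gene: gene ID, 'CHR:START:STOP' location, then mapped SNP IDs.
    """
    snps_per_gene = {}
    with open(path) as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue  # skip header/comment and blank lines
            fields = line.split()
            snps_per_gene[fields[0]] = len(fields[2:])
    return snps_per_gene
```

The length of the returned dict should match the "gene definitions read from file" count in the gene-analysis log.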

bitcometz commented 1 year ago

@KangchengHou @martinjzhang, here is the log file:

Welcome to MAGMA v1.05 (linux)
Using flags:
        --bfile step2_ref/aragwas
        --gene-annot step1_genes_annot/727.genes.annot
        --pval step3_study_data/study_727_snp.tsv use=rs_id,pvalue N=227
        --out step4_study_magma_genes_out/727_N227

Start time is 10:52:40, Tuesday 11 Apr 2023

Loading PLINK-format data...
Reading file step2_ref/aragwas.fam... 2029 individuals read
Reading file step2_ref/aragwas.bim... 4072855 SNPs read
Preparing file step2_ref/aragwas.bed... 
Reading SNP p-values from file step3_study_data/study_727_snp.tsv... 
        detected 9 variables in file
        using variable: rs_id (SNP id)
        using variable: pvalue (p-value)
        read 4932458 lines from file, containing valid SNP p-values for 3744118 SNPs in data (75.91%)

Loading gene annotation from file step1_genes_annot/727.genes.annot... 
        893 gene definitions read from file
        found 893 genes containing SNPs in genotype data

Starting gene analysis... 
        using model: SNPwise-sum
        writing gene analysis results to file step4_study_magma_genes_out/727_N227.genes.out
        writing intermediate output to file step4_study_magma_genes_out/727_N227.genes.raw

End time is 10:53:14, Tuesday 11 Apr 2023 (elapsed: 00:00:34)

Thanks for your reply!

Here is an important question: for the SNP-to-gene annotation file in the first MAGMA step, do we need to generate it for every GWAS study (using each GWAS's SNPs to build the SNP-gene annotation file, which is what we are doing), or can we build it once from a general SNP file so that different GWAS studies share the same file? Arabidopsis thaliana has about 27,416 genes, but we only annotated about 900 genes using the SNPs from this GWAS study. There may be an issue here, since we only used the SNPs from this GWAS study for the SNP-gene annotation.

KangchengHou commented 1 year ago

We recommend generating the annotation for each GWAS study (this does not take much time).

Arabidopsis thaliana has about 27,416 genes, but we only annotated 900 genes using the SNPs from this GWAS study.

If this GWAS study only has SNPs for 900 genes, then obtaining MAGMA scores for these 900 genes is all you can do. But if you have SNPs for all genes genome-wide, we recommend obtaining MAGMA scores for all of them. scDRS will then handle selecting the top 1,000 genes for you.
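Since the per-study annotation is cheap, one option is to script both MAGMA steps for each study. The hypothetical helper below only assembles the argument lists, reusing flags that appear in this thread (--annotate window=..., --snp-loc, --gene-loc, --bfile, --gene-annot, --pval ... use=rs_id,pvalue N=...). It assumes the study's SNP location file (here a .bim) contains all of the study's SNPs, not only significant ones, and that the sample size is passed with N= rather than ncol=. All paths are placeholders.

```python
def magma_commands(study, bfile, gene_loc, snp_loc, pval_file, n_samples,
                   window=(10, 10), out_dir="."):
    """Return (annotate, gene_analysis) MAGMA argument lists for one study.

    Hypothetical helper: it does not run MAGMA, it only builds the commands.
    """
    prefix = f"{out_dir}/{study}"
    # Step 1: SNP-to-gene annotation; MAGMA writes <prefix>.genes.annot.
    annotate = [
        "magma", "--annotate", f"window={window[0]},{window[1]}",
        "--snp-loc", snp_loc, "--gene-loc", gene_loc, "--out", prefix,
    ]
    # Step 2: gene analysis using the LD reference and the study's SNP p-values.
    gene_analysis = [
        "magma", "--bfile", bfile, "--gene-annot", f"{prefix}.genes.annot",
        "--pval", pval_file, "use=rs_id,pvalue", f"N={n_samples}",
        "--out", prefix,
    ]
    return annotate, gene_analysis
```

Each list could then be executed with subprocess.run(cmd, check=True) from the Python standard library, annotation first.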

bitcometz commented 1 year ago

@KangchengHou, thanks for your reply! I used the following commands: [image]

I also noticed that MAGMA has a parameter that can include more genes:

For example, ‘--annotate window=5,1.5’ would set a 5kb upstream and 1.5kb downstream window.

Should I change this parameter to include more genes in my project?

Thanks!

KangchengHou commented 1 year ago

In our experiments, a 10 kb window works fine (--annotate window=10,10). Can you also show the log of magma --annotate? It is still strange that you only have 893 genes.
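To illustrate what a window=up,down setting such as the one quoted from the MAGMA manual does, the sketch below extends each gene by an upstream and a downstream margin (upstream is 5' of the gene, so on the minus strand it lies at higher coordinates) and reports which genes contain at least one SNP. It is a simplified illustration, not MAGMA's implementation; for example, how MAGMA treats genes without strand information is not modeled. It also shows why annotating with only a few significant SNPs leaves most genes without SNPs.

```python
def window_bounds(start, stop, strand, up_kb, down_kb):
    """Gene interval extended by an upstream/downstream window given in kb."""
    up, down = int(up_kb * 1000), int(down_kb * 1000)
    if strand == "-":
        # On the minus strand, upstream (5') is at higher coordinates.
        return max(1, start - down), stop + up
    return max(1, start - up), stop + down


def genes_with_snps(genes, snps, up_kb=0, down_kb=0):
    """IDs of genes whose windowed interval contains at least one SNP.

    genes: list of (gene_id, chrom, start, stop, strand)
    snps:  list of (chrom, position)
    A plain nested scan, fine for illustration but not for millions of SNPs.
    """
    hit = set()
    for gene_id, chrom, start, stop, strand in genes:
        lo, hi = window_bounds(start, stop, strand, up_kb, down_kb)
        if any(c == chrom and lo <= pos <= hi for c, pos in snps):
            hit.add(gene_id)
    return hit
```

With window=5,1.5, a plus-strand gene spanning 10,000-12,000 would cover 5,000-13,500.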

bitcometz commented 1 year ago

@KangchengHou, thanks for your reply.

I rechecked the pipeline and found that I got 893 genes because I used only the significant SNPs to build the SNP-gene annotation file.

When I used all the SNPs (about 4M), almost all genes were annotated:

Welcome to MAGMA v1.10 (linux)
Using flags:
    --annotate window=5,5
    --snp-loc ./study727.bim
    --gene-loc ./Ath.loc
    --out /public/home/WT_liuzm/Data/PlantSingleCell/GWAS/work/study727/727

Start time is 15:36:59, Wednesday 12 Apr 2023

Starting annotation...
Reading gene locations from file ./Ath.loc... 
    adding window: 5000bp
    WARNING: on line 27207, chromosome code 'C' not recognised; skipping gene (ID = ATCG00020)
    WARNING: on line 27208, chromosome code 'C' not recognised; skipping gene (ID = ATCG00040)
    WARNING: on line 27209, chromosome code 'C' not recognised; skipping gene (ID = ATCG00050)
    WARNING: on line 27210, chromosome code 'C' not recognised; skipping gene (ID = ATCG00065)
    WARNING: on line 27211, chromosome code 'C' not recognised; skipping gene (ID = ATCG00070)
    WARNING: on line 27212, chromosome code 'C' not recognised; skipping gene (ID = ATCG00080)
    WARNING: on line 27213, chromosome code 'C' not recognised; skipping gene (ID = ATCG00120)
    WARNING: on line 27214, chromosome code 'C' not recognised; skipping gene (ID = ATCG00130)
    WARNING: on line 27215, chromosome code 'C' not recognised; skipping gene (ID = ATCG00140)
    WARNING: on line 27216, chromosome code 'C' not recognised; skipping gene (ID = ATCG00150)
    WARNING: on line 27217, chromosome code 'C' not recognised; skipping gene (ID = ATCG00160)
    WARNING: on line 27218, chromosome code 'C' not recognised; skipping gene (ID = ATCG00170)
    WARNING: on line 27219, chromosome code 'C' not recognised; skipping gene (ID = ATCG00180)
    WARNING: on line 27220, chromosome code 'C' not recognised; skipping gene (ID = ATCG00190)
    WARNING: on line 27221, chromosome code 'C' not recognised; skipping gene (ID = ATCG00210)
    WARNING: on line 27222, chromosome code 'C' not recognised; skipping gene (ID = ATCG00220)
    WARNING: on line 27223, chromosome code 'C' not recognised; skipping gene (ID = ATCG00270)
    WARNING: on line 27224, chromosome code 'C' not recognised; skipping gene (ID = ATCG00280)
    WARNING: on line 27225, chromosome code 'C' not recognised; skipping gene (ID = ATCG00300)
    WARNING: on line 27226, chromosome code 'C' not recognised; skipping gene (ID = ATCG00330)
    WARNING: on line 27227, chromosome code 'C' not recognised; skipping gene (ID = ATCG00340)
    WARNING: on line 27228, chromosome code 'C' not recognised; skipping gene (ID = ATCG00350)
    WARNING: on line 27229, chromosome code 'C' not recognised; skipping gene (ID = ATCG00360)
    WARNING: on line 27230, chromosome code 'C' not recognised; skipping gene (ID = ATCG00380)
    WARNING: on line 27231, chromosome code 'C' not recognised; skipping gene (ID = ATCG00420)
    WARNING: on line 27232, chromosome code 'C' not recognised; skipping gene (ID = ATCG00430)
    WARNING: on line 27233, chromosome code 'C' not recognised; skipping gene (ID = ATCG00440)
    WARNING: on line 27234, chromosome code 'C' not recognised; skipping gene (ID = ATCG00470)
    WARNING: on line 27235, chromosome code 'C' not recognised; skipping gene (ID = ATCG00480)
    WARNING: on line 27236, chromosome code 'C' not recognised; skipping gene (ID = ATCG00490)
    WARNING: on line 27237, chromosome code 'C' not recognised; skipping gene (ID = ATCG00500)
    WARNING: on line 27238, chromosome code 'C' not recognised; skipping gene (ID = ATCG00510)
    WARNING: on line 27239, chromosome code 'C' not recognised; skipping gene (ID = ATCG00520)
    WARNING: on line 27240, chromosome code 'C' not recognised; skipping gene (ID = ATCG00530)
    WARNING: on line 27241, chromosome code 'C' not recognised; skipping gene (ID = ATCG00540)
    WARNING: on line 27242, chromosome code 'C' not recognised; skipping gene (ID = ATCG00550)
    WARNING: on line 27243, chromosome code 'C' not recognised; skipping gene (ID = ATCG00560)
    WARNING: on line 27244, chromosome code 'C' not recognised; skipping gene (ID = ATCG00570)
    WARNING: on line 27245, chromosome code 'C' not recognised; skipping gene (ID = ATCG00580)
    WARNING: on line 27246, chromosome code 'C' not recognised; skipping gene (ID = ATCG00590)
    WARNING: on line 27247, chromosome code 'C' not recognised; skipping gene (ID = ATCG00600)
    WARNING: on line 27248, chromosome code 'C' not recognised; skipping gene (ID = ATCG00630)
    WARNING: on line 27249, chromosome code 'C' not recognised; skipping gene (ID = ATCG00640)
    WARNING: on line 27250, chromosome code 'C' not recognised; skipping gene (ID = ATCG00650)
    WARNING: on line 27251, chromosome code 'C' not recognised; skipping gene (ID = ATCG00660)
    WARNING: on line 27252, chromosome code 'C' not recognised; skipping gene (ID = ATCG00670)
    WARNING: on line 27253, chromosome code 'C' not recognised; skipping gene (ID = ATCG00680)
    WARNING: on line 27254, chromosome code 'C' not recognised; skipping gene (ID = ATCG00690)
    WARNING: on line 27255, chromosome code 'C' not recognised; skipping gene (ID = ATCG00700)
    WARNING: on line 27256, chromosome code 'C' not recognised; skipping gene (ID = ATCG00710)
    WARNING: on line 27257, chromosome code 'C' not recognised; skipping gene (ID = ATCG00720)
    WARNING: on line 27258, chromosome code 'C' not recognised; skipping gene (ID = ATCG00730)
    WARNING: on line 27259, chromosome code 'C' not recognised; skipping gene (ID = ATCG00740)
    WARNING: on line 27260, chromosome code 'C' not recognised; skipping gene (ID = ATCG00750)
    WARNING: on line 27261, chromosome code 'C' not recognised; skipping gene (ID = ATCG00760)
    WARNING: on line 27262, chromosome code 'C' not recognised; skipping gene (ID = ATCG00770)
    WARNING: on line 27263, chromosome code 'C' not recognised; skipping gene (ID = ATCG00780)
    WARNING: on line 27264, chromosome code 'C' not recognised; skipping gene (ID = ATCG00790)
    WARNING: on line 27265, chromosome code 'C' not recognised; skipping gene (ID = ATCG00800)
    WARNING: on line 27266, chromosome code 'C' not recognised; skipping gene (ID = ATCG00810)
    WARNING: on line 27267, chromosome code 'C' not recognised; skipping gene (ID = ATCG00820)
    WARNING: on line 27268, chromosome code 'C' not recognised; skipping gene (ID = ATCG00830)
    WARNING: on line 27269, chromosome code 'C' not recognised; skipping gene (ID = ATCG00840)
    WARNING: on line 27270, chromosome code 'C' not recognised; skipping gene (ID = ATCG00860)
    WARNING: on line 27271, chromosome code 'C' not recognised; skipping gene (ID = ATCG00870)
    WARNING: on line 27272, chromosome code 'C' not recognised; skipping gene (ID = ATCG00890)
    WARNING: on line 27273, chromosome code 'C' not recognised; skipping gene (ID = ATCG00900)
    WARNING: on line 27274, chromosome code 'C' not recognised; skipping gene (ID = ATCG00905)
    WARNING: on line 27275, chromosome code 'C' not recognised; skipping gene (ID = ATCG01000)
    WARNING: on line 27276, chromosome code 'C' not recognised; skipping gene (ID = ATCG01010)
    WARNING: on line 27277, chromosome code 'C' not recognised; skipping gene (ID = ATCG01020)
    WARNING: on line 27278, chromosome code 'C' not recognised; skipping gene (ID = ATCG01040)
    WARNING: on line 27279, chromosome code 'C' not recognised; skipping gene (ID = ATCG01050)
    WARNING: on line 27280, chromosome code 'C' not recognised; skipping gene (ID = ATCG01060)
    WARNING: on line 27281, chromosome code 'C' not recognised; skipping gene (ID = ATCG01070)
    WARNING: on line 27282, chromosome code 'C' not recognised; skipping gene (ID = ATCG01080)
    WARNING: on line 27283, chromosome code 'C' not recognised; skipping gene (ID = ATCG01090)
    WARNING: on line 27284, chromosome code 'C' not recognised; skipping gene (ID = ATCG01100)
    WARNING: on line 27285, chromosome code 'C' not recognised; skipping gene (ID = ATCG01110)
    WARNING: on line 27286, chromosome code 'C' not recognised; skipping gene (ID = ATCG01120)
    WARNING: on line 27287, chromosome code 'C' not recognised; skipping gene (ID = ATCG01130)
    WARNING: on line 27288, chromosome code 'C' not recognised; skipping gene (ID = ATCG01230)
    WARNING: on line 27289, chromosome code 'C' not recognised; skipping gene (ID = ATCG01240)
    WARNING: on line 27290, chromosome code 'C' not recognised; skipping gene (ID = ATCG01250)
    WARNING: on line 27291, chromosome code 'C' not recognised; skipping gene (ID = ATCG01270)
    WARNING: on line 27292, chromosome code 'C' not recognised; skipping gene (ID = ATCG01280)
    WARNING: on line 27293, chromosome code 'C' not recognised; skipping gene (ID = ATCG01300)
    WARNING: on line 27294, chromosome code 'C' not recognised; skipping gene (ID = ATCG01310)
    WARNING: on line 27295, chromosome code 'M' not recognised; skipping gene (ID = ATMG00010)
    WARNING: on line 27296, chromosome code 'M' not recognised; skipping gene (ID = ATMG00030)
    WARNING: on line 27297, chromosome code 'M' not recognised; skipping gene (ID = ATMG00040)
    WARNING: on line 27298, chromosome code 'M' not recognised; skipping gene (ID = ATMG00050)
    WARNING: on line 27299, chromosome code 'M' not recognised; skipping gene (ID = ATMG00060)
    WARNING: on line 27300, chromosome code 'M' not recognised; skipping gene (ID = ATMG00070)
    WARNING: on line 27301, chromosome code 'M' not recognised; skipping gene (ID = ATMG00080)
    WARNING: on line 27302, chromosome code 'M' not recognised; skipping gene (ID = ATMG00090)
    WARNING: on line 27303, chromosome code 'M' not recognised; skipping gene (ID = ATMG00110)
    WARNING: on line 27304, chromosome code 'M' not recognised; skipping gene (ID = ATMG00120)
    WARNING: on line 27305, chromosome code 'M' not recognised; skipping gene (ID = ATMG00130)
    WARNING: on line 27306, chromosome code 'M' not recognised; skipping gene (ID = ATMG00140)
    WARNING: on line 27307, chromosome code 'M' not recognised; skipping gene (ID = ATMG00150)
    WARNING: on line 27308, chromosome code 'M' not recognised; skipping gene (ID = ATMG00160)
    WARNING: on line 27309, chromosome code 'M' not recognised; skipping gene (ID = ATMG00170)
    WARNING: on line 27310, chromosome code 'M' not recognised; skipping gene (ID = ATMG00180)
    WARNING: on line 27311, chromosome code 'M' not recognised; skipping gene (ID = ATMG00200)
    WARNING: on line 27312, chromosome code 'M' not recognised; skipping gene (ID = ATMG00210)
    WARNING: on line 27313, chromosome code 'M' not recognised; skipping gene (ID = ATMG00220)
    WARNING: on line 27314, chromosome code 'M' not recognised; skipping gene (ID = ATMG00240)
    WARNING: on line 27315, chromosome code 'M' not recognised; skipping gene (ID = ATMG00260)
    WARNING: on line 27316, chromosome code 'M' not recognised; skipping gene (ID = ATMG00270)
    WARNING: on line 27317, chromosome code 'M' not recognised; skipping gene (ID = ATMG00280)
    WARNING: on line 27318, chromosome code 'M' not recognised; skipping gene (ID = ATMG00285)
    WARNING: on line 27319, chromosome code 'M' not recognised; skipping gene (ID = ATMG00290)
    WARNING: on line 27320, chromosome code 'M' not recognised; skipping gene (ID = ATMG00300)
    WARNING: on line 27321, chromosome code 'M' not recognised; skipping gene (ID = ATMG00310)
    WARNING: on line 27322, chromosome code 'M' not recognised; skipping gene (ID = ATMG00320)
    WARNING: on line 27323, chromosome code 'M' not recognised; skipping gene (ID = ATMG00370)
    WARNING: on line 27324, chromosome code 'M' not recognised; skipping gene (ID = ATMG00400)
    WARNING: on line 27325, chromosome code 'M' not recognised; skipping gene (ID = ATMG00410)
    WARNING: on line 27326, chromosome code 'M' not recognised; skipping gene (ID = ATMG00430)
    WARNING: on line 27327, chromosome code 'M' not recognised; skipping gene (ID = ATMG00440)
    WARNING: on line 27328, chromosome code 'M' not recognised; skipping gene (ID = ATMG00450)
    WARNING: on line 27329, chromosome code 'M' not recognised; skipping gene (ID = ATMG00470)
    WARNING: on line 27330, chromosome code 'M' not recognised; skipping gene (ID = ATMG00480)
    WARNING: on line 27331, chromosome code 'M' not recognised; skipping gene (ID = ATMG00490)
    WARNING: on line 27332, chromosome code 'M' not recognised; skipping gene (ID = ATMG00500)
    WARNING: on line 27333, chromosome code 'M' not recognised; skipping gene (ID = ATMG00510)
    WARNING: on line 27334, chromosome code 'M' not recognised; skipping gene (ID = ATMG00513)
    WARNING: on line 27335, chromosome code 'M' not recognised; skipping gene (ID = ATMG00516)
    WARNING: on line 27336, chromosome code 'M' not recognised; skipping gene (ID = ATMG00520)
    WARNING: on line 27337, chromosome code 'M' not recognised; skipping gene (ID = ATMG00530)
    WARNING: on line 27338, chromosome code 'M' not recognised; skipping gene (ID = ATMG00540)
    WARNING: on line 27339, chromosome code 'M' not recognised; skipping gene (ID = ATMG00550)
    WARNING: on line 27340, chromosome code 'M' not recognised; skipping gene (ID = ATMG00560)
    WARNING: on line 27341, chromosome code 'M' not recognised; skipping gene (ID = ATMG00570)
    WARNING: on line 27342, chromosome code 'M' not recognised; skipping gene (ID = ATMG00580)
    WARNING: on line 27343, chromosome code 'M' not recognised; skipping gene (ID = ATMG00590)
    WARNING: on line 27344, chromosome code 'M' not recognised; skipping gene (ID = ATMG00600)
    WARNING: on line 27345, chromosome code 'M' not recognised; skipping gene (ID = ATMG00610)
    WARNING: on line 27346, chromosome code 'M' not recognised; skipping gene (ID = ATMG00620)
    WARNING: on line 27347, chromosome code 'M' not recognised; skipping gene (ID = ATMG00630)
    WARNING: on line 27348, chromosome code 'M' not recognised; skipping gene (ID = ATMG00640)
    WARNING: on line 27349, chromosome code 'M' not recognised; skipping gene (ID = ATMG00650)
    WARNING: on line 27350, chromosome code 'M' not recognised; skipping gene (ID = ATMG00660)
    WARNING: on line 27351, chromosome code 'M' not recognised; skipping gene (ID = ATMG00665)
    WARNING: on line 27352, chromosome code 'M' not recognised; skipping gene (ID = ATMG00670)
    WARNING: on line 27353, chromosome code 'M' not recognised; skipping gene (ID = ATMG00680)
    WARNING: on line 27354, chromosome code 'M' not recognised; skipping gene (ID = ATMG00690)
    WARNING: on line 27355, chromosome code 'M' not recognised; skipping gene (ID = ATMG00710)
    WARNING: on line 27356, chromosome code 'M' not recognised; skipping gene (ID = ATMG00720)
    WARNING: on line 27357, chromosome code 'M' not recognised; skipping gene (ID = ATMG00730)
    WARNING: on line 27358, chromosome code 'M' not recognised; skipping gene (ID = ATMG00740)
    WARNING: on line 27359, chromosome code 'M' not recognised; skipping gene (ID = ATMG00750)
    WARNING: on line 27360, chromosome code 'M' not recognised; skipping gene (ID = ATMG00760)
    WARNING: on line 27361, chromosome code 'M' not recognised; skipping gene (ID = ATMG00770)
    WARNING: on line 27362, chromosome code 'M' not recognised; skipping gene (ID = ATMG00810)
    WARNING: on line 27363, chromosome code 'M' not recognised; skipping gene (ID = ATMG00820)
    WARNING: on line 27364, chromosome code 'M' not recognised; skipping gene (ID = ATMG00830)
    WARNING: on line 27365, chromosome code 'M' not recognised; skipping gene (ID = ATMG00840)
    WARNING: on line 27366, chromosome code 'M' not recognised; skipping gene (ID = ATMG00850)
    WARNING: on line 27367, chromosome code 'M' not recognised; skipping gene (ID = ATMG00860)
    WARNING: on line 27368, chromosome code 'M' not recognised; skipping gene (ID = ATMG00870)
    WARNING: on line 27369, chromosome code 'M' not recognised; skipping gene (ID = ATMG00880)
    WARNING: on line 27370, chromosome code 'M' not recognised; skipping gene (ID = ATMG00890)
    WARNING: on line 27371, chromosome code 'M' not recognised; skipping gene (ID = ATMG00900)
    WARNING: on line 27372, chromosome code 'M' not recognised; skipping gene (ID = ATMG00910)
    WARNING: on line 27373, chromosome code 'M' not recognised; skipping gene (ID = ATMG00920)
    WARNING: on line 27374, chromosome code 'M' not recognised; skipping gene (ID = ATMG00940)
    WARNING: on line 27375, chromosome code 'M' not recognised; skipping gene (ID = ATMG00960)
    WARNING: on line 27376, chromosome code 'M' not recognised; skipping gene (ID = ATMG00970)
    WARNING: on line 27377, chromosome code 'M' not recognised; skipping gene (ID = ATMG00980)
    WARNING: on line 27378, chromosome code 'M' not recognised; skipping gene (ID = ATMG00990)
    WARNING: on line 27379, chromosome code 'M' not recognised; skipping gene (ID = ATMG01000)
    WARNING: on line 27380, chromosome code 'M' not recognised; skipping gene (ID = ATMG01010)
    WARNING: on line 27381, chromosome code 'M' not recognised; skipping gene (ID = ATMG01020)
    WARNING: on line 27382, chromosome code 'M' not recognised; skipping gene (ID = ATMG01030)
    WARNING: on line 27383, chromosome code 'M' not recognised; skipping gene (ID = ATMG01040)
    WARNING: on line 27384, chromosome code 'M' not recognised; skipping gene (ID = ATMG01050)
    WARNING: on line 27385, chromosome code 'M' not recognised; skipping gene (ID = ATMG01060)
    WARNING: on line 27386, chromosome code 'M' not recognised; skipping gene (ID = ATMG01080)
    WARNING: on line 27387, chromosome code 'M' not recognised; skipping gene (ID = ATMG01090)
    WARNING: on line 27388, chromosome code 'M' not recognised; skipping gene (ID = ATMG01100)
    WARNING: on line 27389, chromosome code 'M' not recognised; skipping gene (ID = ATMG01110)
    WARNING: on line 27390, chromosome code 'M' not recognised; skipping gene (ID = ATMG01120)
    WARNING: on line 27391, chromosome code 'M' not recognised; skipping gene (ID = ATMG01130)
    WARNING: on line 27392, chromosome code 'M' not recognised; skipping gene (ID = ATMG01140)
    WARNING: on line 27393, chromosome code 'M' not recognised; skipping gene (ID = ATMG01150)
    WARNING: on line 27394, chromosome code 'M' not recognised; skipping gene (ID = ATMG01170)
    WARNING: on line 27395, chromosome code 'M' not recognised; skipping gene (ID = ATMG01180)
    WARNING: on line 27396, chromosome code 'M' not recognised; skipping gene (ID = ATMG01190)
    WARNING: on line 27397, chromosome code 'M' not recognised; skipping gene (ID = ATMG01200)
    WARNING: on line 27398, chromosome code 'M' not recognised; skipping gene (ID = ATMG01210)
    WARNING: on line 27399, chromosome code 'M' not recognised; skipping gene (ID = ATMG01220)
    WARNING: on line 27400, chromosome code 'M' not recognised; skipping gene (ID = ATMG01230)
    WARNING: on line 27401, chromosome code 'M' not recognised; skipping gene (ID = ATMG01240)
    WARNING: on line 27402, chromosome code 'M' not recognised; skipping gene (ID = ATMG01250)
    WARNING: on line 27403, chromosome code 'M' not recognised; skipping gene (ID = ATMG01260)
    WARNING: on line 27404, chromosome code 'M' not recognised; skipping gene (ID = ATMG01270)
    WARNING: on line 27405, chromosome code 'M' not recognised; skipping gene (ID = ATMG01275)
    WARNING: on line 27406, chromosome code 'M' not recognised; skipping gene (ID = ATMG01280)
    WARNING: on line 27407, chromosome code 'M' not recognised; skipping gene (ID = ATMG01290)
    WARNING: on line 27408, chromosome code 'M' not recognised; skipping gene (ID = ATMG01300)
    WARNING: on line 27409, chromosome code 'M' not recognised; skipping gene (ID = ATMG01310)
    WARNING: on line 27410, chromosome code 'M' not recognised; skipping gene (ID = ATMG01320)
    WARNING: on line 27411, chromosome code 'M' not recognised; skipping gene (ID = ATMG01330)
    WARNING: on line 27412, chromosome code 'M' not recognised; skipping gene (ID = ATMG01350)
    WARNING: on line 27413, chromosome code 'M' not recognised; skipping gene (ID = ATMG01360)
    WARNING: on line 27414, chromosome code 'M' not recognised; skipping gene (ID = ATMG01370)
    WARNING: on line 27415, chromosome code 'M' not recognised; skipping gene (ID = ATMG01400)
    WARNING: on line 27416, chromosome code 'M' not recognised; skipping gene (ID = ATMG01410)
    27206 gene locations read from file
    chromosome  1: 7078 genes
    chromosome  2: 4245 genes
    chromosome  3: 5437 genes
    chromosome  4: 4128 genes
    chromosome  5: 6318 genes
Reading SNP locations from file ./study727.bim... 
    4932457 SNP locations read from file
    of those, 4184071 (84.83%) mapped to at least one gene
Writing annotation to file /public/home/WT_liuzm/Data/PlantSingleCell/GWAS/work/study727/727.genes.annot
    for chromosome  1, 24 genes are empty (out of 7078)
    for chromosome  2, 10 genes are empty (out of 4245)
    for chromosome  4, 7 genes are empty (out of 4128)
    for chromosome  5, 33 genes are empty (out of 6318)
    at least one SNP mapped to each of a total of 27132 genes (out of 27206)

End time is 15:37:30, Wednesday 12 Apr 2023 (elapsed: 00:00:31)
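The 210 warnings in this log (88 chloroplast ATCG genes and 122 mitochondrial ATMG genes, which is why 27,206 of the 27,416 gene locations were read) come from MAGMA not recognising the chromosome codes C and M and skipping those genes. If you prefer a clean log, one option is to drop them from the gene-location file beforehand. The sketch below assumes the chromosome is the second whitespace-separated column of Ath.loc; the helper and output file names are hypothetical.

```python
def filter_gene_loc(in_path, out_path, keep_chroms=("1", "2", "3", "4", "5")):
    """Copy a MAGMA gene-location file, keeping genes on the listed chromosomes.

    Assumes one gene per line with the chromosome code in the second
    whitespace-separated column. Returns how many gene lines were dropped.
    """
    dropped = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            if len(fields) > 1 and fields[1] in keep_chroms:
                dst.write(line)
            else:
                dropped += 1
    return dropped
```

The filtered file (for example Ath.nuclear.loc) would then be passed to --gene-loc.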

I used the parameter --annotate window=5,5. Should I change it back to the default setting to get fewer genes?

Thanks!

KangchengHou commented 1 year ago

--annotate window=5,5 corresponds to a 5 kb window on each side of the gene body. The choice of window did not affect results much in our experiments, but you may want to try different values of this parameter (0 kb, 5 kb, 10 kb).
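One way to run that comparison: repeat the annotation and gene analysis with window=0,0, 5,5 and 10,10, then measure how much the top-ranked genes overlap between runs. The sketch assumes each run has been loaded into a dict of gene ID to MAGMA z-score; the helper names and the choice of Jaccard overlap are illustrative.

```python
def top_genes(gene_z, n):
    """Top-n gene IDs by MAGMA z-score (ties broken by gene ID)."""
    ranked = sorted(gene_z.items(), key=lambda kv: (-kv[1], kv[0]))
    return {gene for gene, _ in ranked[:n]}


def jaccard(a, b):
    """Jaccard overlap |A ∩ B| / |A ∪ B| of two gene sets (1.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

A Jaccard value close to 1 for the top 1,000 genes would indicate that the window choice matters little for the downstream scDRS gene set.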

We recommend using all SNPs and all genes, as you are doing now. Next, you can proceed to the scDRS pipeline, which will handle selecting the top 1,000 genes.
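scDRS performs the top-gene selection itself. Purely to illustrate the idea (this is not scDRS's own code), the hypothetical helper below keeps up to n_max genes with the largest MAGMA z-scores as gene weights and drops non-positive ones, since gene weights must be positive. How scDRS breaks ties or applies additional cutoffs may differ.

```python
def top_gene_weights(gene_z, n_max=1000):
    """Keep up to n_max genes with the largest MAGMA z-scores as weights.

    Hypothetical helper: genes with non-positive z-scores are dropped because
    scDRS gene weights need to be positive.
    """
    ranked = sorted(gene_z.items(), key=lambda kv: (-kv[1], kv[0]))
    return {gene: z for gene, z in ranked[:n_max] if z > 0}
```

The resulting dict has the gene-to-weight shape that a .gs GENESET entry encodes as gene:weight pairs.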

bitcometz commented 1 year ago

@KangchengHou, thanks for your reply!