Closed by NaomiHuntley 1 year ago
Hi,
The format of the `.gs` file is expected to be `gene1:weight1,gene2:weight2,...`, where `gene1`, `gene2`, etc. are gene names. However, in your file, `gene1` appears to be represented by the number `130576`, which is not a gene name (like `Cd4`). As a result, none of the genes in your `.gs` file appear to be present in your `.h5ad` file, which means that scDRS will not be able to process the data correctly. To ensure that scDRS can work with your data, please check that the genes listed in your `.gs` file match the gene names present in your `.h5ad` file.
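One way to run the check described above is to count how many gene-set entries actually occur among the dataset's gene names. The following is a minimal sketch in plain Python, not part of scDRS: `parse_geneset` and `overlap_report` are hypothetical helper names, and the example values are made up. In practice the gene names would likely come from the `.h5ad` file (for example via `adata.var_names` after loading it with anndata).

```python
def parse_geneset(geneset):
    """Parse a 'gene1:weight1,gene2:weight2,...' string into {gene: weight}."""
    weights = {}
    for entry in geneset.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the last ':' so the weight is always the final field.
        gene, _, weight = entry.rpartition(":")
        weights[gene] = float(weight)
    return weights


def overlap_report(geneset, dataset_genes):
    """Summarize which gene-set entries are present among the dataset's gene names."""
    gs_genes = set(parse_geneset(geneset))
    present = gs_genes & set(dataset_genes)
    return {
        "n_geneset": len(gs_genes),
        "n_present": len(present),
        "missing": sorted(gs_genes - present),
    }


# Hypothetical values for illustration only.
report = overlap_report("130576:2.5,Cd4:1.8", ["Cd4", "Cd8a", "Actb"])
print(report)
```

If `n_present` is zero or very small, the identifiers in the `.gs` file and the `.h5ad` file most likely use different naming schemes.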
Thank you for the quick reply. It seems that the numbers come from the magma gene analysis step. This is my first time using magma, so do you happen to have any insight as to why this would happen?
Thank you!
@KangchengHou could you help with the MAGMA question? Thanks
@NaomiHuntley in the MAGMA directory, there is a file `<MAGMA_DIR>/NCBI37.3.gene.loc` which contains the correspondence between gene number and gene symbol. I will add this information to https://github.com/martinjzhang/scDRS/blob/master/docs/compute_magma_gs.md
Please let me know if you have any questions.
Hi @KangchengHou - as I am new to this, I am not entirely sure how to map the gene numbers to the symbols. I looked through the documentation for magma, but it seems that is not something I can do in magma. Is there a different tool? Thanks in advance for the clarification!
Hi @NaomiHuntley, `NCBI37.3.gene.loc` is a tab-separated file whose first column holds the gene numbers and whose last column holds the gene names. You need to write a small script (e.g., in R or Python) to do the mapping, changing the numbers to the corresponding gene names.
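A small mapping script of the kind suggested above might look like the sketch below. It assumes only what is stated in this thread: the first column of the gene-loc file is the gene number and the last column is the gene name. The helper names `load_id_to_symbol` and `rename_geneset` are hypothetical, the example rows are invented, and dropping entries that have no symbol is a design choice of this sketch rather than scDRS or MAGMA behavior.

```python
def load_id_to_symbol(lines):
    """Build {gene_number: gene_name} from gene-loc rows.

    Assumes the first column is the gene number and the last column is the name.
    """
    mapping = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        mapping[fields[0]] = fields[-1]
    return mapping


def rename_geneset(geneset, id_to_symbol):
    """Replace gene numbers in 'id:weight,...' with names; drop unmapped entries."""
    renamed = []
    for entry in geneset.split(","):
        gene_id, _, weight = entry.strip().partition(":")
        symbol = id_to_symbol.get(gene_id)
        if symbol is not None:
            renamed.append(f"{symbol}:{weight}")
    return ",".join(renamed)


# Invented rows for illustration; real rows come from NCBI37.3.gene.loc.
example_rows = [
    "111\t1\t1000\t2000\t+\tGENEA",
    "222\t2\t5000\t9000\t-\tGENEB",
]
id_to_symbol = load_id_to_symbol(example_rows)
renamed = rename_geneset("111:2.5,222:1.8,999:0.7", id_to_symbol)
print(renamed)
```

To use the real file, an open file handle can be passed in place of `example_rows`, since `load_id_to_symbol` only iterates over lines.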
@NaomiHuntley Alternatively, you can try the following to modify the `NCBI37.3.gene.loc` file that was used to run MAGMA:
# switch the 1st column and the 6th column
awk '{OFS="\t"; print $6,$2,$3,$4,$5,$1}' NCBI37.3.gene.loc > NCBI37.3_symbol.gene.loc
Then, for MAGMA step 1, use the following (note the replaced gene-loc file):
${magma_dir}/magma \
--annotate window=10,10 \
--snp-loc ${magma_dir}/g1000_eur.bim \
--gene-loc ${magma_dir}/NCBI37.3_symbol.gene.loc \
--out out/step1
This should be more convenient. Please let us know how this works.
@KangchengHou Thank you so much for that explanation. I am new to all of this so that helped a lot. I just submitted the gene annotation step, which took a few days last time. I will post an update when it works or if there are any more problems.
@KangchengHou @martinjzhang Thank you for the help! This code worked well for me and corrected my issue.
Hello. I am working on computing scDRS scores for several different traits. However, for every trait I try, I keep getting an error saying that the trait is being skipped due to small size: ![Screenshot 2023-03-20 at 10 42 30 AM](https://user-images.githubusercontent.com/54378109/226374811-fb67b5fc-5b16-4cfd-acb7-4e0d1368158d.png)
Here is the head of the file I am testing the code on: ![Screenshot 2023-03-20 at 10 43 14 AM](https://user-images.githubusercontent.com/54378109/226375009-9dc1f47b-8ec5-4f11-b334-3232a03e37dc.png)
Thank you!