opain / TWAS-GSEA

R script that performs gene-set or gene property analysis using TWAS results.
GNU General Public License v3.0
30 stars 6 forks source link

Error in Linear_Results[order(Linear_Results$P), ] : #4

Closed MIAO0916 closed 6 months ago

MIAO0916 commented 1 year ago

Hi @opain ,i could not find your gmt_file, so another .gmt is used instead, and the rest follows your example file, but it did not work. the code and the error: Rscript TWAS-GSEA.V1.2.R\ --twas_results ukbiobank-2017-1160-prePRS-fusion.tsv.GW \ --pos CMC.BRAIN.RNASEQ.pos \ --gmt_file c5.all.v2022.1.Hs.symbols.gmt \ --expression_ref CMC.BRAIN.RNASEQ_GeneX_all_MINI.csv \ --output demo

errors: Error in Linear_Results[order(Linear_Results$P), ] : incorrect number of dimensions Execution halted

look forward for your help. Thanks.

MIAO0916 commented 1 year ago

I found that using c2.all.v2023.1.Hs.entrez.gmt solved the above problem and allowed the test code to run smoothly. But I took your advice from last time and added ‘--allow_duplicate_ID F’ and it still reports an error when running my own file: Rscript TWAS-GSEA.V1.2.R\ --twas_results anno.cut_car_Boer_meta_82w_Cell2021.OA.hip.table.all \ --pos OA_cartilage_v1.WH_fusion.P01.pos \ --gmt_file c2.all.v2023.1.Hs.entrez.gmt \ --expression_ref OA_cartilage_v1.WH_fusion_GeneX_all_MINI.csv \ --allow_duplicate_ID F \ --output demo error: $ >> sh rscript.sh Error in order(...) : argument 1 is not a vector Calls: [ ... order -> standardGeneric -> eval -> eval -> eval -> order Execution halted

anno.cut_car_Boer_meta_82w_Cell2021.OA.hip.table.all ,the format is as follows(Only certain rows) FILE ID CHR P0 P1 TWAS.Z TWAS.P /home/songhuimiao/TWAS-GSEA/TWAS-GSEA/my_gsea/TWAS-GSEA/OA_cartilage_v1.WH_fusion/ENSG00000156671.8.fusion_model.wgt.RDat SAMD8 10 76859344 76941881 0.7681 0.44242 /home/songhuimiao/TWAS-GSEA/TWAS-GSEA/my_gsea/TWAS-GSEA/OA_cartilage_v1.WH_fusion/ENSG00000150076.18.fusion_model.wgt.RDat C10orf68 10 32832297 33171802 1.6399 0.10102 /home/songhuimiao/TWAS-GSEA/TWAS-GSEA/my_gsea/TWAS-GSEA/OA_cartilage_v1.WH_fusion/ENSG00000155252.12.fusion_model.wgt.RDat PI4K2A 10 99344131 99433667 -0.9044 0.36580

OA_cartilage_v1.WH_fusion.P01.pos,the format is as follows(Only certain rows) WGT ID CHR P0 P1 OA_cartilage_v1.WH_fusion/ENSG00000001561.6.fusion_model.wgt.RDat ENPP4 6 46097730 46114436 OA_cartilage_v1.WH_fusion/ENSG00000002016.12.fusion_model.wgt.RDat RAD52 12 1021243 1099219 OA_cartilage_v1.WH_fusion/ENSG00000002822.11.fusion_model.wgt.RDat MAD1L1 7 1855429 2272878 OA_cartilage_v1.WH_fusion/ENSG00000002919.10.fusion_model.wgt.RDat SNX11 17 46180719 46200436

OA_cartilage_v1.WH_fusion_GeneX_all_MINI.csv,the format is as follows(Only certain rows and columns were intercepted) FID IID ENSG00000040731.6 ENSG00000051596.5 ENSG00000061455.10 HG00096 HG00096 0.28 -0.135 0.627 HG00097 HG00097 0.28 -0.135 -1.301 HG00099 HG00099 -1.098 -0.791 -3.229 HG00101 HG00101 -1.098 2.028 0.627 HG00102 HG00102 0.28 -0.135 -1.301

look forward for your help. Thanks.

MIAO0916 commented 1 year ago

Is the problem that the csv file needs to be comma-separated, or can't you use ENSG numbering and use gene symbols? I'll try to debug this in the next few days, the GW file is missing those columns, could that be an influencing factor?

opain commented 1 year ago

Yes, the script by default assumes the gene IDs in the TWAS and .pos file are gene symbols (external_gene_name in biomart), which are then converted into entrez IDs to match the .gmt file. So you will need to change your gene IDs in the TWAS/.pos into gene symbols.

Alternatively, if you specify "--use_alt_id ID", it will use the ENSG IDs, but then you will also need to use ENSG IDs in the .gmt file.

I have made a new push with improved the documentation just now to make it clear that gene IDs are in assumed format unless using the --use_alt_id flag.