opain / TWAS-GSEA

R script that performs gene-set or gene property analysis using TWAS results.
GNU General Public License v3.0
31 stars 6 forks source link

Error in order(...) : argument 1 is not a vector #7

Closed MIAO0916 closed 8 months ago

MIAO0916 commented 11 months ago

Hi @opain , Thank you for your previous reply. I've taken your advice and added “--use_alt_id” and also standardised all the gene names, but unfortunately, I'm still getting this error. Can you give me some advice? the code: Rscript TWAS-GSEA.V1.2.R\ --twas_results anno.cut_syn_Boer_meta_82w_Cell2021.OA.hip.table.all \ --pos toensg_OA_synovium_v1.WH_fusion.P01.pos \ --gmt_file toensg_c5.all.v2022.1.Hs.symbols.gmt \ --expression_ref OA_synovium_v1.WH_fusion_GeneX_all_MINI.csv \ --allow_duplicate_ID F \ --use_alt_id ID \ --output demo error: Error in order(...) : argument 1 is not a vector Calls: [ ... order -> standardGeneric -> eval -> eval -> eval -> order Execution halted

anno.cut_syn_Boer_meta_82w_Cell2021.OA.hip.table.all,the format is as follows(Only certain rows) FILE P0 P1 TWAS.Z TWAS.P /home/songhuimiao/TWAS-GSEA/OA_synovium_v1.WH_fusion/ENSG00000187134.8.fusion_model.wgt.RDat 4934796 5025475 -1.53795 0.124060 /home/songhuimiao/TWAS-GSEA/OA_synovium_v1.WH_fusion/ENSG00000165678.15.fusion_model.wgt.RDat 85899196 85913001 -0.31400 0.753521 /home/songhuimiao/TWAS-GSEA/OA_synovium_v1.WH_fusion/ENSG00000203772.6.fusion_model.wgt.RDat 135234170 135382916 2.01858 0.043531 /home/songhuimiao/TWAS-GSEA/OA_synovium_v1.WH_fusion/ENSG00000177853.10.fusion_model.wgt.RDat 97889472 97965044 1.01057 0.312222

toensg_OA_synovium_v1.WH_fusion.P01.pos,the format is as follows(Only certain rows) WGT ID CHR P0 P1 OA_synovium_v1.WH_fusion/ENSG00000001036.9.fusion_model.wgt.RDat ENSG00000001036.9 6 143815948 143832827 OA_synovium_v1.WH_fusion/ENSG00000001561.6.fusion_model.wgt.RDat ENSG00000001561.6 6 46097730 46114436 OA_synovium_v1.WH_fusion/ENSG00000002016.12.fusion_model.wgt.RDat ENSG00000002016.12 12 1021243 1099219 OA_synovium_v1.WH_fusion/ENSG00000002726.15.fusion_model.wgt.RDat ENSG00000002726.15 7 150521715 15055859

toensg_c5.all.v2022.1.Hs.symbols.gmt,the format is as follows(Only certain rows and columns) GOBP_10_FORMYLTETRAHYDROFOLATE_METABOLIC_PROCESS https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/GOBP_10_FORMYLTETRAHYDROFOLATE_METABOLIC_PROCESS ENSG00000117020.12 ENSG00000109819.4 ENSG00000256525.2 GOBP_2FE_2S_CLUSTER_ASSEMBLY https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/GOBP_2FE_2S_CLUSTER_ASSEMBLY ENSG00000196839.8 ENSG00000113552.11 ENSG00000220201.3 GOBP_2_OXOGLUTARATE_METABOLIC_PROCESS https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/GOBP_2_OXOGLUTARATE_METABOLIC_PROCESS ENSG00000049167.9 ENSG00000143799.8

OA_synovium_v1.WH_fusion_GeneX_all_MINI.csv, the format is as follows(Only certain rows and columns) FID IID ENSG00000037474.10 ENSG00000039537.9 ENSG00000072786.8 HG00096 HG00096 -0.502 0.471 -0.935 HG00097 HG00097 -0.502 0.471 -0.935 HG00099 HG00099 -0.502 0.471 -0.935 HG00101 HG00101 -0.502 0.471 0.636 HG00102 HG00102 0.967 0.471 -0.935

the log file stopped here: Analysis started at 2023-12-08 02:59:32.472775 TWAS results file contains 2402 rows. Positional information is available for 2402 TWAS features. TWAS contains 2389 unique features with non-missing TWAS.P values.

I really don't know what the problem is. look forward for your help. Thanks.

opain commented 11 months ago

Hi,

There appears to be no 'ID' column in your TWAS results file (anno.cut_syn_Boer_meta_82w_Cell2021.OA.hip.table.all), which is required by default, and when using --use_alt_id ID.

Apologies that error message is uninformative. I will push and update.

opain commented 11 months ago

I have just pushed a commit to give error messages when required twas columns are not present. After some testing, deleting the ID column did not replicate your error message though.

I have realised another column is required (apologies) - MODELCV.R2 is used to select the feature with the best R2 when the --allow_duplicate_ID F is specified. If your gene IDs are unique, you can omit this option. Or, you will need to insert the MODELCV.R2 column. I have just pushed another update giving a more informative error and to update the README.

MIAO0916 commented 11 months ago

Thank you very much for your willingness to debug this difficulty with me!

My gene ID is indeed unique, so I've changed the code and files as you suggested. Now the code is: Rscript TWAS-GSEA.V1.2.R\ --twas_results addid_anno.cut_syn_Boer_meta_82w_Cell2021.OA.hip.table.all \ --pos toensg_OA_synovium_v1.WH_fusion.P01.pos \ --gmt_file toensg_c5.all.v2022.1.Hs.symbols.gmt \ --expression_ref OA_synovium_v1.WH_fusion_GeneX_all_MINI.csv \ --use_alt_id ID \ --output demo addid_anno.cut_syn_Boer_meta_82w_Cell2021.OA.hip.table.all the format is as follows(Only certain rows: FILE ID P0 P1 TWAS.Z TWAS.P /home/TWAS-GSEA/OA_synovium_v1.WH_fusion/ENSG00000187134.8.fusion_model.wgt.RDat ENSG00000187134.8 4934796 5025475 -1.53795 0.124060 /home/TWAS-GSEA/OA_synovium_v1.WH_fusion/ENSG00000165678.15.fusion_model.wgt.RDat ENSG00000165678.15 85899196 85913001 -0.31400 0.753521 /home/TWAS-GSEA/OA_synovium_v1.WH_fusion/ENSG00000203772.6.fusion_model.wgt.RDat ENSG00000203772.6 135234170 135382916 2.01858 0.043531

But the error still occurs, and the log file stops at the same place. Error in order(...) : argument 1 is not a vector Calls: [ ... order -> standardGeneric -> eval -> eval -> eval -> order Execution halted

I will wait patiently for your updates, cheers! Mia

opain commented 11 months ago

The default of --allow_duplicate_ID is F, so you will need to specify '--allow_duplicate_ID T'.

MIAO0916 commented 11 months ago

Hi @opain, Immediately report to you the good news that I managed to run through the whole process with my own data, thank you very much for your help during this period! I got this reply: null device 1 But there are only three kinds of result files:demo.linear.txt,demo.linear.png and demo.log.

Then, I use the following code to run the sample file(I didn't find the file GO_STRICT_PC_10-2000_MAGMA.gmt, it's a different file instead), also only get the results of these three output files. Rscript TWAS-GSEA.V1.2.R\ --twas_results ukbiobank-2017-1160-prePRS-fusion.tsv.GW \ --pos CMC.BRAIN.RNASEQ.pos \ --gmt_file c2.all.v2023.1.Hs.entrez.gmt \ --expression_ref CMC.BRAIN.RNASEQ_GeneX_all_MINI.csv \ --output demo I would like to ask if this is normal, as well as if there are any articles or other references that provide in-depth guidance on interpreting the results?

Mia

opain commented 11 months ago

Great that it has run through.

There isn't any more documentation I am afraid. Here is a brief explanation. The script uses a linear mixed model to test for enrichment of gene annotations whilst accounting for the non-independence of TWAS features (i.e. the predicted expression of nearby genes is correlated). However, the linear mixed model is computationally intensive, so the script initially runs simple linear regression, and then carries forward gene annotations for analysis using the linear mixed model if they are statistically significant in when using the linear model. This avoids running linear mixed model analysis for sets that show no evidence of enrichment.

The linear model results are saved in the file ending '.linear.txt'. The linear mixed model results are saved in the file ending '.competitive.txt' - If this file isn't created, it is because no gene annotations achieved significance according to the --linear_p_thresh parameter (Default behavior is to use a multiple testing corrected p-value threshold of 0.1). If you are not using many TWAS features, and want linear mixed model results for all annotations, you can change specify '--linear_p_thresh 1'.

I hope this helps!

MIAO0916 commented 11 months ago

This helps me a lot, thank you so much!