Hi!
I am having trouble running the GWAS summary imputation. First, I generated the genotype parquet files myself (I'm currently working on cattle) using the command below:
python $REPO/model_training_genotype_to_parquet.py \
    -input_genotype_file $DATA/parquet_run7/Chr${chr}-Run7-TAU-Beagle-toDistribute.txt.gz \
    -snp_annotation_file $DATA/parquet_run7/Chr${chr}_maf0.01_monoallelic_variants.txt.gz METADATA \
    -parsimony 9 \
    --impute_to_mean \
    --split_by_chromosome \
    --only_in_key \
    -rsid_column rsid \
    -output_prefix $DATA/parquet_run7/Chr${chr}_maf0.01_monoallelic_variants
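Before converting, it may be worth confirming that the variant IDs in the genotype file actually match the IDs in the annotation file. `--only_in_key` presumably keeps only the variants that appear in the annotation. If few or none of the IDs match, the annotation-derived columns could end up empty. Below is a minimal stdlib sketch. It assumes both files are gzipped, tab-separated and have a header row, and that the genotype file's first column holds the variant ID. The annotation ID column name `varID` and both function names are hypothetical placeholders, so adapt them to the real files.

```python
import csv
import gzip


def read_ids(path, column=None, delimiter="\t"):
    """Return the set of IDs in a gzipped delimited text file with a header row.

    If `column` is None the first column is used; otherwise the column whose
    header matches `column` (a hypothetical name, adapt it to your file).
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader)
        index = 0 if column is None else header.index(column)
        return {row[index] for row in reader if len(row) > index}


def id_overlap(genotype_path, annotation_path, annotation_id_column="varID"):
    """Count genotype IDs that do and do not appear in the annotation file."""
    genotype_ids = read_ids(genotype_path)
    annotation_ids = read_ids(annotation_path, column=annotation_id_column)
    shared = genotype_ids & annotation_ids
    return {
        "genotype": len(genotype_ids),
        "annotation": len(annotation_ids),
        "shared": len(shared),
        # a few unmatched genotype IDs, to eyeball formatting differences
        "genotype_only_examples": sorted(genotype_ids - annotation_ids)[:5],
    }
```

For example, `id_overlap(genotype_file, annotation_file)` with the two `Chr${chr}` files from the command above. A `shared` count far below `genotype` would point to an ID format mismatch (for example a different chromosome prefix or separator).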
This produced the following error:
ValueError: Table schema does not match schema used to create file:
table:
chromosome: int64
position: int64
id: null
allele_0: null
allele_1: null
allele_1_frequency: double
rsid: null vs.
file:
chromosome: int64
position: int64
id: string
allele_0: string
allele_1: string
allele_1_frequency: double
rsid: string
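pyarrow gives a column the `null` type when every value in it is missing. This mismatch therefore suggests that, in at least one chunk being written to that file, the `id`, `allele_0`, `allele_1` and `rsid` columns were entirely empty. Comparing the two schemas column by column makes this easier to see. Below is a small stdlib sketch; the helper names are hypothetical, and it only handles simple `name: type` pairs like the ones in this message.

```python
import re


def parse_schema(block):
    """Turn 'name: type name: type ...' text into an ordered dict of column -> type."""
    return dict(re.findall(r"(\w+):\s*(\w+)", block))


def schema_differences(message):
    """Return {column: (table_type, file_type)} for columns whose types differ.

    `message` is the text of a pyarrow "Table schema does not match schema used
    to create file" error: the 'table:' schema comes before 'vs.', the 'file:'
    schema after it.
    """
    table_part, file_part = message.split("vs.", 1)
    table_schema = parse_schema(table_part.split("table:", 1)[1])
    file_schema = parse_schema(file_part.split("file:", 1)[1])
    return {
        name: (table_schema.get(name), file_type)
        for name, file_type in file_schema.items()
        if table_schema.get(name) != file_type
    }
```

On the message above, this reports only `id`, `allele_0`, `allele_1` and `rsid`, each as `('null', 'string')`. All four are string columns that are empty in the chunk being written.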
Despite this error, I still got the variants parquet file, but not the metadata parquet file, so I generated the metadata parquet file myself.
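Because the metadata parquet was built by hand, a quick check of the rows before writing them may help. This sketch assumes the metadata should carry the same columns and types as the `file:` schema quoted in the error above: `chromosome` and `position` as integers, `id`, `allele_0`, `allele_1` and `rsid` as strings, and `allele_1_frequency` as a float. The function name is hypothetical, and the function does not write any parquet itself.

```python
# Columns and Python types taken from the "file:" schema in the error message
# (an assumption about what the imputation step expects).
EXPECTED_METADATA_COLUMNS = [
    ("chromosome", int),
    ("position", int),
    ("id", str),
    ("allele_0", str),
    ("allele_1", str),
    ("allele_1_frequency", (int, float)),  # ints accepted as frequencies (e.g. 0 or 1)
    ("rsid", str),
]


def check_metadata_rows(rows):
    """Report problems in hand-built metadata rows (a list of dicts).

    Flags missing or unexpected columns, values of the wrong Python type, and
    columns whose values are all None (pyarrow would type those as null).
    """
    problems = []
    expected_names = [name for name, _ in EXPECTED_METADATA_COLUMNS]
    seen = set()
    for row in rows:
        seen.update(row)
    for name in expected_names:
        if name not in seen:
            problems.append(f"missing column: {name}")
    for name in sorted(seen - set(expected_names)):
        problems.append(f"unexpected column: {name}")
    for name, py_type in EXPECTED_METADATA_COLUMNS:
        present = [row.get(name) for row in rows if row.get(name) is not None]
        if name in seen and not present:
            problems.append(f"all values missing (would be typed null): {name}")
        bad = [v for v in present if not isinstance(v, py_type)]
        if bad:
            problems.append(f"wrong type in {name}: {bad[0]!r}")
    return problems
```

An empty list means the rows at least have the expected shape. A column that is entirely `None` is exactly the case that shows up as `null` in the schema error.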
Then I ran the GWAS imputation with the command below:
python $GWAS_TOOLS/gwas_summary_imputation.py \
    -gwas_file $OUTPUT/harmonizedgwas/HM2${trait}.* \
    -by_region_file ${LD}/LD_blocks.txt.gz \
    -parquet_genotype ${Reference}/Chr${chr}_ARS_UCD1.2_maf0.01_monoallelic_variants.chr${chr}.variants.parquet \
    -parquet_genotype_metadata ${Reference}/variant_metadata.parquet \
    -window 100000 \
    -parsimony 7 \
    -chromosome ${chr} \
    -regularization 0.1 \
    -frequency_filter 0.01 \
    -sub_batches 10 \
    -sub_batch 0 \
    --standardise_dosages \
    -output $OUTPUT/summary_imputation/${trait}.chr${chr}.txt.gz
I got the following error:
  File "/bin/summary-gwas-imputation/src/gwas_summary_imputation.py", line 97, in
    run(args)
  File "/bin/summary-gwas-imputation/src/gwas_summary_imputation.py", line 62, in run
    results = run_by_variant(args)
  File "/bin/summary-gwas-imputation/src/gwas_summary_imputation.py", line 24, in run_by_variant
    context = SummaryImputationUtilities.context_from_args(args)
  File "/bin/summary-gwas-imputation/src/genomic_tools_lib/summary_imputation/Utilities.py", line 174, in context_from_args
    study = load_study(args)
  File "/bin/summary-gwas-imputation/src/genomic_tools_lib/summary_imputation/Utilities.py", line 162, in load_study
    study = Parquet.study_from_parquet(args.parquet_genotype, args.parquet_genotype_metadata, chromosome=args.chromosome)
  File "/bin/summary-gwas-imputation/src/genomic_tools_lib/file_formats/Parquet.py", line 218, in study_from_parquet
    _v = pq.ParquetFile(variants)
  File "/home/anaconda3/envs/imlabtools/lib/python3.7/site-packages/pyarrow/parquet.py", line 137, in __init__
    read_dictionary=read_dictionary, metadata=metadata)
  File "pyarrow/_parquet.pyx", line 1048, in pyarrow._parquet.ParquetReader.open
  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
OSError: Invalid flatbuffers message.
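The traceback shows the failure at `_v = pq.ParquetFile(variants)`, where `study_from_parquet` receives `args.parquet_genotype` as its first argument. So the file that fails to open appears to be the variants parquet written by the conversion step that crashed, not the hand-made metadata file. One thing worth ruling out is that this file was left incomplete. A finished Parquet file starts and ends with the 4-byte magic `PAR1`, and the footer length is stored as a little-endian 32-bit integer just before the trailing magic. Here is a minimal stdlib check of that framing (the function name is hypothetical):

```python
import os
import struct

PARQUET_MAGIC = b"PAR1"


def parquet_footer_info(path):
    """Check the framing bytes of a Parquet file without any Parquet library.

    A finished Parquet file is laid out as:
        PAR1 | data ... | footer metadata | footer length (4 bytes, LE) | PAR1
    A writer that stopped before closing the file usually leaves no trailing PAR1.
    """
    size = os.path.getsize(path)
    if size < 12:  # two magics plus the length field need at least 12 bytes
        return {"size": size, "ok": False, "reason": "file too small"}
    with open(path, "rb") as handle:
        head = handle.read(4)
        handle.seek(size - 8)
        footer_len = struct.unpack("<I", handle.read(4))[0]
        tail = handle.read(4)
    ok = head == PARQUET_MAGIC and tail == PARQUET_MAGIC and footer_len <= size - 12
    return {
        "size": size,
        "head_magic": head == PARQUET_MAGIC,
        "tail_magic": tail == PARQUET_MAGIC,
        "footer_length": footer_len,
        "ok": ok,
    }
```

Pass it the path given to `-parquet_genotype`. If the framing looks intact, the problem is more likely in the metadata itself. One possibility (a guess, not a confirmed diagnosis) is a pyarrow version mismatch between the environment that wrote the file and the `imlabtools` environment reading it. Calling `pyarrow.parquet.read_metadata(path)` in both environments would be a quick way to compare.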
Could you help me figure out what is wrong with my script?
Thank you very much for your help.
Best Regards,
Shuli