hakyimlab / summary-gwas-imputation

harmonization, liftover, and imputation of summary statistics from GWAS
MIT License

Report "OSError: Invalid flatbuffers message." when reading parquet files. #10

Closed shuliliu1991 closed 3 years ago

shuliliu1991 commented 3 years ago

Hi! I am having trouble running the GWAS imputation. First, I generated the parquet files myself (I'm currently working on cattle), using the command below:

python $REPO/model_training_genotype_to_parquet.py \
    -input_genotype_file $DATA/parquet_run7/Chr${chr}-Run7-TAU-Beagle-toDistribute.txt.gz \
    -snp_annotation_file $DATA/parquet_run7/Chr${chr}_maf0.01_monoallelic_variants.txt.gz METADATA \
    -parsimony 9 \
    --impute_to_mean \
    --split_by_chromosome \
    --only_in_key \
    -rsid_column rsid \
    -output_prefix $DATA/parquet_run7/Chr${chr}_maf0.01_monoallelic_variants

This produced an error:

ValueError: Table schema does not match schema used to create file:
table:
chromosome: int64
position: int64
id: null
allele_0: null
allele_1: null
allele_1_frequency: double
rsid: null
vs.
file:
chromosome: int64
position: int64
id: string
allele_0: string
allele_1: string
allele_1_frequency: double
rsid: string

I still got the parquet file for the variants, but not for the metadata, so I generated the metadata parquet file myself.

Then I ran the GWAS imputation using the command below:

python $GWAS_TOOLS/gwas_summary_imputation.py \
    -gwas_file $OUTPUT/harmonizedgwas/HM2${trait}.* \
    -by_region_file ${LD}/LD_blocks.txt.gz \
    -parquet_genotype ${Reference}/Chr${chr}_ARS_UCD1.2_maf0.01_monoallelic_variants.chr${chr}.variants.parquet \
    -parquet_genotype_metadata ${Reference}/variant_metadata.parquet \
    -window 100000 \
    -parsimony 7 \
    -chromosome ${chr} \
    -regularization 0.1 \
    -frequency_filter 0.01 \
    -sub_batches 10 \
    -sub_batch 0 \
    --standardise_dosages \
    -output $OUTPUT/summary_imputation/${trait}.chr${chr}.txt.gz

I got the following error:

File "/bin/summary-gwas-imputation/src/gwas_summary_imputation.py", line 97, in <module>
    run(args)
File "/bin/summary-gwas-imputation/src/gwas_summary_imputation.py", line 62, in run
    results = run_by_variant(args)
File "/bin/summary-gwas-imputation/src/gwas_summary_imputation.py", line 24, in run_by_variant
    context = SummaryImputationUtilities.context_from_args(args)
File "/bin/summary-gwas-imputation/src/genomic_tools_lib/summary_imputation/Utilities.py", line 174, in context_from_args
    study = load_study(args)
File "/bin/summary-gwas-imputation/src/genomic_tools_lib/summary_imputation/Utilities.py", line 162, in load_study
    study = Parquet.study_from_parquet(args.parquet_genotype, args.parquet_genotype_metadata, chromosome=args.chromosome)
File "/bin/summary-gwas-imputation/src/genomic_tools_lib/file_formats/Parquet.py", line 218, in study_from_parquet
    _v = pq.ParquetFile(variants)
File "/home/anaconda3/envs/imlabtools/lib/python3.7/site-packages/pyarrow/parquet.py", line 137, in __init__
    read_dictionary=read_dictionary, metadata=metadata)
File "pyarrow/_parquet.pyx", line 1048, in pyarrow._parquet.ParquetReader.open
File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
OSError: Invalid flatbuffers message.

Could you help me figure out what is wrong with my script?

Thank you very much for your help.

Best Regards,

Shuli

shuliliu1991 commented 3 years ago

Solved it by changing pyarrow 0.16.0 to pyarrow 0.9.0.
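
For anyone hitting the same error, a small guard like the one below could flag the version mismatch before the imputation script reaches the parquet reader. This is only a sketch: `PINNED`, `parse_version`, `matches_pinned`, and `check_pyarrow` are hypothetical names and not part of summary-gwas-imputation, and 0.9.0 is simply the version reported to work in this thread.

```python
# Hypothetical pre-flight check; not part of summary-gwas-imputation.
PINNED = "0.9.0"  # assumption: the pyarrow version reported to work in this thread


def parse_version(text):
    """Turn '0.16.0' into (0, 16, 0); stops at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def matches_pinned(installed, pinned=PINNED):
    """Compare numerically, since '0.16.0' < '0.9.0' as plain strings."""
    return parse_version(installed) == parse_version(pinned)


def check_pyarrow():
    """Return a one-line status message about the installed pyarrow."""
    try:
        import pyarrow
    except ImportError:
        return "pyarrow is not installed"
    installed = pyarrow.__version__
    if matches_pinned(installed):
        return "pyarrow {} matches the pinned version".format(installed)
    return "pyarrow {} differs from pinned {}; reading parquet files may fail".format(
        installed, PINNED
    )


if __name__ == "__main__":
    print(check_pyarrow())
```

The versions are compared as integer tuples rather than strings because a plain string comparison would order 0.16.0 before 0.9.0.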

Heroico commented 3 years ago

Hi Shuli, I'm glad that you solved the issue. There are no plans to update the code to work with later versions of pyarrow.

Best,

Alvaro