hakyimlab / summary-gwas-imputation

harmonization, liftover, and imputation of summary statistics from GWAS
MIT License
32 stars, 20 forks

AttributeError("'pyarrow.lib.ChunkedArray' object has no attribute 'name'") when using gwas_summary_imputation.py #13

Closed: DafniG closed this issue 3 years ago

DafniG commented 3 years ago

Hi,

Thank you for the great documentation and scripts! I am struggling to run the gwas_summary_imputation.py script after having harmonised all of my GWAS. I used the standard files you provide to compile the reference parquet files.

Example of a log file:

INFO - Beginning process
INFO - Creating context by variant
INFO - Loading study
INFO - Loading variants' parquet file
INFO - Loading variants metadata
Level 9 - Loading row group 21
INFO - Loading regions
Level 9 - Selecting target regions with specific chromosome
Level 9 - Selecting target regions from sub-batches
Level 9 - generating GWAS whitelist
INFO - Loading gwas
INFO - Acquiring filter tree for 17127 targets
INFO - Processing gwas source
Level 9 - Loaded 124 GWAS variants
Level 9 - Parsing GWAS
Level 9 - Processing region 1/3 [19924835.0, 22002927.0]
Level 8 - Roll out imputation
Level 8 - Preparing data
INFO - Error for region (22,19924835.0,22002927.0): AttributeError("'pyarrow.lib.ChunkedArray' object has no attribute 'name'")
Level 9 - Processing region 2/3 [22002927.0, 23370460.0]
Level 8 - Roll out imputation
Level 8 - Preparing data
INFO - Error for region (22,22002927.0,23370460.0): AttributeError("'pyarrow.lib.ChunkedArray' object has no attribute 'name'")
Level 9 - Processing region 3/3 [23370460.0, 24588236.0]
Level 8 - Roll out imputation
Level 8 - Preparing data
INFO - Error for region (22,23370460.0,24588236.0): AttributeError("'pyarrow.lib.ChunkedArray' object has no attribute 'name'")
INFO - Finished in 3.1158770509064198 seconds
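The log suggests that each region is processed independently, and that an exception in one region is logged and skipped rather than aborting the run. That would explain why the script reports "Finished" even though all three regions failed. A minimal sketch of that pattern follows; it is not the repository's code, and impute_region and the logger name are hypothetical stand-ins for the real per-region step:

```python
import logging

logger = logging.getLogger("summary_imputation_sketch")  # hypothetical logger name


def impute_regions(chromosome, regions, impute_region):
    """Impute each (start, end) region, logging failures instead of raising.

    impute_region is a hypothetical callable standing in for the real
    per-region imputation step.
    """
    results = []
    for i, (start, end) in enumerate(regions, start=1):
        # Custom level 9, matching the "Level 9" lines in the log
        logger.log(9, "Processing region %d/%d [%s, %s]", i, len(regions), start, end)
        try:
            results.append(impute_region(chromosome, start, end))
        except Exception as error:
            # %r logs the exception's repr, e.g. AttributeError("...")
            logger.info("Error for region (%s,%s,%s): %r", chromosome, start, end, error)
    return results
```

With this pattern a run can finish normally even when no region was imputed, so searching the log for "Error for region" is a quick way to spot a silent failure.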

Command used to run it:

python3 $REPO/gwas_summary_imputation.py \
-by_region_file $HOME/eur_ld.hg38.bed \
-gwas_file $DATA/${mydata}.txt.gz \
-parquet_genotype $HOME/genotype/gtex_v8_eur_filtered_maf0.01_monoallelic_variants.chr${chromosome}.variants.parquet \
-parquet_genotype_metadata $HOME/genotype/gtex_v8_eur_filtered_maf0.01_monoallelic_variants.variants_metadata.parquet \
-window 100000 \
-parsimony 7 \
-chromosome ${chromosome} \
-regularization 0.1 \
-frequency_filter 0.01 \
-sub_batches 10 \
-sub_batch ${mybatch} \
--standardise_dosages \
-output results_summary_imputation/${mydata}_chr${chromosome}_sb${mybatch}_reg0.1_ff0.01_by_region.txt.gz
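The same invocation can be built from Python, which makes it easier to loop over chromosomes and sub-batches. This is a minimal sketch that reuses only the flags and path patterns from the command above. The function names are hypothetical, and so is the assumption that sub-batches are numbered 0 to sub_batches - 1; check the script's help before relying on it:

```python
import subprocess


def imputation_command(repo, home, data, mydata, chromosome, batch, sub_batches=10):
    """Build the argument list for one (chromosome, sub-batch) job.

    Mirrors the shell command above; repo, home and data stand in for
    $REPO, $HOME and $DATA.
    """
    genotype = f"{home}/genotype/gtex_v8_eur_filtered_maf0.01_monoallelic_variants"
    output = (f"results_summary_imputation/{mydata}_chr{chromosome}"
              f"_sb{batch}_reg0.1_ff0.01_by_region.txt.gz")
    return [
        "python3", f"{repo}/gwas_summary_imputation.py",
        "-by_region_file", f"{home}/eur_ld.hg38.bed",
        "-gwas_file", f"{data}/{mydata}.txt.gz",
        "-parquet_genotype", f"{genotype}.chr{chromosome}.variants.parquet",
        "-parquet_genotype_metadata", f"{genotype}.variants_metadata.parquet",
        "-window", "100000",
        "-parsimony", "7",
        "-chromosome", str(chromosome),
        "-regularization", "0.1",
        "-frequency_filter", "0.01",
        "-sub_batches", str(sub_batches),
        "-sub_batch", str(batch),
        "--standardise_dosages",
        "-output", output,
    ]


def run_chromosome(repo, home, data, mydata, chromosome, sub_batches=10):
    """Run every sub-batch of one chromosome sequentially."""
    # ASSUMPTION: sub-batch indices run from 0 to sub_batches - 1.
    for batch in range(sub_batches):
        cmd = imputation_command(repo, home, data, mydata, chromosome, batch, sub_batches)
        subprocess.run(cmd, check=True)  # raise if the script exits non-zero
```

Each (chromosome, sub-batch) job writes its own output file, so the jobs appear to be independent and could also be submitted to a cluster scheduler instead of running sequentially.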

I tried to narrow down where the issue might be, and I think the error is encountered at line 252 of SummaryInputation.py:

variants = _get_variants(context, ids)

I was a bit confused about how _get_variants is defined and could not troubleshoot further. Any help would be appreciated!

I am running the pipeline using Python 3.8.3, pyarrow 3.0.0 and numpy 1.20.1.

natashasanthanam commented 3 years ago

Hi! The _get_variants function in SummaryInputation.py only standardizes the variants from your results file. The AttributeError may be due to the version of pyarrow you're using. Could you try running the code again with pyarrow=0.11.1 and numpy=1.18.1?
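One likely reason the pyarrow version matters: as far as we know, pyarrow 0.15.0 removed the pyarrow.Column class, so Table.column() and Table.columns now return ChunkedArray objects, which have no .name attribute. The small helper below (hypothetical, not part of the repository) flags whether an installed version predates that change:

```python
def parse_version(version):
    """Return the leading numeric components, e.g. '0.11.1' -> (0, 11, 1)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as 'dev123' or '0rc1'
        parts.append(int(piece))
    return tuple(parts)


def has_legacy_column_api(pyarrow_version):
    """True if this pyarrow version should still return Column objects with .name.

    ASSUMPTION: the Column class was removed in pyarrow 0.15.0.
    """
    return parse_version(pyarrow_version) < (0, 15)
```

Calling has_legacy_column_api(pyarrow.__version__) inside the environment that runs the pipeline returns False for 3.0.0 (the version in the original report) and True for 0.11.1. Printing pyarrow.__version__ there also confirms which build Python actually imports.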

DafniG commented 3 years ago

Hi Natasha. Thank you for your fast reply! That seems to have fixed the problem! I had to use Python 3.6, as I could not install pyarrow=0.11.1 on 3.8, at least not using pip.

rubyaryat commented 2 years ago

I ran into the same issue while following the tutorial with the conda environment specified on master:

https://github.com/hakyimlab/summary-gwas-imputation/blob/master/src/conda_env.yaml

It specifies pyarrow=0.11.0; upgrading to 0.11.1 did not fix the error (numpy=1.18.1).

Debugging the script, I found that the error occurs at this line:

https://github.com/hakyimlab/summary-gwas-imputation/blob/master/src/genomic_tools_lib/file_formats/Parquet.py#L211

which raises AttributeError: 'pyarrow.lib.ChunkedArray' object has no attribute 'name'

A fix for the issue was committed in a separate branch: https://github.com/hakyimlab/summary-gwas-imputation/blob/ImageXcan_changes/src/genomic_tools_lib/file_formats/Parquet.py#L213-L216

The fix reads the '_name' attribute instead of 'name' when building the dict.

I applied this change to master, and the gwas_summary_imputation.py script now works.

Is the 'ImageXcan_changes' branch more up to date than master? There appear to be several other fixes in that branch.
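Reading '_name' works with the pyarrow versions in question, but it depends on an underscore-prefixed attribute. A more version-independent sketch takes the names from the table's schema and pairs them with the columns. It assumes the calling code only needs a mapping from column name to NumPy array, and the helper name columns_to_dict is hypothetical:

```python
import numpy as np


def columns_to_dict(table):
    """Map each column name of a pyarrow Table to a NumPy array.

    Names come from table.schema.names rather than a per-column .name
    attribute, so this works whether table.columns yields old-style
    Column objects or ChunkedArray objects.
    """
    return {
        name: np.array(column.to_pylist())
        for name, column in zip(table.schema.names, table.columns)
    }
```

With pyarrow installed, columns_to_dict(pyarrow.parquet.read_table(path)) returns a dict keyed by column name without touching .name or ._name.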