The parse_gwas function in liftOver_gwas.py raises the following error when I load the "50_irnt.gwas.imputed_v3.both_sexes.tsv" file downloaded from http://www.nealelab.is/uk-biobank:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[9], line 1
----> 1 tmp2 = parse_gwas(gwas_file)
Cell In[8], line 13, in parse_gwas(gwas_file)
10 gwas = pd.read_csv(gwas_file,sep="\t")
12 gwas["variant"] = gwas["variant"].str.replace(":","_")
---> 13 gwas["chr"],gwas["pos"],gwas["non_effect_allele"],gwas["effect_allele"] = gwas["variant"].str.split("_",3).str
14 gwas["chr"] = "chr"+gwas["chr"]
15 gwas["pos"] = gwas["pos"].astype(int)
File ~/miniconda3/envs/jlab/lib/python3.12/site-packages/pandas/core/strings/accessor.py:137, in forbid_nonstring_types.<locals>._forbid_nonstring_types.<locals>.wrapper(self, *args, **kwargs)
132 msg = (
133 f"Cannot use .str.{func_name} with values of "
134 f"inferred dtype '{self._inferred_dtype}'."
135 )
136 raise TypeError(msg)
--> 137 return func(self, *args, **kwargs)
TypeError: StringMethods.split() takes from 1 to 2 positional arguments but 3 were given
I tried to modify the code as follows, which worked:
def parse_gwas2(gwas_file):
    """
    parses GWAS:
    1. replaces colon (:) with underscore (_)
    2. splits variant header to extract chr, pos, non_effect, and effect_allele
    3. changes chr and pos to appropriate datatypes
    4. renames p-value header to gwas_p_value
    5. id header is used to merge back to older gwas
    """
    import pandas as pd
    gwas = pd.read_csv(gwas_file, sep="\t")
    gwas["variant"] = gwas["variant"].str.replace(":", "_")
    # Split the variant column and assign the resulting DataFrame to a variable
    split_variants = gwas["variant"].str.split("_", n=3, expand=True)
    # Now access the columns of split_variants individually
    gwas["chr"] = "chr" + split_variants[0]
    gwas["pos"] = split_variants[1].astype(int)
    gwas["non_effect_allele"] = split_variants[2]
    gwas["effect_allele"] = split_variants[3]
    gwas["id"] = gwas["chr"] + "_" + gwas["pos"].astype(str)
    gwas = gwas.rename(columns={"pval": "gwas_p_value"})
    return gwas
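As far as I can tell, the original call fails because recent pandas versions accept only the pattern as a positional argument to str.split, so the split limit has to be passed as n=3. Here is a minimal sketch that reproduces the fixed split on a toy frame. The helper name split_variant and the two variant strings are made up for illustration and are not taken from liftOver_gwas.py or the Neale lab file.

```python
import pandas as pd

# Toy data in the "chr:pos:ref:alt" layout; the values are invented for illustration.
toy = pd.DataFrame({"variant": ["1:15791:C:T", "X:69487:G:A"]})


def split_variant(df):
    """Hypothetical helper: split the 'variant' column into four columns."""
    out = df.copy()
    # Replace ":" with "_" literally, then split into at most four pieces.
    # n must be passed by keyword; expand=True returns a DataFrame of pieces.
    parts = (
        out["variant"]
        .str.replace(":", "_", regex=False)
        .str.split("_", n=3, expand=True)
    )
    out["chr"] = "chr" + parts[0]
    out["pos"] = parts[1].astype(int)
    out["non_effect_allele"] = parts[2]
    out["effect_allele"] = parts[3]
    return out


result = split_variant(toy)
print(result)
```

With expand=True the pieces come back as DataFrame columns 0 to 3, which avoids unpacking the .str accessor the way the original line 13 does.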
Please let me know if I am wrong.
Patrick