segrelabgenomics / QTLEnrich

Assessing enrichment of complex disease or trait associations among QTLs

Issue in the parse_gwas function in liftOver_gwas.py #8

Open Patrick-Wen opened 3 months ago

Patrick-Wen commented 3 months ago

The parse_gwas function in liftOver_gwas.py raises the following error when I load the "50_irnt.gwas.imputed_v3.both_sexes.tsv" file downloaded from http://www.nealelab.is/uk-biobank:

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[9], line 1
----> 1 tmp2 = parse_gwas(gwas_file)

Cell In[8], line 13, in parse_gwas(gwas_file)
     10 gwas = pd.read_csv(gwas_file,sep="\t")
     12 gwas["variant"] = gwas["variant"].str.replace(":","_")
---> 13 gwas["chr"],gwas["pos"],gwas["non_effect_allele"],gwas["effect_allele"] = gwas["variant"].str.split("_",3).str
     14 gwas["chr"] = "chr"+gwas["chr"]
     15 gwas["pos"] = gwas["pos"].astype(int)

File ~/miniconda3/envs/jlab/lib/python3.12/site-packages/pandas/core/strings/accessor.py:137, in forbid_nonstring_types.<locals>._forbid_nonstring_types.<locals>.wrapper(self, *args, **kwargs)
    132     msg = (
    133         f"Cannot use .str.{func_name} with values of "
    134         f"inferred dtype '{self._inferred_dtype}'."
    135     )
    136     raise TypeError(msg)
--> 137 return func(self, *args, **kwargs)

TypeError: StringMethods.split() takes from 1 to 2 positional arguments but 3 were given
```
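For anyone hitting the same error: in pandas 2.x every argument of `Series.str.split` other than `pat` is keyword-only, so the positional `3` in `str.split("_",3)` is what triggers this TypeError. Unpacking the result by iterating over the trailing `.str` also relies on behavior that, as far as I know, was deprecated in pandas 1.x and no longer works in 2.x. A minimal sketch of the keyword-based call, using two made-up variant IDs in the `chr:pos:non_effect:effect` layout that parse_gwas assumes:

```python
import pandas as pd

# Two made-up variant IDs in the chr:pos:non_effect:effect layout parse_gwas expects
variants = pd.Series(["1:15791:C:T", "X:2699555:C:A"]).str.replace(":", "_")

# pandas 2.x: str.split("_", 3) raises TypeError because n is keyword-only.
# Passing n by keyword with expand=True returns one column per field,
# so there is no need to unpack the .str accessor.
parts = variants.str.split("_", n=3, expand=True)

chrom, pos, non_effect_allele, effect_allele = (parts[i] for i in range(4))
print(chrom.tolist(), pos.tolist(), non_effect_allele.tolist(), effect_allele.tolist())
# prints: ['1', 'X'] ['15791', '2699555'] ['C', 'C'] ['T', 'A']
```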

I modified the function as follows (passing `n` as a keyword argument and using `expand=True`), and it worked:

```python
import pandas as pd


def parse_gwas2(gwas_file):
    """
    Parses a GWAS summary-statistics file:
     1. replaces colons (:) with underscores (_) in the variant column
     2. splits the variant column to extract chr, pos, non_effect_allele, and effect_allele
     3. adds the "chr" prefix to chr and casts pos to int
     4. renames the pval column to gwas_p_value
     5. builds an id column (chr_pos) that is used to merge back to the older GWAS
    """
    gwas = pd.read_csv(gwas_file, sep="\t")

    gwas["variant"] = gwas["variant"].str.replace(":", "_")

    # Split the variant column into a DataFrame with one column per field;
    # n must be passed as a keyword argument in pandas 2.x
    split_variants = gwas["variant"].str.split("_", n=3, expand=True)

    # Assign each field of split_variants to its own column
    gwas["chr"] = "chr" + split_variants[0]
    gwas["pos"] = split_variants[1].astype(int)
    gwas["non_effect_allele"] = split_variants[2]
    gwas["effect_allele"] = split_variants[3]

    gwas["id"] = gwas["chr"] + "_" + gwas["pos"].astype(str)
    gwas = gwas.rename(columns={"pval": "gwas_p_value"})

    return gwas
```
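The same steps can be checked on a tiny in-memory file before running them on the full UK Biobank table. The sketch below is a hypothetical variant of parse_gwas2 (the name `parse_gwas_sketch` and the two sample rows are made up, and only the `variant` and `pval` columns are assumed) that also verifies every variant splits into four fields, so a malformed ID fails loudly instead of silently producing missing alleles:

```python
import io

import pandas as pd


def parse_gwas_sketch(gwas_file):
    """Hypothetical variant of parse_gwas2 with a check on the variant format."""
    gwas = pd.read_csv(gwas_file, sep="\t", dtype={"variant": str})
    gwas["variant"] = gwas["variant"].str.replace(":", "_")

    # One column per field; rows with fewer than four fields get missing values
    parts = gwas["variant"].str.split("_", n=3, expand=True)
    if parts.shape[1] != 4 or parts[3].isna().any():
        raise ValueError("expected every variant to look like chr:pos:ref:alt")

    gwas["chr"] = "chr" + parts[0]
    gwas["pos"] = parts[1].astype(int)
    gwas["non_effect_allele"] = parts[2]
    gwas["effect_allele"] = parts[3]

    gwas["id"] = gwas["chr"] + "_" + gwas["pos"].astype(str)
    return gwas.rename(columns={"pval": "gwas_p_value"})


# Two made-up rows containing only the columns this sketch touches
sample = io.StringIO(
    "variant\tpval\n"
    "1:15791:C:T\t0.5\n"
    "X:2699555:C:A\t0.01\n"
)
out = parse_gwas_sketch(sample)
print(out[["chr", "pos", "non_effect_allele", "effect_allele", "id", "gwas_p_value"]])
```

The explicit check is a design choice: with `expand=True`, pandas pads short rows with missing values rather than raising, which would otherwise only surface later, for example during the merge on `id`.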

Please let me know if I am wrong.

Patrick

hyq9588 commented 2 months ago

I also use the tool. Could I ask you a few questions?