omerwe / polyfun

PolyFun (POLYgenic FUNctionally-informed fine-mapping)
MIT License

Unable to handle duplicated SNP OR better error message when duplicates found in extract_snpvar.py #59

Closed Zepeng-Mu closed 3 years ago

Zepeng-Mu commented 3 years ago

Hi, when running extract_snpvar.py I got the following error:

[INFO]  Loading sumstats files...
Traceback (most recent call last):
  File "/home/zepengmu/tools/polyfun/extract_snpvar.py", line 45, in <module>
    df_snps = set_snpid_index(df_snps)
  File "/home/zepengmu/tools/polyfun/polyfun_utils.py", line 75, in set_snpid_index
    df_dup_snps = df_dup_snps.loc[~df_dup_snps.index.duplicated(), ['SNP', 'CHR', 'BP', 'A1', 'A2']]
  File "/scratch/midway2/zepengmu/conda_envs/polyfun/lib/python3.6/site-packages/pandas/core/indexing.py", line 873, in __getitem__
    return self._getitem_tuple(key)
  File "/scratch/midway2/zepengmu/conda_envs/polyfun/lib/python3.6/site-packages/pandas/core/indexing.py", line 1055, in _getitem_tuple
    return self._getitem_tuple_same_dim(tup)
  File "/scratch/midway2/zepengmu/conda_envs/polyfun/lib/python3.6/site-packages/pandas/core/indexing.py", line 750, in _getitem_tuple_same_dim
    retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
  File "/scratch/midway2/zepengmu/conda_envs/polyfun/lib/python3.6/site-packages/pandas/core/indexing.py", line 1099, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "/scratch/midway2/zepengmu/conda_envs/polyfun/lib/python3.6/site-packages/pandas/core/indexing.py", line 1037, in _getitem_iterable
    keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
  File "/scratch/midway2/zepengmu/conda_envs/polyfun/lib/python3.6/site-packages/pandas/core/indexing.py", line 1254, in _get_listlike_indexer
    self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
  File "/scratch/midway2/zepengmu/conda_envs/polyfun/lib/python3.6/site-packages/pandas/core/indexing.py", line 1316, in _validate_read_indexer
    "Passing list-likes to .loc or [] with any missing labels "
KeyError: "Passing list-likes to .loc or [] with any missing labels is no longer supported. The following labels were missing: Index(['SNP'], dtype='object'). See https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike"

The error seems to say that a SNP column is missing from the input, but I have successfully run this script on other GWAS with only the required CHR, BP, A1 and A2 columns. I noticed the expression ~df_dup_snps.index.duplicated() in the traceback and suspected that duplicated coordinates in my GWAS were the cause. Indeed, the error disappears once I remove the duplicated SNPs before running extract_snpvar.py. I'm not sure whether duplicates have to be removed before running the script, but either way it would be great if the error message were clearer.
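The deduplication workaround described above can be sketched roughly as follows. This is a minimal illustration, assuming the summary statistics are loaded into a pandas DataFrame with CHR, BP, A1 and A2 columns; the helper name drop_duplicate_variants and the toy data are hypothetical and are not part of polyfun.

```python
import pandas as pd


def drop_duplicate_variants(df, keys=("CHR", "BP", "A1", "A2")):
    """Keep the first row for each combination of the key columns.

    Hypothetical helper for illustration only; not part of polyfun.
    Returns the deduplicated table and the number of rows removed.
    """
    key_list = list(keys)
    # Count rows that repeat an earlier (CHR, BP, A1, A2) combination
    n_dup = int(df.duplicated(subset=key_list).sum())
    deduped = df.drop_duplicates(subset=key_list, keep="first").reset_index(drop=True)
    return deduped, n_dup


# Toy summary statistics: the second and third rows share coordinates and alleles
df_sumstats = pd.DataFrame({
    "CHR": [1, 1, 1, 2],
    "BP": [100, 200, 200, 300],
    "A1": ["A", "C", "C", "G"],
    "A2": ["G", "T", "T", "A"],
    "Z": [0.5, 1.2, 1.3, -0.7],
})

df_clean, n_removed = drop_duplicate_variants(df_sumstats)
print(f"removed {n_removed} duplicated rows, {len(df_clean)} remain")
```

Keeping the first occurrence is an arbitrary choice made for the sketch; depending on the data it may be more appropriate to inspect the duplicated rows before deciding which one to keep.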

Thanks so much!

omerwe commented 3 years ago

Hi,

Thanks for the bug report. I modified the code to not require a SNP column in this function. Can you please git pull and retry?
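For readers who hit the same KeyError elsewhere: the traceback comes from asking .loc for a list of column labels in which one label ('SNP') does not exist. One possible way to make such a selection tolerate a missing optional column is sketched below. This only illustrates the pandas behaviour; it is not the actual change made to polyfun_utils.py, and the toy DataFrame and its index labels are made up.

```python
import pandas as pd

# Toy table without a SNP column; both rows share the same (made-up) index label
df_dup_snps = pd.DataFrame(
    {"CHR": [1, 1], "BP": [100, 100], "A1": ["A", "A"], "A2": ["G", "G"]},
    index=["1.100.A.G", "1.100.A.G"],
)

wanted = ["SNP", "CHR", "BP", "A1", "A2"]
first_only = ~df_dup_snps.index.duplicated()  # True for the first row of each label

# Requesting a column that does not exist raises KeyError in recent pandas versions
try:
    df_dup_snps.loc[first_only, wanted]
except KeyError as err:
    print("KeyError:", err)

# Tolerant variant: request only the columns that are actually present
present = [col for col in wanted if col in df_dup_snps.columns]
df_report = df_dup_snps.loc[first_only, present]
print(df_report)
```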

Zepeng-Mu commented 3 years ago

Hi, it has been fixed. Thanks!!