Problem with NaNs - GitHub Issues

rmalik203 commented 6 years ago

Hi Tony,

I've been trying to use GNOVA in my project. For some of the sumstats files I'm using, I get the following error:

Traceback (most recent call last):
  File "gnova.py", line 86, in <module>
    pipeline(parser.parse_args())
  File "gnova.py", line 47, in pipeline
    out = calculate(gwas_snps, ld_scores, annots, N1, N2)
  File "/Volumes/BD/GNOVA/calculate.py", line 72, in calculate
    m1 = linear_model.LinearRegression().fit(ld_scores, pd.DataFrame((Z_x) ** 2), sample_weight=w1)
  File "/Volumes/Users/Library/Python/2.7/lib/python/site-packages/sklearn/linear_model/base.py", line 458, in fit
    y_numeric=True, multi_output=True)
  File "/Volumes/Users/Library/Python/2.7/lib/python/site-packages/sklearn/utils/validation.py", line 750, in check_X_y
    dtype=None)
  File "/Volumes/Users/Library/Python/2.7/lib/python/site-packages/sklearn/utils/validation.py", line 568, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File "/Volumes/Users/Library/Python/2.7/lib/python/site-packages/sklearn/utils/validation.py", line 56, in _assert_all_finite
    raise ValueError(msg_err.format(type_err, X.dtype))
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

I tracked it down to prep.py, where the line

df = pd.merge(bim, dfs[1], on=['SNP']).merge(dfs[0], on=['SNP'])

produces NaN values. I don't think this should happen. Could it be a matter of the pandas version used? My workaround was to add

df.dropna(inplace=True)

but I don't think that's how it's meant to work.

Rainer
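A minimal sketch for checking where the NaNs come from, not taken from the GNOVA code base. pd.merge defaults to an inner join, which does not create NaNs in non-key columns on its own, so missing values in the merged frame are most likely already present in one of the inputs. The toy frames and the helper name merge_and_report are hypothetical, and the column names SNP and Z are an assumption about the munged sumstats layout:

```python
import numpy as np
import pandas as pd

def merge_and_report(bim, dfs):
    """Run the merge quoted in the issue and count NaNs per column."""
    df = pd.merge(bim, dfs[1], on=['SNP']).merge(dfs[0], on=['SNP'])
    nan_counts = df.isna().sum()
    # Keep only the columns that actually contain missing values.
    return df, nan_counts[nan_counts > 0]

# Toy inputs (hypothetical): the first sumstats frame has one missing Z.
bim = pd.DataFrame({'SNP': ['rs1', 'rs2', 'rs3']})
gwas1 = pd.DataFrame({'SNP': ['rs1', 'rs2', 'rs3'], 'Z': [0.5, np.nan, -1.2]})
gwas2 = pd.DataFrame({'SNP': ['rs1', 'rs2', 'rs3'], 'Z': [1.1, 0.3, 0.7]})

merged, bad = merge_and_report(bim, [gwas1, gwas2])
print(bad)  # columns with NaNs and how many each one has

# Drop only the rows whose Z-scores are missing, rather than every NaN row.
# After the merge, Z_x comes from dfs[1] and Z_y from dfs[0].
clean = merged.dropna(subset=['Z_x', 'Z_y'])
```

If the report shows NaNs only in columns that come from one sumstats file, that file is the one to clean before running GNOVA.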

daghli commented 5 years ago

Getting same error.

ShiqiangCheng commented 4 years ago

Getting same error

giuseppe-fanelli commented 4 years ago

Same here.

giuseppe-fanelli commented 4 years ago

Have you found a way to resolve this issue?

daghli commented 4 years ago

Manually remove the NAs from the munged sumstats file!

daghli commented 4 years ago

I would recommend loading into R and trying something like:

library(data.table)
df <- fread("your_data", header=TRUE)
df <- na.omit(df)
write.table(df, "your_data_na_filtered", quote=FALSE, sep="\t", col.names=TRUE, row.names=FALSE)

On Monday, May 4, 2020, at 5:09 AM, polpett wrote:

"Manually remove the NAs from the munged sumstats file!"

Dear daghli, dear all, I have applied a filter in bash based on the expected ranges for each column. But how can I be sure that I have removed all the NAs from the munged sumstats file? Could you suggest a script to use for this purpose?


QuanHongLiu commented 1 year ago

I really recommend using this one-liner, because there is no need to use R, pass parameters, or do anything troublesome. It keeps only the rows that have exactly five whitespace-separated fields:

zcat /root/ldsc/sum_use/WBC_snplist.txt.sumstats.gz | awk 'NF==5' - | gzip > /root/GNOVA/sum_use_filter_NA/WBC_snplist.txt.sumstats.gz
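For anyone who wants to stay in Python and confirm that no NAs remain, here is a rough equivalent of the filters suggested in this thread. It is only a sketch under assumptions: the file is taken to be gzipped and tab-delimited with the five columns SNP, A1, A2, Z, N, and the function name drop_na_rows and the toy file names are made up for illustration.

```python
import gzip
import os
import tempfile

import pandas as pd

def drop_na_rows(src, dst):
    """Drop rows with any missing field from a gzipped, tab-delimited file.

    Returns (rows_before, rows_after) so the number removed can be checked.
    """
    # dtype=str keeps values as written; empty fields are still read as NaN.
    df = pd.read_csv(src, sep='\t', compression='gzip', dtype=str)
    cleaned = df.dropna()
    cleaned.to_csv(dst, sep='\t', index=False, compression='gzip')
    return len(df), len(cleaned)

# Toy input (hypothetical): the rs2 row has empty Z and N fields.
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, 'toy.sumstats.gz')
dst = os.path.join(tmpdir, 'toy.filtered.sumstats.gz')
with gzip.open(src, 'wt') as fh:
    fh.write('SNP\tA1\tA2\tZ\tN\n')
    fh.write('rs1\tA\tG\t0.5\t1000\n')
    fh.write('rs2\tC\tT\t\t\n')
    fh.write('rs3\tG\tA\t-1.2\t1000\n')

before, after = drop_na_rows(src, dst)

# Re-read the output and count missing values to confirm none are left.
remaining_nans = int(
    pd.read_csv(dst, sep='\t', compression='gzip').isna().sum().sum()
)
print(before, after, remaining_nans)
```

Comparing the row counts before and after shows how many SNPs were dropped, and a remaining NaN count of zero is the check that the filtered file is clean.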

xtonyjiang / GNOVA

Problem with NaNs #4