sritchie73 / ukbnmr

Tools for processing Nightingale NMR biomarker data in UK Biobank

'Inf' input to sequence error #8

Closed denvdm closed 6 days ago

denvdm commented 11 months ago

Hi, I'm getting the error below after about 5 minutes of running the remove_technical_variation() function. I've followed the steps outlined in the readme to the letter; it doesn't happen with the toy data, only with the full UKB dataset. This data has, however, been decoded using the standard steps, nothing custom. Any idea what step or input this involves, and how to resolve it? Thanks. Best, Dennis

Error in seq.int(rx[1L], rx[2L], length.out = nb) :
  'from' must be a finite number
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf

sritchie73 commented 11 months ago

Hi @denvdm ,

Sorry to hear you're getting an error.

It's not obvious to me where that error would be arising, so I will need some help tracking down the culprit. I've created a new branch for debugging this issue, in which I've injected progress messages into the remove_technical_variation() function.

Can you re-install the ukbnmr package using remotes::install_github("sritchie73/ukbnmr", ref = "issue8"), then rerun the remove_technical_variation() function on your data, and let me know the last progress message printed before you get the error?

This will help me track down where in the process the error is arising.

denvdm commented 11 months ago

Hi, thank you so much, very helpful. The progress messages are handy; perhaps even consider making them part of the main branch?

In any case, we have the classic embarrassing situation where the error disappears right after I cry for help. Terribly sorry about that. When I first ran this branch, I got up to 'Processing sample processing fields for QC procedure...' when the error appeared. However, I then decided to re-create the input dataframe (with the same code, just out of desperation), and now it runs all the way through!

I assume the earlier dataframe had some sort of issue where an incomplete file was written, causing missingness that in turn caused the 'Inf' error. I'm still not sure how or where though, as I did check that all columns contained at least some non-missing data.

So, all good now, and again apologies for wasting your time. Best, Dennis
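For anyone landing here with the same error, the per-column missingness check mentioned above can be sketched in base R as follows. This is only an illustration: the data frame name `ukb_data` and its columns are hypothetical stand-ins, not real UKB field names, and the check is not part of the ukbnmr package.

```r
# Toy stand-in for a decoded UKB extract; all names here are made up.
ukb_data <- data.frame(
  eid = 1:4,
  biomarker_a = c(1.2, NA, 0.8, 1.1),
  processing_field = c(NA, NA, NA, NA)
)

# Fraction of missing values in each column.
na_fraction <- vapply(ukb_data, function(col) mean(is.na(col)), numeric(1))

# Columns with no usable values at all.
all_na_cols <- names(na_fraction)[na_fraction == 1]
print(all_na_cols)
```

A column that is only partly missing passes this check, so it may also be worth looking at `na_fraction` itself for unexpectedly high values.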

sritchie73 commented 11 months ago

Hi Dennis,

Great to hear it ended up working for you! It's always challenging to diagnose bugs that come from changes to input data :)

Best wishes,

anbai106 commented 2 weeks ago

I got the same error with my own data (the data was recently downloaded from the UKB-RAP, and I adjusted the dataframe column names so that the first step runs successfully):

Checking for relevant UKB fields...
Extracting and pre-processing data...
Checking for required sample processing fields needed for QC procedure...
Processing sample processing fields for QC procedure...
Error in seq.int(rx[1L], rx[2L], length.out = nb) :
  'from' must be a finite number
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf

I had initially installed the latest version. I then tried the suggested solution of installing that specific version, but the same error occurred.

I tried to debug this and found out the error came from this line (Line 86) of code: bin_dates(Plate.Measured.Date, version), where:

sinfo size is 198 x 13, and the column for Plate.Measured.Date:

[1] NA NA "2021-08-23" NA NA NA "2020-03-27" "2021-10-29" NA NA
[11] "2022-02-18" NA NA NA "2019-11-13" NA NA NA "2022-03-28" NA
[21] NA NA NA NA "2022-05-04" NA "2020-12-01" "2021-08-26" NA NA
[31] NA NA "2020-01-04" NA "2022-03-06" "2021-10-28" NA NA "2019-10-03" NA
[41] "2022-03-27" NA "2020-02-05" NA "2020-03-14" NA "2019-07-04" "2022-03-11" NA NA
[51] "2020-03-12" NA "2021-07-04" NA NA NA NA NA "2022-03-28" NA
[61] "2019-08-12" "2019-12-03" "2022-04-19" "2022-02-11" "2022-02-25" "2019-10-02" "2022-03-20" NA NA NA
[71] "2021-10-07" NA "2020-03-25" "2019-06-17" NA NA "2021-06-15" NA "2022-05-02" "2022-02-18"
[81] "2021-11-04" NA "2021-09-07" "2021-10-22" "2022-02-19" "2022-04-02" "2021-06-01" NA "2021-07-26" "2019-06-17"
[91] "2022-04-14" NA "2022-03-10" NA NA NA "2021-04-21" NA NA NA
[101] NA NA "2020-01-05" NA "2021-05-28" NA "2022-03-15" "2022-06-09" "2022-04-05" NA
[111] "2020-05-23" "2019-11-24" "2019-11-23" NA NA NA NA NA "2019-09-13" NA
[121] "2019-08-14" "2020-04-04" "2022-03-28" NA NA NA "2022-03-29" NA "2020-01-10" NA
[131] "2021-10-07" NA "2019-08-20" NA NA NA "2019-12-25" NA NA NA
[141] "2021-04-08" NA "2022-02-20" "2022-03-08" NA NA "2020-02-29" NA "2020-12-01" NA
[151] NA NA "2019-10-07" NA NA NA NA NA "2021-06-06" "2020-01-06"
[161] "2020-05-09" "2019-12-16" "2021-05-12" NA NA NA NA NA NA NA
[171] "2020-11-05" NA "2022-04-30" NA "2021-08-26" "2021-07-13" "2022-05-23" NA NA NA
[181] "2020-04-16" "2019-07-19" "2022-05-09" NA "2022-04-08" NA NA NA "2022-04-21" NA
[191] NA NA NA NA "2021-04-08" NA NA NA

I then debugged further inside the bin_dates function and found that the error comes from this call: bins <- cut(unique(date_order), n_bins, labels = FALSE)

Any idea? Do you have any other solutions? Is this because data downloaded from the UKB-RAP may come in a different format that the package cannot use directly?
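The error message itself can be reproduced in base R without the package. When cut() is given a number of bins and an input with no non-missing values, it computes the range with missing values removed, gets Inf and -Inf (hence the min/max warnings), and then fails inside seq.int(). The sketch below shows only that base R behaviour; it is not the package's bin_dates() code, and whether date_order really ends up entirely NA in this case is an assumption that would need checking.

```r
# An input with no non-missing values, standing in for a vector of date ranks.
x <- c(NA_real_, NA_real_, NA_real_)

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  suppressWarnings(cut(x, 3, labels = FALSE)),
  error = function(e) conditionMessage(e)
)
print(msg)

# With several distinct non-missing values the same call works,
# and NA inputs simply stay NA in the output.
y <- c(1, NA, 5, 10)
bins <- cut(y, 3, labels = FALSE)
print(bins)
```

If the error only appears when every value passed to cut() is missing, then inspecting the input to that call (for example with summary() or sum(!is.na(...))) just before the failing line may narrow down where the values are lost.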

anbai106 commented 2 weeks ago

So I found the reason why that line (86) gave the error: the Plate.Measured.Date column of the sinfo dataframe has NA values. If I remove those rows (which does not make sense, because I would lose too many rows), it passes that line of code but gives a new error at Line 118:

Error in qr.default(x) : NA/NaN/Inf in foreign function call (arg 1)
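This second message is what base R's qr() reports when its input matrix contains NA, NaN or Inf, so it also suggests that missing values are reaching a later step. A minimal sketch of that behaviour and of a pre-check in base R; the matrix `X` is made up for illustration, and this is not the package's code at line 118.

```r
# Toy design matrix with one missing entry.
X <- cbind(intercept = 1, covariate = c(0.5, NA, 1.5, 2.0))

# qr() refuses non-finite input; capture the message rather than stopping.
msg <- tryCatch(qr(X), error = function(e) conditionMessage(e))
print(msg)

# Rows containing any NA, NaN or Inf have a non-finite row sum.
bad_rows <- which(!is.finite(rowSums(X)))
print(bad_rows)

# The decomposition succeeds once those rows are excluded.
qr_ok <- qr(X[-bad_rows, , drop = FALSE])
print(qr_ok$rank)
```

Running the same kind of check on the data passed into the failing call would show which covariates still carry missing values after the rows with missing dates are dropped.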