Closed genignored closed 3 years ago
ack. good catch @genignored
yes, the long-term fix here will be to edit read_ibd
to check for an empty tibble after the call to read_delim
internally (see https://github.com/signaturescience/skater/blob/master/R/read.R#L178-L181). if empty, then dont proceed to select %>% arrange_ids
etc.
in the short term we can get around this with the following (based on code at https://github.com/signaturescience/skatesim/blob/mscholz/snakefiles/calculate_kin_skate.R):
library(tidyverse)
library(skater)
library(furrr)
fps <- snakemake@input[['sims']]
plinkmap <- read_map(snakemake@input[['map']])
seg2kin <- function(fp, .map) {
message(fp)
## check for content in ibd file
## is there at least 1 row data in the file ?!
ind <- read_delim(fp, delim = "\t") %>% nrow() > 0
if(ind) {
outpath <- paste0(tools::file_path_sans_ext(fp), snake@params[['suffix']])
seglikeibd <- read_ibd(fp, source = snakemake@params[['source']])
## just id1,id2,kin
ibd2kin(.ibd_data = seglikeibd, .map = .map) %>%
write_csv(., outpath)
} else {
message("IBD segment file appears empty ... skipping.")
}
}
plan(multisession, workers = 24)
system.time({
future_map(fps, seg2kin, .map = plinkmap)
})
would a mod to read to check for nrow>0 work? As for results I think I'd prefer to have all (eg 432) relkin results files, even if there's a header with zero rows. I can modify this, but around here I'm ensuring that if I read 432 fam files, that I also read 432 akt/king results, and I read 432 ibis results.
It's okay if those files are empty (header only), but not okay if they're not there at all.
read_ibd()
to issue a message if the inferred segment IBD.gz file is empty and return a tibble filled with NA
at each column:The hapibd input appears empty. Creating empty IBD tibble.
# A tibble: 1 x 6
id1 id2 chr start end length
<lgl> <lgl> <lgl> <lgl> <lgl> <lgl>
1 NA NA NA NA NA NA
let me know if that behavior works or if you all would prefer a different behavior (i.e. returning a truly empty tibble with NULL
values)
as of 41ffc7a the above now returns:
The hapibd input appears empty. Creating empty IBD tibble.
# A tibble: 0 x 6
# … with 6 variables: id1 <chr>, id2 <chr>, chr <chr>, start <chr>, end <chr>, length <chr>
when hap-ibd does not detect any ibd segments, it outputs an empty .ibd.gz file.
Currently seg2kin/read_ibd() dies on empty files:
So, 1) The exception needs to be caught (if no lines, or no data on line, skip line?) 2) does .relkin need to have any lines for other analysis to work, if so, do we need to start also including fam file and outputting all relationships?