Open rbutleriii opened 4 months ago
This should really be an issue in the GWASTools package, since gdsSubset
is a GWASTools function. But this is not the intended use of gdsSubset. You don't need to write a new GDS file for every sample set; instead, use the sample.include
argument to pcrelate
to select samples for every iteration.
If you really want to do it this way, set allow.fork=TRUE
in your call to gdsSubset
.
If you do pass it sample.include
, it will error without the training.set
being subset, but it runs with the full set of pcs got the gds. Does the function handle the pc subset, or are the rows mismatched without pcs = mypcair$vectors[both, 1:8]
?
plan("multicore", workers = 8)
options(future.globals.maxSize = 4 * 1e9)
# prune to the relevant biomarker overlap
gds <- GenotypeBlockIterator(GenotypeData(GdsGenotypeReader(gds_fn, allow.fork = TRUE)))
sample.sel <- getScanID(gds)
# for each biomarker make grm
future_lapply(bios, function(i) {
# intersection for variable of interest
both <- intersect(a[DS1V == 0 & !is.na(get(i)), PIDN], sample.sel)
# run PC-Relate
mypcrelate <- pcrelate(gds,
pcs = mypcair$vectors[, 1:8],
training.set = intersect(mypcair$unrels, both),
BPPARAM = BiocParallel::SerialParam(),
sample.include = both
)
# write pcrelate RDS and GRM to file
saveRDS(mypcrelate, file = paste("mypcrelate", i, "rds", sep = "."))
}, future.seed = TRUE)
I don't entirely understand your question. If you're getting an error or unexpected results using sample.include
with pcrelate
, please post a reproducible example on the GENESIS issues page.
If you want to
gdsSubset
your object, you cannot have it open, which is incredibly impractical for downstream use if I need information from it, for example sample IDs.This will fail unless I close the object first, and would be a major chokepoint if I had to access elements inside the gds object in the
lapply
, as it would open and close the gds file for each iteration. Not to mention impossible to use with something likefuture_lapply
trying to parallelize the process since it only allows the file to be open once. Is there a reason it has to lock the file for use?