Closed sdgamboa closed 1 year ago
This seems to be related to https://github.com/waldronlab/BugSigDB/issues/174. I don't see Study 727 in the latest export on BugSigDBExports, but I see that it is now included in the study export directly on BugSigDB. That means I'd expect it to be included in the export on BugSigDBExports on Sunday.
But bsdb <- importBugSigDB(version = 'devel', cache = FALSE) should be taking directly from bugsigdb.org, no? It looks like an error in the bugsigdbr parsing.
And I don't see the relationship to https://github.com/waldronlab/BugSigDB/issues/174 - that is about Elastic Search indexing, this seems not to be related to any problem on bugsigdb.org. And this study was created more than two Sundays ago, so I think the error is propagating to the exports.
But bsdb <- importBugSigDB(version = 'devel', cache = FALSE) should be taking directly from bugsigdb.org, no?
No, it imports from BugSigDBExports, which does the merging of study, experiment, and signature table, filters incomplete records, adds signature ids, etc.
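To make the export steps described above concrete, here is a minimal sketch on toy data. The table contents, the column names, and the sig_id format are assumptions for illustration only; this is not the actual code of dump_release.R.

```r
# Toy stand-ins for the three exported tables (all values are made up).
studies <- data.frame(Study = c("Study 1", "Study 727"),
                      PMID = c("111", "222"),
                      stringsAsFactors = FALSE)
experiments <- data.frame(Study = c("Study 1", "Study 727"),
                          Experiment = c("Experiment 1", "Experiment 1"),
                          stringsAsFactors = FALSE)
signatures <- data.frame(Study = c("Study 1", "Study 727", "Study 727"),
                         Experiment = rep("Experiment 1", 3),
                         Signature = c("Signature 1", "Signature 1", "Signature 2"),
                         State = c("Complete", "Complete", "Incomplete"),
                         stringsAsFactors = FALSE)

# Step 1: merge study, experiment, and signature tables.
merged <- merge(studies, experiments, by = "Study")
merged <- merge(merged, signatures, by = c("Study", "Experiment"))

# Step 2: filter incomplete records.
complete <- merged[merged$State != "Incomplete", , drop = FALSE]

# Step 3: add a signature id (hypothetical format, for illustration only).
complete$sig_id <- paste0("sig-", seq_len(nrow(complete)))
```

With this toy input, a study only reaches the final table if it survives both the merge and the completeness filter, which is why a problem in either step can hide a study from the devel import.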
And I don't see the relationship to https://github.com/waldronlab/BugSigDB/issues/174
My impression is that Ike was running out of disk space and that these studies were thus not included in the export on BugSigDB.
You can actually run the dump_release.R script on BugSigDBExports manually, and you will see that "Study 727" is now included. It will thus also be included in the next export on Sunday, and will then be available to pull via bugsigdbr.
Ah, I didn't realize that devel pulled from BugSigDBExports. Then the reason it's still not appearing is that the GitHub Action has been erroring for the past three attempts: https://github.com/waldronlab/BugSigDBExports/actions
I assigned this to you @lgeistlinger because it looks like a file parsing error in dump_release.R
Here it would be good if @jwokaty would monitor such repeated failures in the BugSigDBExports GHA (she receives an email about failed runs, I believe), forward the information about repeated failures, and take action where possible. For the current situation, I don't believe there is anything else to do than to wait for Sunday, as running the script manually works fine, and the hiccup seems to have been caused by a temporarily ill-formatted / incomplete export on bugsigdb.org.
I just manually triggered a re-run of the latest GHA job (see here) and it is still failing. I also tried running the script locally (Rscript BugSigDBExports/inst/scripts/dump_release.R $(date +'%F') BugSigDBExports) and it also errors for me:
Error in strsplit(bsdb[["MetaPhlAn taxon names"]], ",") :
non-character argument
Execution halted
Then I tried stepping through dump_release.R and found the problem - all signatures are marked as Incomplete and thus being removed:
Browse[2]> table(sigs$State)
Incomplete
2848
I'm going to temporarily get rid of the completeness requirement for signatures because we have a Master's student needing to access recent data for her analysis, then open an issue on the bugsigdb repo. It doesn't seem like any change is needed in this repo other than adding some messages / warnings / errors to make something like this easier to diagnose.
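As a rough illustration of the kind of diagnostic message mentioned above, a check along these lines could fail early with a readable error instead of the later strsplit() failure. The helper name check_signature_states and its arguments are hypothetical; nothing like it is known to exist in dump_release.R or bugsigdbr.

```r
# Hypothetical helper: fail early if the signature export looks malformed.
check_signature_states <- function(sigs, state_col = "State") {
    if (!state_col %in% colnames(sigs))
        stop("Signature export has no '", state_col, "' column")

    # Count records that would survive the completeness filter.
    n_kept <- sum(sigs[[state_col]] != "Incomplete", na.rm = TRUE)

    if (n_kept == 0)
        stop("All ", nrow(sigs), " signatures are marked as Incomplete; ",
             "the upstream export may be malformed")

    message(n_kept, " of ", nrow(sigs), " signatures pass the completeness filter")
    invisible(n_kept)
}

# Usage on toy data that mimics the situation in this thread.
all_incomplete <- data.frame(State = rep("Incomplete", 3))
res <- tryCatch(check_signature_states(all_incomplete),
                error = function(e) conditionMessage(e))
```

Stopping at the filter step points directly at the State column rather than at whichever downstream call first receives an empty table.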
After ignoring Incomplete signatures, I still see another error, this time from bugsigdbr::getSignatures() - this now seems like your domain @lgeistlinger :
else if (!all(tax.level %in% TAX.LEVELS))
stop("tax.level must be a subset of { ", paste(TAX.LEVELS,
collapse = ", "), " }")
Browse[2]> stop("tax.level must be a subset of { ", paste(TAX.LEVELS,
+ collapse = ", "), " }")
Error during wrapup: tax.level must be a subset of { kingdom, phylum, class, order, family, genus, species, strain }
Error: no more error handlers available (recursive errors?); invoking 'abort' restart
Browse[2]> tax.level
[1] "mixed"
Browse[2]> TAX.LEVELS
[1] "kingdom" "phylum" "class" "order" "family" "genus" "species" "strain"
Sorry, scratch that last post; it was a debugging error. The actual error in this loop is (still trying to find a fix):
Error in vapply(spl, function(s) s[length(s)], character(1)) :
values must be length 1,
but FUN(X[[1]]) result is length 0
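For reference, one way this kind of vapply() error can arise is an empty taxon string, since strsplit() returns character(0) for "". The sketch below only reproduces the error message on toy input; whether an empty string is the actual cause here is an assumption, and get_tip is a hypothetical helper, not the package's .getTip().

```r
# Toy input: the first taxon string is empty.
taxa <- c("", "k__Bacteria|p__Firmicutes")
spl <- strsplit(taxa, "\\|")

# spl[[1]] is character(0), so s[length(s)] has length 0 and vapply() refuses it.
unsafe <- tryCatch(
    vapply(spl, function(s) s[length(s)], character(1)),
    error = function(e) conditionMessage(e)
)

# Guarded variant: return NA for empty splits instead of a length-0 result.
get_tip <- function(s) if (length(s) > 0) s[length(s)] else NA_character_
tips <- vapply(spl, get_tip, character(1))
```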
The error seems to occur inside bugsigdbr::.extractTaxLevel() (values from debug within function):
function (bug, tax.level)
{
if (is.na(bug))
return(bug)
tip <- .getTip(bug)
tl <- substring(tip, 1, 1)
ind1 <- match(tl, MPA.TAX.LEVELS)
ind2 <- match(tax.level, names(MPA.TAX.LEVELS))
if (ind1 > ind2) {
bug <- unlist(strsplit(bug, "\\|"))
bug <- paste(bug[seq_len(ind2)], collapse = "|")
}
return(bug)
}
Browse[9]> bug
[1] "2|1239|91061"
Browse[9]> tax.level
[1] "mixed"
Browse[9]> if (is.na(bug))
+ return(bug)
Browse[9]> tip <- .getTip(bug)
Browse[9]> tip
[1] "91061"
Browse[9]> tl <- substring(tip, 1, 1)
Browse[9]> tl
[1] "9"
Browse[9]> ind1 <- match(tl, MPA.TAX.LEVELS)
Browse[9]> ind2 <- match(tax.level, names(MPA.TAX.LEVELS))
Browse[9]> ind1
[1] NA
Browse[9]> ind2
[1] NA
Browse[9]> if (ind1 > ind2) {
+ bug <- unlist(strsplit(bug, "\\|"))
+ bug <- paste(bug[seq_len(ind2)], collapse = "|")
+ }
Error during wrapup: missing value where TRUE/FALSE needed
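The trace above boils down to match() returning NA for both lookups, after which if (NA > NA) cannot be evaluated. Below is a minimal sketch of that failure and of one possible guard. The vector mpa_levels is a stand-in for the package's internal MPA.TAX.LEVELS (its exact contents are an assumption), and extract_tax_level_safe is a hypothetical function, not the bugsigdbr implementation.

```r
# Assumed stand-in for MPA.TAX.LEVELS: MetaPhlAn prefixes named by tax level.
mpa_levels <- c(kingdom = "k", phylum = "p", class = "c", order = "o",
                family = "f", genus = "g", species = "s", strain = "t")

# Values from the trace: an NCBI id tip starting with "9", and tax.level "mixed".
ind1 <- match("9", mpa_levels)             # NA: "9" is not a MetaPhlAn prefix
ind2 <- match("mixed", names(mpa_levels))  # NA: "mixed" is not a named level

cmp_error <- tryCatch(
    if (ind1 > ind2) "truncate" else "keep",
    error = function(e) conditionMessage(e)
)

# Hypothetical guarded variant: leave the input unchanged when either lookup fails.
extract_tax_level_safe <- function(bug, tax.level, levels = mpa_levels) {
    if (is.na(bug))
        return(bug)
    parts <- unlist(strsplit(bug, "\\|"))
    tip <- parts[length(parts)]
    ind1 <- match(substring(tip, 1, 1), levels)
    ind2 <- match(tax.level, names(levels))
    if (is.na(ind1) || is.na(ind2))
        return(bug)
    if (ind1 > ind2)
        bug <- paste(parts[seq_len(ind2)], collapse = "|")
    bug
}
```

Returning the input unchanged is only one design choice; raising an informative error for NCBI-id input or an unsupported tax.level may be preferable, depending on how the caller is meant to handle "mixed".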
Then I tried stepping through dump_release.R and found the problem - all signatures are marked as Incomplete and thus being removed
Thanks for tracing this and reporting this to Ike. It looks like I only ran the script up to where the full dump is written in line 179 and was happy with seeing Study 727 included and didn't notice things breaking a couple lines further down.
After ignoring Incomplete signatures, I still see another error, this time from bugsigdbr::getSignatures() - this now seems like your domain @lgeistlinger
With incomplete records included, there is potential for all kinds of funny things to happen downstream. My preference here would be for Ike to restore the State column in the export, with incomplete records properly marked so that they can be excluded on our side. If the problem persists on complete records, I'd be happy to take a closer look. Otherwise we are cooking up a solution for dirty data, which should actually be checked for and filtered out upstream.
It looks like the issue persists now that Ike has restored the State column for signatures. I'll be looking into that.
@sdgamboa @lwaldron
> library(bugsigdbr)
> df <- importBugSigDB(version = "devel", cache = FALSE)
> "Study 727" %in% df$Study
[1] TRUE
Yeah!
I can't find study 727 in the bugsigdb download using devel. See code below.
Created on 2023-04-18 with reprex v2.0.2