genignored commented 1 year ago

Issue: When calling h5dump() on a file generated in-house on an Oxford Nanopore MinION MK1C (we also recalled it manually using MinKNOW and got the same result), we get this error:

Error: 'idx' argument is outside the range of filters set on this property list.
> traceback()
14: stop("'idx' argument is outside the range of filters set on this property list.",
        call. = FALSE)
13: H5Pget_filter(pid, i - 1)
12: h5checkFilters(h5dataset)
11: value[[3L]](cond)
10: tryCatchOne(expr, names, parentenv, handlers[[1L]])
9: tryCatchList(expr, classes, parentenv, handlers)
8: tryCatch({
       obj <- H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile,
           h5spaceMem = h5spaceMem, compoundAsDataFrame = compoundAsDataFrame,
           drop = drop, ...)
   }, error = function(e) {
       err <- h5checkFilters(h5dataset)
       if (nchar(err) > 0)
           stop(err, call. = FALSE)
       else stop(e)
7: h5readDataset(h5dataset, index = index, start = start, stride = stride,
       block = block, count = count, compoundAsDataFrame = compoundAsDataFrame,
       drop = drop, ...)
6: h5read(h5loc, L[[i]]$name, ..., native = native)
5: h5loadData(group, L[[i]], all = all, ..., native = native)
4: h5loadData(group, L[[i]], all = all, ..., native = native)
3: h5loadData(group, L[[i]], all = all, ..., native = native)
2: h5loadData(loc$H5Identifier, L, all = all, ..., native = native)
1: h5dump("one_seq.fast5")


> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: AlmaLinux 9.1 (Lime Lynx)

Matrix products: default
BLAS/LAPACK: /home/mscholz/miniconda3/envs/rtools/lib/

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] rhdf5_2.43.3

loaded via a namespace (and not attached):
 [1] magrittr_2.0.3      usethis_2.1.6       devtools_2.4.3
 [4] pkgload_1.2.4       R6_2.5.1            rlang_1.0.2
 [7] fastmap_1.1.0       tools_4.2.0         pkgbuild_1.3.1
[10] sessioninfo_1.2.2   cli_3.3.0           withr_2.5.0
[13] ellipsis_0.3.2      remotes_2.4.2       rprojroot_2.0.3
[16] lifecycle_1.0.1     crayon_1.5.1        brio_1.1.3
[19] processx_3.5.3      purrr_0.3.4         Rhdf5lib_1.20.0
[22] callr_3.7.0         rhdf5filters_1.10.1 fs_1.5.2
[25] ps_1.7.0            testthat_3.1.4      memoise_2.0.1
[28] glue_1.6.2          cachem_1.0.6        compiler_4.2.0
[31] desc_1.4.1          prettyunits_1.1.1

I'm attaching a zipped version of a simplified fast5 file that is giving this error. I will say that it happens regardless of the number of sequences in a fast5 from this source.

grimbough commented 1 year ago

Thanks for the report. There are actually two things going on here.

The first is that you've found a bug in rhdf5, where I'm converting from 1-based R indices to 0-based C indices, but doing it twice. The code is then looking for location -1, and that's why you're seeing the "index out of range" message. I've fixed this in the latest versions of rhdf5.
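
To illustrate the double conversion, here is a minimal sketch in plain R. It is not the actual rhdf5 source: `get_filter_c`, `get_filter_buggy` and `get_filter_fixed` are hypothetical stand-ins, and the only detail taken from the traceback is that the caller already passes `i - 1`.

```r
# Hypothetical illustration of the double-conversion bug; not rhdf5 code.

# Stand-in for a C-level routine that expects a 0-based filter index.
get_filter_c <- function(idx0, n_filters) {
  if (idx0 < 0 || idx0 >= n_filters) {
    stop("'idx' argument is outside the range of filters set on this property list.",
         call. = FALSE)
  }
  paste0("filter_", idx0)
}

# Buggy wrapper: the caller has already subtracted 1, and the wrapper does it again.
get_filter_buggy <- function(idx, n_filters) get_filter_c(idx - 1, n_filters)

# Fixed wrapper: the index is converted exactly once, by the caller.
get_filter_fixed <- function(idx, n_filters) get_filter_c(idx, n_filters)

i <- 1  # first filter, counted the R way (1-based)

# The caller converts to 0-based (i - 1); the buggy wrapper then asks for index -1.
res_buggy <- tryCatch(
  get_filter_buggy(i - 1, n_filters = 1),
  error = function(e) conditionMessage(e)
)

# With a single conversion, the first filter is found at index 0.
res_fixed <- get_filter_fixed(i - 1, n_filters = 1)
```

With one filter on the property list, the buggy path ends up requesting index -1 and raises the out-of-range error, while the fixed path resolves index 0.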

You now get:

> rhdf5::h5dump("~/Downloads/one_seq.fast5")
Error: Unable to read dataset.
Not all required filters available.
Missing filters: vbz

What's happening here is that your HDF5 file is exercising a rarely used part of the package: the code that runs when it encounters a dataset compressed with an unusual filter that is not available to rhdf5. In this case, it seems that ONT have started using their own compression tool, called vbz, on datasets. I'll take a look at whether I can add support for it to make it easily available.
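
If you want to see which datasets in a file are affected, something along these lines may help. It is a rough sketch that assumes rhdf5 is installed; `check_filters` is a hypothetical helper name, not a function in the package. It lists every dataset with `h5ls()` and asks HDF5 whether all the filters on it are available.

```r
# Sketch only: reports, per dataset, whether every filter it needs is available.
# `check_filters` is a hypothetical helper, not part of rhdf5.
check_filters <- function(file) {
  contents <- rhdf5::h5ls(file)
  datasets <- contents[contents$otype == "H5I_DATASET", ]
  # Build full dataset paths; avoid a doubled slash for datasets in the root group.
  paths <- sub("^//", "/", paste(datasets$group, datasets$name, sep = "/"))

  fid <- rhdf5::H5Fopen(file, flags = "H5F_ACC_RDONLY")
  on.exit(rhdf5::H5Fclose(fid), add = TRUE)

  vapply(paths, function(p) {
    did <- rhdf5::H5Dopen(fid, p)
    on.exit(rhdf5::H5Dclose(did), add = TRUE)
    # The dataset creation property list records the filter pipeline.
    pid <- rhdf5::H5Dget_create_plist(did)
    on.exit(rhdf5::H5Pclose(pid), add = TRUE)
    rhdf5::H5Pall_filters_avail(pid)
  }, logical(1))
}

# Only runs if rhdf5 is installed and the file from this report is present.
fast5 <- "one_seq.fast5"
if (requireNamespace("rhdf5", quietly = TRUE) && file.exists(fast5)) {
  print(check_filters(fast5))
}
```

Any dataset reported as `FALSE` uses at least one filter that this rhdf5 installation cannot load.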

grimbough commented 1 year ago

This is now working with the latest versions of rhdf5 and rhdf5filters. The updates will make their way into Bioconductor when the next release happens in a few weeks, or you can install directly from GitHub to get them now.
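
For the GitHub route, a sketch along these lines should work. It assumes the development versions live in the `grimbough/rhdf5` and `grimbough/rhdf5filters` repositories and that BiocManager can install from GitHub (it needs the remotes package for that). `install_dev_rhdf5` and `installed_versions` are hypothetical helper names.

```r
# Sketch only: install the development versions from GitHub (repository names assumed).
install_dev_rhdf5 <- function() {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  # Install rhdf5filters first, since rhdf5 relies on it for compression filters.
  BiocManager::install(c("grimbough/rhdf5filters", "grimbough/rhdf5"))
}

# install_dev_rhdf5()  # uncomment to actually install

# Report the installed version of each package, or NA if it is not installed.
installed_versions <- function(pkgs = c("rhdf5", "rhdf5filters")) {
  vapply(pkgs, function(p) {
    if (requireNamespace(p, quietly = TRUE)) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
}

print(installed_versions())
```

Restart R after installing so the new versions are the ones loaded, then compare the reported versions with what `sessionInfo()` showed before.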


genignored commented 1 year ago


I was browsing Bioconductor and it is still reporting v2.44.0.

Digging deeper, it appears that there are warnings during the automated tests. I'm not sure if that would keep Bioconductor from incorporating it, or if this is just the standard delay for Bioconductor.

Just checking, I suppose.

Thank you!

grimbough commented 8 months ago

Is this still an issue? There were some reports from CRAN about the vbz filter breaking on some of their systems, so I had to roll back the changes for a while, but I think it should all be working with the latest versions of Bioconductor.