grimbough / rhdf5

Package providing an interface between HDF5 and R
http://bioconductor.org/packages/rhdf5

h5dump() fails on ONT h5 called with Minknow #126

Closed genignored closed 5 months ago

genignored commented 1 year ago

Issue: When calling h5dump() on a file generated in house from an Oxford Nanopore MinION MK1C (although we re-called manually using MinKNOW and got the same result), we get this error:

> h5dump("one_seq.fast5")
Error: 'idx' argument is outside the range of filters set on this property list.
> traceback()
14: stop("'idx' argument is outside the range of filters set on this property list.",
        call. = FALSE)
13: H5Pget_filter(pid, i - 1)
12: h5checkFilters(h5dataset)
11: value[[3L]](cond)
10: tryCatchOne(expr, names, parentenv, handlers[[1L]])
9: tryCatchList(expr, classes, parentenv, handlers)
8: tryCatch({
       obj <- H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile,
           h5spaceMem = h5spaceMem, compoundAsDataFrame = compoundAsDataFrame,
           drop = drop, ...)
   }, error = function(e) {
       err <- h5checkFilters(h5dataset)
       on.exit(H5Dclose(h5dataset))
       if (nchar(err) > 0)
           stop(err, call. = FALSE)
       else stop(e)
   })
7: h5readDataset(h5dataset, index = index, start = start, stride = stride,
       block = block, count = count, compoundAsDataFrame = compoundAsDataFrame,
       drop = drop, ...)
6: h5read(h5loc, L[[i]]$name, ..., native = native)
5: h5loadData(group, L[[i]], all = all, ..., native = native)
4: h5loadData(group, L[[i]], all = all, ..., native = native)
3: h5loadData(group, L[[i]], all = all, ..., native = native)
2: h5loadData(loc$H5Identifier, L, all = all, ..., native = native)
1: h5dump("one_seq.fast5")

sessionInfo():

> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: AlmaLinux 9.1 (Lime Lynx)

Matrix products: default
BLAS/LAPACK: /home/mscholz/miniconda3/envs/rtools/lib/libopenblasp-r0.3.21.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] rhdf5_2.43.3

loaded via a namespace (and not attached):
 [1] magrittr_2.0.3      usethis_2.1.6       devtools_2.4.3
 [4] pkgload_1.2.4       R6_2.5.1            rlang_1.0.2
 [7] fastmap_1.1.0       tools_4.2.0         pkgbuild_1.3.1
[10] sessioninfo_1.2.2   cli_3.3.0           withr_2.5.0
[13] ellipsis_0.3.2      remotes_2.4.2       rprojroot_2.0.3
[16] lifecycle_1.0.1     crayon_1.5.1        brio_1.1.3
[19] processx_3.5.3      purrr_0.3.4         Rhdf5lib_1.20.0
[22] callr_3.7.0         rhdf5filters_1.10.1 fs_1.5.2
[25] ps_1.7.0            testthat_3.1.4      memoise_2.0.1
[28] glue_1.6.2          cachem_1.0.6        compiler_4.2.0
[31] desc_1.4.1          prettyunits_1.1.1

I'm attaching a zipped version of a simplified fast5 file that gives this error. It happens regardless of the number of sequences in a fast5 file from this source.

one_seq.zip

grimbough commented 1 year ago

Thanks for the report. There are actually two things going on here.

The first is that you've found a bug in rhdf5, where I'm converting from R's 1-based indices to C's 0-based indices, but doing it twice. The code is then looking for location -1, and that's why you're seeing the "index out of range" message. I've fixed this in the latest versions of rhdf5.
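
As a rough illustration of that double conversion (not the actual rhdf5 source), the sketch below uses two hypothetical functions: `c_get_filter()` stands in for a C routine that expects a 0-based index, and `get_filter()` is an R-facing wrapper that already subtracts 1. If the caller subtracts 1 as well, the index for the first filter becomes -1 and the range check fails.

```r
# Hypothetical stand-in for a C routine that expects a 0-based filter index.
c_get_filter <- function(filters, idx0) {
  if (idx0 < 0 || idx0 >= length(filters)) {
    stop("'idx' argument is outside the range of filters set on this property list.",
         call. = FALSE)
  }
  filters[[idx0 + 1]]
}

# Hypothetical R-facing wrapper: takes a 1-based index and converts it once.
get_filter <- function(filters, i) c_get_filter(filters, i - 1)

filters <- c("vbz")

# Correct use: the wrapper performs the only conversion.
first_filter <- get_filter(filters, 1)

# Buggy use: the caller also subtracts 1, so the C side receives -1.
res <- tryCatch(
  get_filter(filters, 1 - 1),
  error = function(e) conditionMessage(e)
)
print(first_filter)
print(res)
```

Doing the conversion in exactly one layer (here, the wrapper) avoids the problem.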

You now get:

> rhdf5::h5dump("~/Downloads/one_seq.fast5")
Error: Unable to read dataset.
Not all required filters available.
Missing filters: vbz

What's happening here is that your HDF5 file is triggering a rarely used part of the package: the code that runs when it encounters a dataset compressed with an unusual filter that is not available to rhdf5. In this case, it seems that ONT have started using their own compression tool, called vbz, on datasets (https://github.com/nanoporetech/vbz_compression). I'll take a look at whether I can add that to https://github.com/grimbough/rhdf5filters to make it easily available.
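
For anyone who wants to check this themselves before reading a dataset, a minimal sketch follows. It assumes rhdf5 is installed and that `H5Dget_create_plist()` and `H5Pall_filters_avail()` behave as described in the rhdf5 documentation; `filters_available()` is a hypothetical helper name, and the dataset path is taken from the structure of the attached file.

```r
# Hypothetical helper: TRUE if every filter used by `dataset` can be loaded.
# Assumes the rhdf5 package is installed.
filters_available <- function(file, dataset) {
  fid <- rhdf5::H5Fopen(file, flags = "H5F_ACC_RDONLY")
  on.exit(rhdf5::H5Fclose(fid), add = TRUE, after = FALSE)
  did <- rhdf5::H5Dopen(fid, dataset)
  on.exit(rhdf5::H5Dclose(did), add = TRUE, after = FALSE)
  # The dataset creation property list records which filters were applied.
  pid <- rhdf5::H5Dget_create_plist(did)
  on.exit(rhdf5::H5Pclose(pid), add = TRUE, after = FALSE)
  rhdf5::H5Pall_filters_avail(pid)
}

# Only runs when rhdf5 and the example file are both present.
fast5 <- "one_seq.fast5"
if (requireNamespace("rhdf5", quietly = TRUE) && file.exists(fast5)) {
  print(filters_available(fast5, "Raw/Reads/Read_1283/Signal"))
}
```

The handles are closed in reverse order of opening via `on.exit(..., after = FALSE)`, so they are released even if one of the calls fails.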

grimbough commented 1 year ago

This is now working with the latest versions of rhdf5 and rhdf5filters. The updates will make their way into Bioconductor when the next release happens in a few weeks, or you can install directly from GitHub to get them now.

library(rhdf5)

packageVersion("rhdf5")
#> [1] '2.43.5'
packageVersion("rhdf5filters")
#> [1] '1.11.2'

h5dump("~/Downloads/one_seq.fast5")
#> $Raw
#> $Raw$Reads
#> $Raw$Reads$Read_1283
#> $Raw$Reads$Read_1283$Signal
#>    [1] 486 491 487 486 487 484 499 498 481 486 380 399 412 434 432 427 440 427
#>   [19] 416 441 422 413 423 427 419 417 425 424 420 408 438 442 430 440 427 428
#>   [37] 435 427 432 432 417 427 417 417 430 423 412 443 421 411 417 420 423 418
#>   [55] 418 419 417 415 434 435 417 453 419 411 438 426 419 410 426 416 423 451
#>   [73] 420 440 424 433 432 419 428 419 412 426 422 426 421 421 418 414 408 413
#>   [91] 424 424 412 430 428 428 401 455 425 415 411 427 426 425 418 426 412 417
#>  [109] 416 420 414 423 430 409 420 424 416 431 425 423 416 421 423 408 414 421
#>  [127] 409 423 416 413 414 413 419 429 416 421 413 426 415 423 407 416 412 412
#>  [145] 414 413 410 417 415 414 424 404 418 416 402 420 409 424 417 409 423 422
#>  [163] 413 430 412 419 405 411 416 419 422 415 424 423 416 416 416 421 418 422
#>  [181] 414 423 409 415 418 420 410 414 407 419 415 423 427 423 424 417 419 418
#>  [199] 421 434 426 417 432 422 423 439 434 426 416 403 419 431 420 422 422 413
#>  [217] 427 422 415 418 420 424 424 428 429 430 414 423 421 439 411 408 430 428
#>  [235] 416 432 408 415 397 416 408 411 415 415 406 412 417 424 421 415 419 419
#>  [253] 422 411 421 421 416 415 408 416 391 391 399 390 393 393 467 451 440 486
#>  [271] 513 542 499 566 706 716 674 700 679 696 683 695 685 699 692 690 687 690
#>  [289] 684 697 686 713 688 690 694 669 664 681 670 701 689 651 611 606 616 620
#>  [307] 613 616 620 637 614 597 621 652 606 627 590 612 599 514 523 513 510 490
#>  [325] 437 341 333 337 344 362 348 343 338 344 328 333 326 328 395 472 457 458
#>  [343] 448 478 452 447 425 359 369 370 468 472 482 489 486 479 484 462 388 405
#>  [361] 407 386 395 385 396 400 406 383 368 380 375 378 369 374 373 378 368 372
#>  [379] 374 380 374 431 441 455 456 465 451 458 470 498 486 506 506 495 504 492
#>  [397] 505 502 493 482 425 442 447 431 446 454 447 448 445 448 443 442 433 433
#>  [415] 440 451 450 425 443 434 428 448 450 460 424 419 434 432 293 266 257 266
#>  [433] 266 261 243 267 257 277 266 288 401 443 439 447 435 421 428 427 420 294
#>  [451] 260 257 266 281 363 444 423 423 439 474 448 456 519 542 521 511 515 520
#>  [469] 497 462 466 457 464 434 345 351 359 357 348 356 346 345 332 344 337 341
#>  [487] 323 284 256 276 259 270 346 465 454 473 476 469 486 467 465 478 467 449
#>  [505] 478 468 483 446 426 423 426 420 416 424 429 413 418 428 407 432 410 410
#>  [523] 424 425 418 414 417 426 448 457 422 440 441 435 449 452 446 363 344 361
#>  [541] 351 361 350 359 378 494 503 490 495 515 493 477 470 507 483 496 514 503
#>  [559] 454 344 329 380 343 342 356 344 445 477 468 480 469 469 465 501 471 457
#>  [577] 479 460 417 404 381 386 409 464 480 479 477 503 480 481 498 493 492 476
#>  [595] 480 495 492 481 485 478 479 481 482 476 477 483 473 471 492 443 398 388
#>  [613] 397 402 394 386 392 381 365 413 444 451 443 451 457 474 497 539 520 500
#>  [631] 490 437 431 431 443 445 430 412 412 420 443 377 390 393 388 398 396 401
#>  [649] 393 398 387 386 400 390 390 385 365 310 316 311 319 310 311 308 323 330
#>  [667] 318 308 312 327 320 279 242 248 253 253 237 246 266 240 254 270 276 376
#>  [685] 491 485 492 490 479 474 489 492 484 489 495 501 487 484 488 500 485 479
#>  [703] 480 495 479 482 503 488 482 489 477 380 390 371 373 382 384 377 388 381
#>  [721] 378 379 383 395 373 385 364 387 382 379 389 374 374 379 376 389 387 403
#>  [739] 384 381 375 401 378 366 370 387 404 402 408 398 402 405 401 410 409 418
#>  [757] 406 407 404 408 401 401 411 411 449 483 476 455 447 437 441 440 440 438
#>  [775] 493 512 499 504 506 511 517 498 443 449 432 452 451 443 437 452 456 452
#>  [793] 425 450 451 440 441 443 451 443 425 393 402 394 383 395 394 409 391 406
#>  [811] 397 388 379 399 387 394 369 363 355 368 363 361 351 309 295 301 303 319
#>  [829] 301 314 304 318 298 292 296 300 314 320 315 293 300 300 302 324 296 322
#>  [847] 287 334 368 391 375 385 378 380 380 387 395 402 387 403 455 448 459 441
#>  [865] 408 422 433 423 416 425 410 433 503 515 514 512 505 498 510 509 500 508
#>  [883] 495 508 504 512 517 507 502 507 505 527 506 521 512 506 496 510 509 546
#>  [901] 528 533 546 536 463 458 461 462 325 327 333 368 421 425 420 445 519 499
#>  [919] 474 470 438 388 409 384 383 382 383 373 387 376 382 398 390 400 395 421
#>  [937] 413 404 419 423 428 452 442 460 470 473 451 442 449 436 440 448 450 469
#>  [955] 471 458 465 457 474 537 542 553 545 536 522 529 531 522 388 393 383 363
#>  [973] 361 378 383 455 466 483 461 472 473 480 493 480 474 470 479 520 533 525
#>  [991] 540 498 485 473 498 495 476 481 485 478 489 474 385 398 387 390 384 383
#> [1009] 384 373 381 384 391 377 381 385 372 380 369 404 376 369 378 387 382 375
#> [1027] 381 406 393 404 429 453 437 442 452 445 430 421 415 416 417 447 415 464
#> [1045] 474 483 481 496 464 486 477 485 476 486 481 481 441 407 404 403 406 415
#> [1063] 409 415 422 485 584 567 561 565 501 509 502 508 502 503 506 499 493 492
#> [1081] 503 504 498 504 487 496 503 494 496 443 443 425 411 408 413 414 417 376
#> [1099] 346 365 366 356 367 359 370 363 381 376 386 389 390 376 373 382 377 382
#> [1117] 355 358 325 323 314 300 306 309 298 296 278 281 296 284 288 296 313 378
#> [1135] 383 384 397 424 503 497 501 480 435 455 441 428 431 434 428 442 427 439
#> [1153] 399 405 398 400 404 404 392 422 513 495 494 428 378 376 383 394 396 468
#> [1171] 464 465 472 461 496 509 508 499 461 339 349 337 354 346 341 338 315 260
#> [1189] 244 264 241 261 259 257 269 256 453 450 442 447 440 439 461 486 472 495
#> [1207] 491 505 494 480 384 373 387 380 360 355 364 384 399 369 371 356 368 376
#> [1225] 361 379 369 375 355 369 354 374 395 508 523 500 531 523 515 520 515 513
#> [1243] 503 511 487 492 479 446 396 403 409 409 396 391 408 396 366 388 388 356
#> [1261] 308 298 307 316 304 312 312 315 303 315 307 325 308 316 363 361 369 363
#> [1279] 366 366 369 365 365 372 369 384 478 504 476 436 428 431 430 393 317 386
#> [1297] 468 476 459 442 459 485 531 527 528 535 523 519 519 524 520 532 530 482
#> [1315] 446 448 452 454 452 463 465 451 442 453 447 459 439 442 418 411 416 420
#> [1333] 399 410 406 405 402 413 407 402 435 490 474 504 479 481 467 418 361 370
#> [1351] 361 356 364 357 356 357 366 348 350 345 329 338 334 288 253 257 247 262
#> [1369] 241 254 260 254 354 412 413 432 427 443 438 413 435 424 438 409 438 423
#> [1387] 402 428 424 431 427 430 429 426 424 432 442 459 452 450 448 464 458 456
#> [1405] 526 510 507 525 508 530 521 511 524 465 440 455 459 453 433 409 422 410
#> [1423] 418 412 417 417 407 401 409 432 449 470 447 464 448 461 446 477 454 453
#> [1441] 458 462 466 474 476 481 507 497 505 509 510 529 516 499 508 485 523 516
#> [1459] 498 496 504 506 508 508 505 501 489 507 490 508 511 498 493 486 495 484
#> [1477] 489 513 486 478 476 489 480 422 341 367 343 347 335 297 285 300 323 320
#> [1495] 560 575 583 580 578 548 567 527 475 472 473 466 451 426 428 435 413 438
#> [1513] 420 416 426 446 515 515 512 490 510 500 438 431 422 425 423 415 427 437
#> [1531] 420 421 420 450 417 422 425 424 417 418 412 415 427 403 405 480 497 500
#> [1549] 495 481 471 492 456 448 435 371 373 352 354 367 343 359 349 356 365 341
#> [1567] 356 367 348 351 351 356 362 350 359 327 327 325 333 364 431 394 383 392
#> [1585] 387 388 392 388 390 398 379 394 387 392 371 382 386 370 377 377 381 390
#> [1603] 391 400 388 394 411 395 412 413 434 453 440 446 442 408 419 408 395 397
#> [1621] 392 383 409 395 480 520 503 434 409 415 415 415 392 322 301 319 299 279
#> [1639] 263 266 274 281 270 285 262 270 293 278 483 486 479 491 502 497 486 498
#> [1657] 483 503 512 482 475 397 397 397 435 442 437 439 425 434 435 432 437 448
#> [1675] 452 434 447 438 445 522 483 499 507 492 460 397 401 375 370 396 357 373
#> [1693] 373 360 328 339 334 343 344 331 332 336 339 346 323 377 379 380 390 375
#> [1711] 392 393 378 378 364 347 340 363 343 367 357 368 355 362 341 350 365 369
#> [1729] 360 366 352 344 341 350 346 362 340 352 359 363 354 348 352 354 357 367
#> [1747] 363 341 364 355 354 362 362 347 355 361 358 357 349 365 356 367 362 371
#> [1765] 361 383 366 385 357 336 344 345 335 342 344 337 337 346 333 340 334 338
#> [1783] 328 248 254 269 260 260 262 264 252 255 254 273 266 252 294 472 485 477
#> [1801] 473 486 478 485 481 480 485 477 471 493 493 486 458 474 479 479 380 358
#> [1819] 420 404 400 370 392 411 393 381 396 397 418 416 396 472 615 603 584 579
#> [1837] 597 512 451 482 464 487 472 464 462 457 473 481 478 463 464 474 449 457
#> [1855] 445 452 450 450 461 360 347 362 359 362 358 347 360 351 358 352 364 368
#> [1873] 471 551 551 543 528 546 539 540 559 538 562 561 550 534 535 536 540 539
#> [1891] 536 543 552 572 504 500 486 519 519 509 509 422 504 505 466 568 596 577
#> [1909] 599 560 555 571 569 502 396 386 399 390 389 405 407 406 397 400 402 394
#> [1927] 401 405 405 398 415 451 452 462 468 485 443 512 522 523 528 519 515 531
#> [1945] 499 491 512 513 493 491 485 500 477 487 474 371 375 373 380 381 380 385
#> [1963] 378 379 384 383 376 397 430 425 418 414 416 420 413 434 429 427 446 446
#> [1981] 489 503 501 527 500 495 481 421 415 423 514 544 531 530 514 538 540 545
#> [1999] 523 535 516 533 516 511 486 504 509 421 390 399 388 400 428 433 435 443
#> [2017] 440 446 437 432 463 540 527 526 541 539 530 538 528 518 524 397 377 391
#> [2035] 394 376 379 380 408 398 396 406 424 470 497 520 529 557 581 558 571 590
#> [2053] 559 496 449 457 454 456 464 466 461 473 474 467 472 474 486 482 483 499
#> [2071] 489 493 469 478 505 588 550 473 456 523 571 557 560 564 571 550 555 548
#> [2089] 565 562 456 377 389 421 423 420 427 425 428 430 423 419 425 415 417 418
#> [2107] 421 421 464 498 483 494 486 495
#> 
#> 
#> 
#> 
#> $UniqueGlobalKey
#> $UniqueGlobalKey$channel_id
#> NULL
#> 
#> $UniqueGlobalKey$context_tags
#> NULL
#> 
#> $UniqueGlobalKey$tracking_id
#> NULL

genignored commented 1 year ago

Hi,

I was browsing Bioconductor and it is still reporting v2.44.0.

Digging deeper, it appears that there are warnings during the automated tests. I'm not sure if that would keep Bioconductor from incorporating the fix, or if this is just the standard Bioconductor delay.

Just checking, I suppose.

Thank you!

grimbough commented 8 months ago

Is this still an issue? There were some reports from CRAN about the vbz filter breaking on some of their systems, so I had to roll back the changes for a while, but I think it should all be working with the latest versions of Bioconductor.