zaeleus / noodles

Bioinformatics I/O libraries in Rust
MIT License

Refget v2.0.0 #185

Closed juliangehring closed 1 year ago

juliangehring commented 1 year ago

The refget specifications have been updated to v2.0.0: https://samtools.github.io/hts-specs/refget.html (https://github.com/samtools/hts-specs/pull/479).

Here is the official list of changes:

- Replace refget’s v1 service-info implementation with GA4GH discovery’s definition of service-info
- Move code examples out into a Python notebook and a Perl script
- Replace TRUNC512 with ga4gh identifier as the default SHA-512 based hash identifier (support still available for TRUNC512)
- All checksums can be requested namespaced with their algorithm
- Optional support for namespaced identifiers to resolve sequence and metadata
- Lower cased recommended naming authority strings
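To make the TRUNC512 to ga4gh change concrete, here is a minimal Python sketch of how the two identifiers relate, as I understand the spec: both start from the first 24 bytes of the SHA-512 digest, TRUNC512 hex-encodes them, and the ga4gh identifier base64url-encodes them behind an `SQ.` prefix. The function names are my own and not part of noodles or any refget library, and the sketch assumes the sequence bytes are already normalised the way the spec requires.

```python
import base64
import hashlib

TRUNCATED_BYTES = 24  # both schemes keep the first 24 bytes of SHA-512


def trunc512(sequence: bytes) -> str:
    """v1-style identifier: hex of the truncated SHA-512 digest."""
    return hashlib.sha512(sequence).digest()[:TRUNCATED_BYTES].hex()


def ga4gh_identifier(sequence: bytes) -> str:
    """v2-style identifier: 'SQ.' + base64url of the same truncated digest."""
    digest = hashlib.sha512(sequence).digest()[:TRUNCATED_BYTES]
    return "SQ." + base64.urlsafe_b64encode(digest).decode("ascii")


def trunc512_to_ga4gh(hex_digest: str) -> str:
    """Re-encode an existing TRUNC512 value without re-hashing the sequence."""
    raw = bytes.fromhex(hex_digest)
    if len(raw) != TRUNCATED_BYTES:
        raise ValueError("expected a 24-byte (48 hex character) TRUNC512 digest")
    return "SQ." + base64.urlsafe_b64encode(raw).decode("ascii")


def ga4gh_to_trunc512(identifier: str) -> str:
    """Accept 'SQ.xxx' or the namespaced 'ga4gh:SQ.xxx' and return the hex form."""
    if identifier.startswith("ga4gh:"):
        identifier = identifier[len("ga4gh:"):]
    if not identifier.startswith("SQ."):
        raise ValueError("expected an identifier starting with 'SQ.'")
    raw = base64.urlsafe_b64decode(identifier[len("SQ."):])
    if len(raw) != TRUNCATED_BYTES:
        raise ValueError("expected 24 bytes after decoding")
    return raw.hex()


if __name__ == "__main__":
    print(trunc512_to_ga4gh("68a178f7c740c5c240aa67ba41843b119d3bf9f8b0f0ac36"))
    # SQ.aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2
```

Since 24 bytes encode to exactly 32 base64 characters, no padding is involved, and existing TRUNC512 values can be converted in either direction without access to the sequence itself.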

The changes required for the refget client in noodles are small but nevertheless breaking in some places.

One practical example of refget v2 support is the EBI refget repository (to my knowledge, the only production-ready public refget server), which is already running the v2 specification. Due to the service-info API changes,

cargo run --example refget_service_info https://ebi.ac.uk/ena/cram/

currently fails with

Error: Request(reqwest::Error { kind: Decode, source: Error("missing field service", line: 1, column: 555) })
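The decode error is consistent with the response shape having changed. As far as I read the specs, v1 wrapped the refget capabilities in a top-level `service` object, while v2 returns a GA4GH service-info document with the capabilities under a `refget` key. A rough Python sketch of a reader that tolerates both layouts follows; the sample payloads are trimmed, made-up illustrations and not actual server output, and `refget_capabilities` is a hypothetical helper name.

```python
import json


def refget_capabilities(payload: dict) -> dict:
    """Return the refget capability block from a v1 or v2 service-info payload.

    Assumption: v2 nests the block under "refget", v1 under "service".
    """
    if "refget" in payload:
        return payload["refget"]
    if "service" in payload:
        return payload["service"]
    raise KeyError("payload has neither a 'refget' (v2) nor a 'service' (v1) field")


# Illustrative payloads only; the field values are invented for this sketch.
V1_EXAMPLE = json.loads("""
{
  "service": {
    "circular_supported": false,
    "algorithms": ["md5", "trunc512"],
    "subsequence_limit": 0,
    "supported_api_versions": ["1.0.0"]
  }
}
""")

V2_EXAMPLE = json.loads("""
{
  "id": "org.example.refget",
  "name": "Example refget server",
  "type": {"group": "org.ga4gh", "artifact": "refget", "version": "2.0.0"},
  "refget": {
    "circular_supported": false,
    "algorithms": ["md5", "ga4gh", "trunc512"],
    "subsequence_limit": 0
  }
}
""")

if __name__ == "__main__":
    for example in (V1_EXAMPLE, V2_EXAMPLE):
        print(refget_capabilities(example)["algorithms"])
```

A client that only expects the v1 layout looks for `service`, does not find it in a v2 response, and fails in the way shown above.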

juliangehring commented 1 year ago

An update to the new specification is covered in #186. Not all API endpoints are directly backwards compatible with v1.0, making this a breaking change.

With the changes, the example from above

cargo run --example refget_service_info https://ebi.ac.uk/ena/cram/

completes successfully.

juliangehring commented 1 year ago

It all looks good to me. The improvements in 3c09474 in particular add a nice touch to it :)

zaeleus commented 1 year ago

Thanks for looking over everything, @juliangehring. noodles-refget 0.1.0 is now published and included as a feature in noodles 0.47.0.