umccr / htsget-rs

A server implementation of the htsget protocol for bioinformatics in Rust
https://samtools.github.io/hts-specs/htsget.html
MIT License

Error returned when reference name is in header but has no mapped index positions #201

Closed mmalenic closed 1 year ago

mmalenic commented 1 year ago

An error is returned when a reference name is present in the header, but there are no chunks associated with it in the index. For example, this file only contains mapped reads for reference names 11 and 20, even though there are many more SN reference names in the header. When a reference name without mapped reads is requested, an error is returned:

{
  "htsget": {
    "error": "NotFound",
    "message": "could not find byte ranges for reference sequence"
  }
}

This behaviour is probably not correct. According to the htsget specification, a NotFound error should be returned "if the requested reference does not exist." While this could be read as covering a reference name that does not exist in the index, it is probably more correct to return an empty response with only the header and EOF byte ranges, e.g.:

{
   "htsget" : {
      "format" : "BAM",
      "urls" : [
         {
            "url" : "...",
            "headers": {
                "Range": "bytes=0-..."
            },
            "class" : "header"
         },
         {
            "url": "data:;base64,H4sIBAAAAAAA/wYAQkMCABsAAwAAAAAAAAAAAA=="
         }
      ]
   }
}
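
For reference, the data URL in the last entry above is the base64 encoding of the 28-byte BGZF EOF marker block. A minimal, dependency-free sketch of how it could be derived is below. The names `BGZF_EOF`, `base64_encode` and `eof_data_url` are illustrative and are not taken from htsget-rs; a real implementation would more likely use an existing base64 crate.

```rust
/// The 28-byte BGZF end-of-file marker block (an empty BGZF block),
/// as given in the SAM/BAM specification.
const BGZF_EOF: [u8; 28] = [
    0x1f, 0x8b, 0x08, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0x06, 0x00, 0x42, 0x43,
    0x02, 0x00, 0x1b, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
];

/// Standard base64 alphabet (RFC 4648).
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Hand-rolled standard base64 with `=` padding, to keep the sketch std-only.
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling missing bytes.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;

        out.push(ALPHABET[((n >> 18) & 63) as usize] as char);
        out.push(ALPHABET[((n >> 12) & 63) as usize] as char);
        // The last two characters become padding when the chunk is short.
        out.push(if chunk.len() > 1 { ALPHABET[((n >> 6) & 63) as usize] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[(n & 63) as usize] as char } else { '=' });
    }
    out
}

/// Builds the inline data URL that carries the EOF marker in the response.
fn eof_data_url() -> String {
    format!("data:;base64,{}", base64_encode(&BGZF_EOF))
}
```

Because the EOF marker is sent inline, a client can assemble a valid, empty BAM from the header byte range plus this URL without any further request for the marker.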

The specification further supports an empty response, stating that "successful requests with empty result sets still produce a valid response in the requested format (e.g. including header and EOF marker)." Returning an empty response would also match other tools, such as samtools and noodles, which do not report an error when reading files that have reference names in the header but no corresponding records in the file itself.
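
The proposed behaviour amounts to a three-way decision on the requested reference name. The sketch below is only an illustration of that decision, not the htsget-rs implementation: the `Resolution` enum, the `resolve` function, and the representation of the header as a slice of names and the index as a map of byte ranges are all assumptions made for this example.

```rust
use std::collections::HashMap;

/// Outcome of resolving a requested reference name (hypothetical type).
#[derive(Debug, PartialEq)]
enum Resolution {
    /// Name is absent from the header: the NotFound case.
    NotFound,
    /// Name is in the header but the index has no chunks for it:
    /// respond with header and EOF byte ranges only.
    Empty,
    /// Name is in the header and has indexed chunks:
    /// respond with header, these (start, end) byte ranges, and EOF.
    Data(Vec<(u64, u64)>),
}

/// Decides how to respond, given the header's reference names and the
/// byte ranges the index holds for each name (both simplified stand-ins).
fn resolve(
    header_names: &[&str],
    index_chunks: &HashMap<&str, Vec<(u64, u64)>>,
    requested: &str,
) -> Resolution {
    // Only a name missing from the header counts as a non-existent reference.
    if !header_names.iter().any(|name| *name == requested) {
        return Resolution::NotFound;
    }
    match index_chunks.get(requested) {
        Some(chunks) if !chunks.is_empty() => Resolution::Data(chunks.clone()),
        // In the header, but nothing mapped: an empty result, not an error.
        _ => Resolution::Empty,
    }
}
```

With a header listing `11`, `20` and `21` and an index that only has chunks for `11` and `20`, a request for `21` would resolve to `Resolution::Empty` under this sketch, while a name that is not in the header at all would resolve to `Resolution::NotFound`.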