Closed grosscol closed 1 year ago
Digging into the underlying `sequences.get_cram` call, it turns out the entire file is read into memory.
So the advantage of streaming the file, avoiding loading large files into memory, is already lost.
Just use a simple response for now and create a feature request to more efficiently serve our crams if/when it becomes an issue.
Closed by ef7a9ec87fde3906194fb90097096bf1bfeaf92c
Issue or current state
The sequence endpoint will return either an entire cram file, a subset of a cram file, or an entire crai file. Which it returns depends on the presence of an index parameter.
https://github.com/statgen/bravo_api/blob/9e43f4b52eaebdb88eda643c0f5c272294623b9b/bravo_api/api.py#L471-L481
This adds cognitive overhead when reading the code and reasoning about how to get a cram and its corresponding crai. It also makes for a more complex return, as the cram file should probably be streamed if `X-SENDFILE` is not available, while the crai file is expected to be small enough to use `send_file` without regard to the capabilities of the fronting HTTP server.
Resolved when
Two endpoints exist: `/sequence` and `/sequence_index`. Both take the same params, so the arg map can be reused.
Notes
Use the `werkzeug.datastructures.Range` instance returned from `request.range` to parse range tuples instead of the regex approach.
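
As a rough illustration of that note, the sketch below reads the byte span from `request.range` rather than matching the `Range` header with a regex. The helper name `byte_span` and the file size of 1000 bytes are made up for the example and are not taken from bravo_api. It builds a plain Werkzeug `Request` so that it runs on its own; Flask's `request` object exposes the same `range` property.

```python
from werkzeug.wrappers import Request


def byte_span(request, file_size):
    """Return (start, stop) for a single satisfiable bytes range, else None.

    `stop` is exclusive, so the span can be used directly as data[start:stop].
    `byte_span` is a hypothetical helper, not part of bravo_api.
    """
    rng = request.range  # werkzeug.datastructures.Range, or None if no/invalid header
    if rng is None:
        return None
    # range_for_length resolves open-ended and suffix ranges against the file
    # size and returns None for multi-range or unsatisfiable requests.
    return rng.range_for_length(file_size)


# Build standalone requests for demonstration; in a Flask view, pass flask.request.
first_100 = Request.from_values(headers={"Range": "bytes=0-99"})
last_50 = Request.from_values(headers={"Range": "bytes=-50"})
no_range = Request.from_values()

print(byte_span(first_100, 1000))
print(byte_span(last_50, 1000))
print(byte_span(no_range, 1000))
```

A `None` result can be treated as "serve the whole file" when the header is absent, though a stricter handler might answer 416 when a header was sent but could not be satisfied. That choice is left open here.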