statgen / bravo_api

Server side data processing and retrieval endpoints for BRAVO
MIT License

Split out sequence index (crai) from `/sequence` route. #7

Closed grosscol closed 1 year ago

grosscol commented 3 years ago

Issue or current state

The sequence endpoint will return either an entire cram file, a subset of a cram file, or an entire crai file. Which it returns depends on the presence of an index parameter.

https://github.com/statgen/bravo_api/blob/9e43f4b52eaebdb88eda643c0f5c272294623b9b/bravo_api/api.py#L471-L481

This adds cognitive overhead when reading the code and reasoning about how to get a cram and its corresponding crai. It also makes for a more complex return: the cram file should probably be streamed if X-SENDFILE is not available, while the crai file is expected to be small enough to use send_file without regard to the fronting HTTP server's capabilities.

Resolved when

Two endpoints exist: /sequence and /sequence_index. Both take the same params, so the arg map can be reused.
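A minimal sketch of what the split could look like in Flask, assuming a shared arg map. The parameter names (`variant_id`, `sample_no`) and the `fake_cram` / `fake_crai` helpers are hypothetical placeholders, not the project's actual arg map or data access code.

```python
from flask import Flask, Response, abort, request

app = Flask(__name__)

# Hypothetical arg map: query parameter name -> type used to convert it.
SEQUENCE_ARGMAP = {"variant_id": str, "sample_no": int}


def parse_sequence_args(args):
    """Parse query args with the shared map; respond 400 if any is missing or invalid."""
    parsed = {name: args.get(name, type=cast) for name, cast in SEQUENCE_ARGMAP.items()}
    if any(value is None for value in parsed.values()):
        abort(400)
    return parsed


def fake_cram(variant_id, sample_no):
    # Placeholder for the real cram lookup.
    return f"cram:{variant_id}:{sample_no}".encode()


def fake_crai(variant_id, sample_no):
    # Placeholder for the real crai lookup.
    return f"crai:{variant_id}:{sample_no}".encode()


@app.route("/sequence")
def sequence():
    params = parse_sequence_args(request.args)
    return Response(fake_cram(**params), mimetype="application/octet-stream")


@app.route("/sequence_index")
def sequence_index():
    params = parse_sequence_args(request.args)
    return Response(fake_crai(**params), mimetype="application/octet-stream")
```

Each route now has a single return type, and the only shared piece is the argument parsing.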

Notes

Use the werkzeug.datastructures.Range object returned from request.range to parse range tuples instead of the regex approach.
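As a rough illustration of the note above: `request.range` is built by `werkzeug.http.parse_range_header`, so the same parsing can be tried outside a request. The `byte_span` helper name is made up for this sketch.

```python
from werkzeug.http import parse_range_header


def byte_span(range_header, file_size):
    """Return a (start, stop) tuple for a single satisfiable byte range, else None.

    stop is exclusive, so the span can be used directly as a slice.
    """
    rng = parse_range_header(range_header)  # werkzeug.datastructures.Range or None
    if rng is None:
        return None
    return rng.range_for_length(file_size)


span = byte_span("bytes=0-99", 1000)
```

Inside a view, `request.range.range_for_length(size)` gives the same tuple once `request.range` has been checked for `None`. Multi-range requests and unsatisfiable ranges come back as `None` and would need their own handling.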

grosscol commented 3 years ago

Digging into the underlying sequences.get_cram call, it turns out the entire file is read into memory. So the advantage of streaming the file (avoiding loading large files into memory) is already lost. Just use a simple response for now and create a feature request to serve our crams more efficiently if/when it becomes an issue.
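For reference, the trade-off looks roughly like this. This is a generic sketch using file-like objects, not the actual `sequences.get_cram` code; both function names are made up.

```python
import io


def read_all_then_slice(fileobj, start, stop):
    """Read the whole file into memory, then slice: memory use grows with file size."""
    data = fileobj.read()
    return data[start:stop]


def iter_byte_range(fileobj, start, stop, chunk_size=8192):
    """Yield only bytes [start, stop) in chunks: memory use is bounded by chunk_size."""
    fileobj.seek(start)
    remaining = stop - start
    while remaining > 0:
        chunk = fileobj.read(min(chunk_size, remaining))
        if not chunk:  # reached end of file before stop
            break
        remaining -= len(chunk)
        yield chunk


payload = bytes(range(256)) * 100
whole = read_all_then_slice(io.BytesIO(payload), 10, 5000)
streamed = b"".join(iter_byte_range(io.BytesIO(payload), 10, 5000, chunk_size=1000))
```

A generator like the second one only pays off if the data access layer also avoids reading the full file, which is why a simple response is enough for now.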

grosscol commented 1 year ago

Closed by ef7a9ec87fde3906194fb90097096bf1bfeaf92c