hammerlab / cycledash

Variant Caller Analysis Dashboard and Data Management System
Other
35 stars 2 forks source link

Embed the GA4GH API in Cycledash #833

Closed danvk closed 9 years ago

danvk commented 9 years ago

Fixes #744

This add a /ga4gh/reads/search API endpoint to Cycledash which could be used to greatly speed up BAM loading.

Things to like about this:

Things to not like:

Review on Reviewable

ihodes commented 9 years ago

Seems reasonable to me! Only issue is, though I'm not sure I'm following this correctly, it appears to only serve local files, while our other method of serving BAMs seems to only serve HDFS files?

danvk commented 9 years ago

That's correct. It's not a big deal since we're serving everything via the /hdfs NFS mount.

@ihodes and I discussed this & decided that it would make more sense to abstract this away in the API by including the BAM ID in the request, rather than the path to the BAM file. I've made this change.

FWIW, we already pull in pysam as a dep, so adding that is no big deal.


Review status: 0 of 7 files reviewed at latest revision, 4 unresolved discussions, some commit checks pending.


config.py, line 41 [r1] (raw file): Done.


cycledash/api/ga4gh_wrapper.py, line 1 [r1] (raw file): I'll rename it anything but that :) The issue is that it conflicts with ga4gh/__init__.py when I do import ga4gh... later in this file. I'm certainly open to alternative suggestions.


cycledash/api/ga4gh_wrapper.py, line 33 [r1] (raw file): For now, yes. See top-level comment about why this is OK.


cycledash/views.py, line 71 [r1] (raw file): Switched to lower_case_with_under everywhere for local variables. ga4gh uses camelCase.


Comments from the review on Reviewable.io

ihodes commented 9 years ago

Lookin' good, thanks for the changes. Couple minor comments.


Review status: 0 of 7 files reviewed at latest revision, 1 unresolved discussion, some commit checks pending.


cycledash/api/ga4gh_wrapper.py, line 1 [r1] (raw file): Ah right, of course. I'm fine with the name as it is!


cycledash/api/ga4gh_wrapper.py, line 38 [r1] (raw file): The URL to those BAMs point to HDFS (e.g. hdfs://path/...) and we'd need them to be like /hdfs/path… for this to find them on the mounted HDFS, right? Or is something else happening behind the scenes?

(few seconds later…) Ah I see, the GA4GH root handles that. So in this case, currently, this will work only with HDFS BAMs? (Which is fine for now; it's all we've been supporting anyway). Cool, carry on.


Comments from the review on Reviewable.io

ihodes commented 9 years ago

Review status: 0 of 7 files reviewed at latest revision, 2 unresolved discussions, some commit checks pending.


cycledash/api/ga4gh_wrapper.py, line 38 [r1] (raw file): Ah, but the URLs submitted still have the scheme in the front, maybe? Need to check this—our URI story needs some serious improving across the board.


Comments from the review on Reviewable.io

danvk commented 9 years ago

Review status: 0 of 7 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.


cycledash/api/ga4gh_wrapper.py, line 33 [r1] (raw file): All the BAM "URIs" in our production db are paths, i.e. no hdfs:// prefix. So this will work as written.


Comments from the review on Reviewable.io