hammerlab / cycledash

Variant Caller Analysis Dashboard and Data Management System

Implement the GA4GH searchReads API #744

Closed danvk closed 9 years ago

danvk commented 9 years ago

(Summarizing a conversation with @hammer yesterday)

One way to address #730 (make loading pileups faster) is to serve alignments via Cycledash. The GA4GH project has designed a REST API for querying alignments which we should adhere to.

The API endpoint is /reads/search. Requests & responses are defined by two Avro files: readmethods.avdl and reads.avdl. The latter defines a type, ReadAlignment, which is equivalent to a line in a SAM file. Going down this road would require adding support for this Avro schema to pileup.js as well. This is probably a good idea.
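To make the "equivalent to a line in a SAM file" point concrete, here is a rough sketch of how a decoded ReadAlignment could be flattened into the core SAM columns. The field names are the ones in the JSON response shown in this comment. The helper names (`cigar_string`, `to_sam_fields`) are made up for illustration, the CIGAR mapping follows my reading of the GA4GH enum, and the `+ 1` assumes GA4GH positions are 0-based while SAM's are 1-based. It also assumes a mapped read (a non-null `alignment`).

```python
# Hypothetical helpers: flatten a GA4GH ReadAlignment (already decoded from
# JSON into a dict) into the core SAM columns.

# GA4GH CIGAR operation names mapped to the single-letter SAM codes.
CIGAR_CODES = {
    "ALIGNMENT_MATCH": "M",
    "INSERT": "I",
    "DELETE": "D",
    "SKIP": "N",
    "CLIP_SOFT": "S",
    "CLIP_HARD": "H",
    "PAD": "P",
    "SEQUENCE_MATCH": "=",
    "SEQUENCE_MISMATCH": "X",
}


def cigar_string(cigar_units):
    """Join CIGAR units such as {"operation": ..., "operationLength": 91}."""
    return "".join(
        "%d%s" % (unit["operationLength"], CIGAR_CODES[unit["operation"]])
        for unit in cigar_units
    )


def to_sam_fields(read):
    """Return a dict of SAM-style columns for one mapped ReadAlignment."""
    alignment = read["alignment"]
    position = alignment["position"]
    return {
        "QNAME": read["fragmentName"],
        "RNAME": position["referenceName"],
        # Assumption: GA4GH positions are 0-based; SAM POS is 1-based.
        "POS": position["position"] + 1,
        "MAPQ": alignment["mappingQuality"],
        "CIGAR": cigar_string(alignment["cigar"]),
        "SEQ": read["alignedSequence"],
        # Phred scores to the Phred+33 ASCII encoding used in SAM.
        "QUAL": "".join(chr(q + 33) for q in read["alignedQuality"]),
    }
```

pileup.js would need the JavaScript equivalent of this, but the shape of the conversion should be the same.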

Sample session:

$ curl --data '{"readGroupIds":["low-coverage:HG00534.mapped.ILLUMINA.bwa.CHS.low_coverage.20120522"]}' --header 'Content-Type: application/json' http://localhost:8000/v0.5.1/reads/search
{
  "nextPageToken": "129660:0",
  "alignments": [
    {
      "info": {
        "MD": [ "91" ],
        "NM": [ "0" ],
        "AM": [ "0" ],
        "RG": [ "ERR020238" ],
        "MQ": [ "0" ],
        "BQ": [ "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@B@" ],
        "SM": [ "0" ],
        "X0": [ "349" ],
        "XT": [ "R" ]
      },
      "duplicateFragment": true,
      "alignedQuality": [
        32, 33, 34, 36, 37, 37, 38, 37, 40, 40, 42, 38, 38, 38, 41, 38, 40, 41, 43, 39, 42, 42, 39, 27,
        27, 30, 37, 37, 40, 30, 34, 37, 36, 36, 40, 30, 36, 32, 37, 32, 43, 39, 42, 23, 40, 37, 36, 30,
        40, 28, 37, 33, 37, 32, 40, 38, 34, 37, 40, 22, 33, 32, 36, 32, 38, 38, 36, 33, 33, 37, 41, 37,
        34, 18, 31, 31, 38, 23, 34, 32, 32, 16, 32, 33, 32, 35, 18, 34, 37, 33, 2
      ],
      "failedVendorQualityChecks": false,
      "fragmentName": "ERR020238.40049218",
      "readNumber": 1,
      "properPlacement": true,
      "nextMatePosition": {
        "position": 10405,
        "reverseStrand": false,
        "referenceName": "1"
      },
      "supplementaryAlignment": false,
      "numberReads": 2,
      "fragmentLength": 431,
      "secondaryAlignment": false,
      "alignedSequence": "ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAA",
      "id": "low-coverage:HG00534.mapped.ILLUMINA.bwa.CHS.low_coverage.20120522:ERR020238.40049218",
      "alignment": {
        "position": {
          "position": 10008,
          "reverseStrand": false,
          "referenceName": "1"
        },
        "cigar": [
          {
            "referenceSequence": null,
            "operation": "ALIGNMENT_MATCH",
            "operationLength": 91
          }
        ],
        "mappingQuality": 0
      },
      "readGroupId": "low-coverage:HG00534.mapped.ILLUMINA.bwa.CHS.low_coverage.20120522"
    },
    {
      "info": {
        "MD": [ "91" ],
        "NM": [ "0" ],
        "AM": [ "0" ],
        "RG": [ "ERR020238" ],
        "MQ": [ "0" ],
        "BQ": [ "BC@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@CH" ],
        "SM": [ "0" ],
        "X0": [ "348" ],
        "XT": [ "R" ]
      },
      "duplicateFragment": true,
      "alignedQuality": [
        31, 32, 36, 35, 36, 32, 39, 39, 41, 39, 41, 38, 41, 41, 37, 39, 41, 34, 39, 39, 39, 30, 34, 34, 36,
        36, 36, 37, 39, 35, 27, 27, 39, 37, 39, 35, 39, 36, 34, 37, 41, 33, 29, 39, 41, 34, 34, 23, 33, 23,
        34, 31, 31, 27, 32, 37, 41, 38, 27, 35, 32, 33, 37, 31, 33, 37, 38, 38, 39, 31, 37, 34, 34, 34, 41,
        37, 33, 28, 38, 33, 40, 33, 31, 35, 31, 34, 40, 36, 34, 32, 34
      ],
      "failedVendorQualityChecks": false,
      "fragmentName": "ERR020238.5533709",
      "readNumber": 1,
      "properPlacement": true,
      "nextMatePosition": {
        "position": 10398,
        "reverseStrand": false,
        "referenceName": "1"
      },
      "supplementaryAlignment": false,
      "numberReads": 2,
      "fragmentLength": 428,
      "secondaryAlignment": false,
      "alignedSequence": "CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC",
      "id": "low-coverage:HG00534.mapped.ILLUMINA.bwa.CHS.low_coverage.20120522:ERR020238.5533709",
      "alignment": {
        "position": {
          "position": 10010,
          "reverseStrand": false,
          "referenceName": "1"
        },
        "cigar": [
          {
            "referenceSequence": null,
            "operation": "ALIGNMENT_MATCH",
            "operationLength": 91
          }
        ],
        "mappingQuality": 0
      },
      "readGroupId": "low-coverage:HG00534.mapped.ILLUMINA.bwa.CHS.low_coverage.20120522"
    }
  ]
}
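
The response above is only one page: it carries a nextPageToken. A minimal Python sketch of a client that keeps fetching until the token runs out might look like the following. It assumes the token is passed back in a `pageToken` request field (my reading of readmethods.avdl, not verified against this server version), and `post_json` / `search_reads` are illustrative names, not part of any GA4GH client library.

```python
import json
import urllib.request


def post_json(url, payload):
    """POST a JSON payload and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def search_reads(url, read_group_ids, post=post_json):
    """Yield every alignment, following nextPageToken until it is absent."""
    payload = {"readGroupIds": list(read_group_ids)}
    while True:
        page = post(url, payload)
        for alignment in page.get("alignments", []):
            yield alignment
        token = page.get("nextPageToken")
        if not token:
            break
        # Assumption: the token goes back in the "pageToken" request field.
        payload = dict(payload, pageToken=token)
```

Usage against the server from the session above would be something like `for read in search_reads("http://localhost:8000/v0.5.1/reads/search", [read_group_id]): ...`. The `post` argument is only there so the HTTP call can be swapped out in tests.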

danvk commented 9 years ago

I got the ga4gh server to run on hammerlab-dev2 and serve alignments out of our BAM files on HDFS.

The steps:

This seems to work well in practice—I can run fetches on the pathological BAMs quickly.