varfish-org / varfish-cli

VarFish REST API client (CLI + Python package)
MIT License

Conversion from DRAGEN QC files to legacy coverage qc format. #85

Closed: holtgrewe closed this issue 10 months ago.

holtgrewe commented 10 months ago

Is your feature request related to a problem? Please describe.
We currently cannot import DRAGEN QC files into VarFish.

Describe the solution you'd like
We need a way to convert the DRAGEN QC TSV files into the legacy `*.bam-qc.tsv` format. The input should be the required DRAGEN QC files:

The output will be a TSV file whose bam_stats column holds JSON escaped in the form PostgreSQL reads, as shown below.
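As a hedged sketch of the input side, the DRAGEN metrics could be read into a lookup keyed by (section, metric name). This assumes the layout DRAGEN uses for its `*_metrics.csv` files: section, read group (empty for summary rows), metric name, value, and an optional percentage. `parse_dragen_metrics` is a hypothetical helper, not existing varfish-cli code; because the issue calls the inputs TSV files, the delimiter is a parameter.

```python
import csv
from typing import Dict, Iterable, Tuple

# Lookup key: (section, metric name). Hypothetical example:
# ("MAPPING/ALIGNING SUMMARY", "Total input reads").
MetricKey = Tuple[str, str]


def parse_dragen_metrics(
    lines: Iterable[str], delimiter: str = ","
) -> Dict[MetricKey, str]:
    """Read DRAGEN-style metrics lines into a {(section, metric): value} dict.

    Assumed column layout: section, read group, metric name, value and an
    optional percentage. Rows with a non-empty read group are skipped so that
    only the summary values are kept.
    """
    metrics: Dict[MetricKey, str] = {}
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) < 4:
            continue  # blank or malformed line
        section, read_group, name, value = (col.strip() for col in row[:4])
        if read_group:
            continue  # per-read-group row, not a summary row
        metrics[(section, name)] = value
    return metrics
```

A file is passed as `with open(path, newline="") as f: metrics = parse_dragen_metrics(f)`. Values stay strings here; converting them to `int` or `float` is left to the step that builds the JSON.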

The tool should write gzip-compressed output if the output file name ends in .gz.
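A minimal sketch of the .gz rule using only the Python standard library; `open_output` is a hypothetical helper name, not an existing varfish-cli function.

```python
import gzip
from typing import TextIO


def open_output(path: str) -> TextIO:
    """Open path for text writing; gzip-compress if the name ends in ".gz"."""
    if path.endswith(".gz"):
        return gzip.open(path, "wt", encoding="utf-8")
    return open(path, "w", encoding="utf-8")


# Usage (hypothetical file name):
# with open_output("case.bam-qc.tsv.gz") as out:
#     out.write("case_id\tset_id\tbam_stats\n")
```

Deciding on the file name alone keeps the rest of the writer unaware of compression: callers write plain text either way.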


Additional context
The TSV format is:

```
case_id\tset_id\tbam_stats
.\t.\t<JSON with every '"' replaced by '"""' (triple quotes)>
```

JSON format for one sample (add further samples as additional top-level keys):

```json
{
  "<sample>": {
    "summary": {
      "mean coverage": 144.00357104543855,
      "total target size": 39905961
    },
    "min_cov_target": {
      "5": 99.64335992491787,
      "15": 99.33895089058005,
      "20": 99.1112561973354,
      "30": 98.35921082161876,
      "50": 94.4133189153898,
      "100": 60.479872686838185,
      "200": 14.367208699732725,
      "300": 3.3456429926754128,
      "400": 1.0507416399730682
    },
    "bamstats": {
      "sequences": 96438832,
      "reads duplicated": 5501986,
      "insert size average": 197.7,
      "insert size standard deviation": 60.1
    }
  }
}
```
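A sketch of assembling one sample's object in the layout above. Only the key names and the coverage thresholds come from the example. The function `build_sample_stats`, its parameters, and the reading of `min_cov_target` as "percentage of target bases covered at least N-fold" are assumptions, as is the choice of which DRAGEN metric feeds each field.

```python
from typing import Any, Dict, Mapping

# Coverage thresholds used as "min_cov_target" keys in the example above.
THRESHOLDS = (5, 15, 20, 30, 50, 100, 200, 300, 400)


def build_sample_stats(
    mean_coverage: float,
    total_target_size: int,
    pct_target_at_least: Mapping[int, float],
    sequences: int,
    reads_duplicated: int,
    insert_size_mean: float,
    insert_size_sd: float,
) -> Dict[str, Any]:
    """Assemble the per-sample object of the bam_stats JSON.

    pct_target_at_least maps a coverage threshold (e.g. 20) to the assumed
    percentage of target bases covered at least that deep; thresholds not in
    THRESHOLDS are ignored, missing ones are omitted.
    """
    return {
        "summary": {
            "mean coverage": mean_coverage,
            "total target size": total_target_size,
        },
        "min_cov_target": {
            str(t): pct_target_at_least[t]
            for t in THRESHOLDS
            if t in pct_target_at_least
        },
        "bamstats": {
            "sequences": sequences,
            "reads duplicated": reads_duplicated,
            "insert size average": insert_size_mean,
            "insert size standard deviation": insert_size_sd,
        },
    }
```

Several samples are then combined as `{"sample1": build_sample_stats(...), "sample2": build_sample_stats(...)}` (hypothetical sample names), matching the one-key-per-sample layout.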