snakemake-workflows / dna-seq-mtb

A flavor of https://github.com/snakemake-workflows/dna-seq-varlociraptor preconfigured for molecular tumor boards
MIT License

Scenario rendering does not work with given sex #15

Closed sci-kai closed 1 year ago

sci-kai commented 1 year ago

Hi,

I am trying to run the workflow on cell-line datasets sequenced with the TSO500 panel on a NextSeq 550. The varlociraptor call fails with: Error: samples.tumor.sex: invalid type: sequence, expected string or singleton map at line 21 column 5.
The underlying problem seems to be that the workflow does not render a valid scenario. The scenario file results/scenario/HD789.yaml looks like this:

species:
  heterozygosity: 0.001
  germline-mutation-rate: 1e-3
  ploidy:
    male:
      all: 2
      X: 1
      Y: 1
    female:
      all: 2
      X: 2
      Y: 0
  genome-size: 3.5e9
expressions:
  ffpe_subst: C>T | G>A
samples:
  tumor:
    resolution: 0.01
    universe: '[0.0,1.0]'
    sex: !!python/object/apply:numpy.core.multiarray.scalar
    - &id001 !!python/object/apply:numpy.dtype
      args:
      - f8
      - false
      - true
      state: !!python/tuple
      - 3
      - <
      - null
      - null
      - null
      - -1
      - -1
      - 0
    - !!binary |
      AAAAAAAA+H8=
    contamination:
      by: normal
      fraction: .nan
  normal:
    universe: 0.0 | 0.5 | 1.0
    sex: !!python/object/apply:numpy.core.multiarray.scalar
    - *id001
    - !!binary |
      AAAAAAAA+H8=
events:
  somatic_tumor_high: normal:0.0 & tumor:[0.1,1.0]
  germline: (normal:0.5 & tumor:0.5) | (normal:1.0 & tumor:1.0)
  ffpe_artifact: ($ffpe_subst) & tumor:]0.0,0.05[
  somatic_tumor_low: normal:0.0 & ((($ffpe_subst) & tumor:]0.05,0.1[) | (!($ffpe_subst)
    & tumor:]0.0,0.1[))

I suspected that the sex is configured incorrectly (its allowed values are not explained in the documentation), but setting it to "NA", "female" or "male" always gives the same result. I do not know the sex for this sample, as it is a mixture of cell lines.
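For reference, the `!!binary` payload in the scenario above can be decoded by hand. This is a minimal sketch using only the Python standard library, assuming the layout stated by the dumped dtype (`f8` with byte order `<`, i.e. a little-endian 64-bit float):

```python
import base64
import math
import struct

# Payload copied from the `sex:` entries in results/scenario/HD789.yaml.
payload = "AAAAAAAA+H8="

raw = base64.b64decode(payload)
# "<d" = little-endian 64-bit float, matching dtype f8 with byte order "<".
(value,) = struct.unpack("<d", raw)

print(raw.hex())          # 000000000000f87f
print(math.isnan(value))  # True
```

So the value rendered for `sex` is a floating-point NaN rather than a string, which suggests that the `sex` column reached the scenario template as a missing value instead of `male`. Why that happens is not established in this post.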

Here are my configuration files.

samples.tsv:

sample_name alias   group   platform    purity  panel   sex ffpe
HD789   tumor   HD789   ILLUMINA    NA  TSO500 male 0
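One thing that can be ruled out quickly with a tab-separated sheet is a row whose field count differs from the header, since a tolerant parser may then fill the trailing columns with missing values. The sketch below is only an illustration: `find_ragged_rows` is a hypothetical helper, not part of the workflow, and the in-memory rows are made up to resemble the sheet above. Whether the real file has such a problem cannot be told from this post.

```python
import csv
import io


def find_ragged_rows(tsv_text):
    """Return (line_number, found, expected) for rows whose field count differs from the header."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    problems = []
    for line_number, row in enumerate(reader, start=2):
        # Skip completely empty lines; report everything else that is too short or too long.
        if row and len(row) != len(header):
            problems.append((line_number, len(row), len(header)))
    return problems


header = "sample_name\talias\tgroup\tplatform\tpurity\tpanel\tsex\tffpe\n"
good = header + "HD789\ttumor\tHD789\tILLUMINA\tNA\tTSO500\tmale\t0\n"
# Hypothetical broken row: the last three fields are separated by spaces, not tabs.
bad = header + "HD789\ttumor\tHD789\tILLUMINA\tNA\tTSO500 male 0\n"

print(find_ragged_rows(good))  # []
print(find_ragged_rows(bad))   # [(2, 6, 8)]
```

To check a real file, pass its contents instead, e.g. `find_ragged_rows(open("config/samples.tsv").read())`.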

units.tsv:

sample_name unit_name   fq1 fq2 sra adapters    umis
HD789   L001    HD789-DNA_S2_L001_R1_001.fastq.gz   HD789-DNA_S2_L001_R2_001.fastq.gz   NA  "-a CTGTCTCTTATACACATCT -A CTGTCTCTTATACACATCT"
HD789   L002    HD789-DNA_S2_L002_R1_001.fastq.gz   HD789-DNA_S2_L002_R2_001.fastq.gz   NA  "-a CTGTCTCTTATACACATCT -A CTGTCTCTTATACACATCT"
HD789   L003    HD789-DNA_S2_L003_R1_001.fastq.gz   HD789-DNA_S2_L003_R2_001.fastq.gz   NA  "-a CTGTCTCTTATACACATCT -A CTGTCTCTTATACACATCT"
HD789   L004    HD789-DNA_S2_L004_R1_001.fastq.gz   HD789-DNA_S2_L004_R2_001.fastq.gz   NA  "-a CTGTCTCTTATACACATCT -A CTGTCTCTTATACACATCT"

config.yaml:


samples: config/samples.tsv

units: config/units.tsv

# Optional BED file with target regions
# uncomment to use
target_regions: "/home/kai/ukd/projects/TSO500/data/TSO500_manifest_GRCh38.bed"

primers:
  trimming:
    activate: false
    # path to fasta files containing primer sequences
    primers_fa1: "path/to/primer-fa1"
    primers_fa2: "path/to/primer-fa2"
    # optional primer file allowing to define primers per sample
    # overwrites primers_fa1 and primers_fa2
    # the tsv file requires three fields: panel, fa1 and fa2 (optional)
    tsv: ""
    # Mean insert size between the outer primer ends.
    # If 0 or not set the bowtie default value of 250 will be used
    library_length: 0

calc_consensus_reads:
  # Set to true for merging PCR duplicates and overlapping reads using rbt:
  # https://github.com/rust-bio/rust-bio-tools
  activate: true

# Estimation of mutational burden.
mutational_burden:
  # Size of the sequenced coding genome for mutational burden estimation
  # Attention: when doing panel sequencing, set this to the
  # CAPTURED coding genome, not the entire one!
  coding_genome_size: 1.94e6

report:
  max_read_depth: 250
  stratify:
    # if stratification is deactivated, one tabular report for all
    # samples will be created.
    activate: false
    # select a sample sheet column for stratification, e.g. the tumorboard session
    by-column: session

params:
  varlociraptor:
    # add extra arguments for varlociraptor call
    # For example, for panel data consider omitting certain bias estimations
    # which might be misleading because all reads of an amplicon have the same start
    # position, strand etc. (--omit-strand-bias, --omit-read-position-bias,
    # --omit-softclip-bias, --omit-read-orientation-bias).
    call: ""
    # Add extra arguments for varlociraptor preprocess. By default, we limit the depth to 200.
    # Increase this value for panel sequencing!
    preprocess: "--max-depth 200"
  freebayes:
    min_alternate_fraction: 0.01 # Reduce for calling variants with lower VAFs

johanneskoester commented 1 year ago

Thanks for reporting. I'll try to reproduce that.