nf-core / raredisease

Call and score variants from WGS/WES of rare disease patients.
https://nf-co.re/raredisease
MIT License
86 stars 34 forks source link

Four individuals, two affected by phenotype, rhocall annotate problem #429

Closed fa2k closed 11 months ago

fa2k commented 1 year ago

Description of the bug

I get the listed error when analysing a family with two affected children, and an unaffected mother and a father.

I had to censor the sample names in the supporting files with search / replace. I hope I didn't mess up the format.

I don't know if this is supposed to work. Would also be very helpful to know if this is not a supported configuration.

Command used and terminal output

ERROR ~ Error executing process > 'NFCORE_RAREDISEASE:RAREDISEASE:ANNOTATE_SNVS:RHOCALL_ANNOTATE (BM42)'

Caused by:
  Process `NFCORE_RAREDISEASE:RAREDISEASE:ANNOTATE_SNVS:RHOCALL_ANNOTATE (BM42)` terminated with an error exit status (1)

Command executed:

  rhocall \
      annotate \
      --v14  \
       \
      -r BM42_roh.roh \
      -o BM42_rohann_rhocall.vcf \
      BM42_split_rmdup.vcf.gz

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_RAREDISEASE:RAREDISEASE:ANNOTATE_SNVS:RHOCALL_ANNOTATE":
      rhocall: $(echo $(rhocall --version 2>&1) | sed 's/rhocall, version //' )
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  Matplotlib created a temporary config/cache directory at /ess/p164/data/no-backup/active-wgs-analyses/tmp/matplotlib-mcfb8f_l because the default path (/tsd/p164/home/p164-paalmbj/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
  2023-09-23 10:11:48,180 rhocall.cli  INFO     Running rhocall annotate  0.5.1
  [2023-09-23 10:11:48,180] rhocall.cli               INFO     Running rhocall annotate  0.5.1
  [W::bcf_hdr_register_hrec] The definition of Flag "INFO/AZ" is invalid, forcing Number=0
  [W::bcf_hdr_register_hrec] The definition of Flag "INFO/HW" is invalid, forcing Number=0
  Traceback (most recent call last):
    File "/usr/local/bin/rhocall", line 10, in <module>
      sys.exit(cli())
    File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
      return self.main(*args, **kwargs)
    File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1055, in main
      rv = self.invoke(ctx)
    File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
      return ctx.invoke(self.callback, **ctx.params)
    File "/usr/local/lib/python3.9/site-packages/click/core.py", line 760, in invoke
      return __callback(*args, **kwargs)
    File "/usr/local/lib/python3.9/site-packages/rhocall/cli.py", line 215, in annotate
      run_annotate_rg(
    File "/usr/local/lib/python3.9/site-packages/rhocall/run_annotate_bcfroh.py", line 117, in run_annotate_rg
      logger.debug("Win chr %s not same as var chr %s: keep drawing new vars (end %s)." % (chrom, var.CHROM, var.end))
  AttributeError: 'bool' object has no attribute 'CHROM'

Work dir:
  /ess/p164/data/no-backup/active-wgs-analyses/BM42/fd/434b2848d5bc13ef1a89bc2f2b7091

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details

Relevant files

Relevant files - overview:

  1. Directory listing
  2. Command script
  3. Input roh file (head)
  4. Input vcf file (head)

bash-4.2$ ls -la
total 1272645
drwxrws--x. 2 p164-paalmbj p164-member-group       4096 Sep 23 10:12 .
drwxrws--x. 3 p164-paalmbj p164-member-group       4096 Sep 23 10:11 ..
-rwxrwx--x. 1 p164-paalmbj p164-member-group 1303162795 Sep 23 10:12 BM42_rohann_rhocall.vcf
lrwxrwxrwx. 1 p164-paalmbj p164-member-group         96 Sep 23 10:11 BM42_roh.roh -> /ess/p164/data/no-backup/active-wgs-analyses/BM42/7f/84cdf87f9b8c56889fb48e62a508cd/BM42_roh.roh
lrwxrwxrwx. 1 p164-paalmbj p164-member-group        107 Sep 23 10:11 BM42_split_rmdup.vcf.gz -> /ess/p164/data/no-backup/active-wgs-analyses/BM42/61/17e8a22d487ef3fb96b3632bb3f7da/BM42_split_rmdup.vcf.gz
lrwxrwxrwx. 1 p164-paalmbj p164-member-group        111 Sep 23 10:11 BM42_split_rmdup.vcf.gz.tbi -> /ess/p164/data/no-backup/active-wgs-analyses/BM42/db/c0002bc235c88d2b249faf1efea27d/BM42_split_rmdup.vcf.gz.tbi
-rwxrwx--x. 1 p164-paalmbj p164-member-group          0 Sep 23 10:11 .command.begin
-rwxrwx--x. 1 p164-paalmbj p164-member-group       1893 Sep 23 10:12 .command.err
-rwxrwx--x. 1 p164-paalmbj p164-member-group       3432 Sep 23 10:12 .command.log
-rwxrwx--x. 1 p164-paalmbj p164-member-group          0 Sep 23 10:11 .command.out
-rwxrwx--x. 1 p164-paalmbj p164-member-group      11130 Sep 23 10:11 .command.run
-rwxrwx--x. 1 p164-paalmbj p164-member-group        346 Sep 23 10:11 .command.sh
-rwxrwx--x. 1 p164-paalmbj p164-member-group          0 Sep 23 10:11 .command.trace
-rwxrwx--x. 1 p164-paalmbj p164-member-group          1 Sep 23 10:12 .exitcode

bash-4.2$ cat .command.sh 
#!/bin/bash -euo pipefail
rhocall \
    annotate \
    --v14  \
     \
    -r BM42_roh.roh \
    -o BM42_rohann_rhocall.vcf \
    BM42_split_rmdup.vcf.gz

cat <<-END_VERSIONS > versions.yml
"NFCORE_RAREDISEASE:RAREDISEASE:ANNOTATE_SNVS:RHOCALL_ANNOTATE":
    rhocall: $(echo $(rhocall --version 2>&1) | sed 's/rhocall, version //' )
END_VERSIONS

bash-4.2$ head BM42_roh.roh 
# This file was produced by: bcftools roh(1.17+htslib-1.17)
# The command line was: bcftools roh --samples CASE1,CASE2 --skip-indels --AF-file gnomad.genomes.v3.1.2.af.tab.gz -o BM42_roh.roh BM42_split_rmdup.vcf.gz
#
# RG    [2]Sample   [3]Chromosome   [4]Start    [5]End  [6]Length (bp)  [7]Number of markers    [8]Quality (average fwd-bwd phred score)
# ST    [2]Sample   [3]Chromosome   [4]Position [5]State (0:HW, 1:AZ)   [6]Quality (fwd-bwd phred score)
ST  CASE1   chr1    10623   1   3.0
ST  CASE1   chr1    10719   1   19.7
ST  CASE1   chr1    10927   1   19.6
ST  CASE1   chr1    12672   1   19.7
ST  CASE1   chr1    12719   1   19.9

bash-4.2$ gunzip -c BM42_split_rmdup.vcf.gz|head -n 220
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##GLnexusVersion=v1.4.1-0-g68e25e5
##GLnexusConfigName=DeepVariant_unfiltered
##GLnexusConfigCRC32C=3285998180
##GLnexusConfig={unifier_config: {drop_filtered: false, min_allele_copy_number: 1, min_AQ1: 0, min_AQ2: 0, min_GQ: 0, max_alleles_per_site: 32, monoallelic_sites_for_lost_alleles: true, preference: common}, genotyper_config: {revise_genotypes: false, min_assumed_allele_frequency: 9.99999975e-05, snv_prior_calibration: 1, indel_prior_calibration: 1, required_dp: 0, allow_partial_data: true, allele_dp_format: AD, ref_dp_format: MIN_DP, output_residuals: false, more_PL: true, squeeze: false, trim_uncalled_alleles: true, top_two_half_calls: false, output_format: BCF, liftover_fields: [{orig_names: [MIN_DP, DP], name: DP, description: "##FORMAT=<ID=DP,Number=1,Type=Integer,Description=\"Approximate read depth (reads with MQ=255 or with bad mates are filtered)\">", type: int, number: basic, default_type: missing, count: 1, combi_method: min, ignore_non_variants: true}, {orig_names: [AD], name: AD, description: "##FORMAT=<ID=AD,Number=R,Type=Integer,Description=\"Allelic depths for the ref and alt alleles in the order listed\">", type: int, number: alleles, default_type: zero, count: 0, combi_method: min, ignore_non_variants: false}, {orig_names: [GQ], name: GQ, description: "##FORMAT=<ID=GQ,Number=1,Type=Integer,Description=\"Genotype Quality\">", type: int, number: basic, default_type: missing, count: 1, combi_method: min, ignore_non_variants: true}, {orig_names: [PL], name: PL, description: "##FORMAT=<ID=PL,Number=G,Type=Integer,Description=\"Phred-scaled genotype Likelihoods\">", type: int, number: genotype, default_type: missing, count: 0, combi_method: missing, ignore_non_variants: true}]}}
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency estimate for each alternate allele">
##INFO=<ID=AQ,Number=A,Type=Integer,Description="Allele Quality score reflecting evidence for each alternate allele (Phred scale)">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##FILTER=<ID=MONOALLELIC,Description="Site represents one ALT allele in a region with multiple variants that could not be unified into non-overlapping multi-allelic sites">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=RNC,Number=2,Type=Character,Description="Reason for No Call in GT: . = n/a, M = Missing data, P = Partial data, I = gVCF input site is non-called, D = insufficient Depth of coverage, - = unrepresentable overlapping deletion, L = Lost/unrepresentable allele (other than deletion), U = multiple Unphased variants present, O = multiple Overlapping variants present, 1 = site is Monoallelic, no assertion about presence of REF or ALT allele">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Phred-scaled genotype Likelihoods">
##contig=<ID=chr1,length=248956422>
##contig=<ID=chr2,length=242193529>
##contig=<ID=chr3,length=198295559>
##contig=<ID=chr4,length=190214555>
##contig=<ID=chr5,length=181538259>
##contig=<ID=chr6,length=170805979>
##contig=<ID=chr7,length=159345973>
##contig=<ID=chr8,length=145138636>
##contig=<ID=chr9,length=138394717>
##contig=<ID=chr10,length=133797422>
##contig=<ID=chr11,length=135086622>
##contig=<ID=chr12,length=133275309>
##contig=<ID=chr13,length=114364328>
##contig=<ID=chr14,length=107043718>
##contig=<ID=chr15,length=101991189>
##contig=<ID=chr16,length=90338345>
##contig=<ID=chr17,length=83257441>
##contig=<ID=chr18,length=80373285>
##contig=<ID=chr19,length=58617616>
##contig=<ID=chr20,length=64444167>
##contig=<ID=chr21,length=46709983>
##contig=<ID=chr22,length=50818468>
##contig=<ID=chrX,length=156040895>
##contig=<ID=chrY,length=57227415>
##contig=<ID=chrM,length=16569>
##contig=<ID=chr1_KI270706v1_random,length=175055>
##contig=<ID=chr1_KI270707v1_random,length=32032>
##contig=<ID=chr1_KI270708v1_random,length=127682>
##contig=<ID=chr1_KI270709v1_random,length=66860>
##contig=<ID=chr1_KI270710v1_random,length=40176>
##contig=<ID=chr1_KI270711v1_random,length=42210>
##contig=<ID=chr1_KI270712v1_random,length=176043>
##contig=<ID=chr1_KI270713v1_random,length=40745>
##contig=<ID=chr1_KI270714v1_random,length=41717>
##contig=<ID=chr2_KI270715v1_random,length=161471>
##contig=<ID=chr2_KI270716v1_random,length=153799>
##contig=<ID=chr3_GL000221v1_random,length=155397>
##contig=<ID=chr4_GL000008v2_random,length=209709>
##contig=<ID=chr5_GL000208v1_random,length=92689>
##contig=<ID=chr9_KI270717v1_random,length=40062>
##contig=<ID=chr9_KI270718v1_random,length=38054>
##contig=<ID=chr9_KI270719v1_random,length=176845>
##contig=<ID=chr9_KI270720v1_random,length=39050>
##contig=<ID=chr11_KI270721v1_random,length=100316>
##contig=<ID=chr14_GL000009v2_random,length=201709>
##contig=<ID=chr14_GL000225v1_random,length=211173>
##contig=<ID=chr14_KI270722v1_random,length=194050>
##contig=<ID=chr14_GL000194v1_random,length=191469>
##contig=<ID=chr14_KI270723v1_random,length=38115>
##contig=<ID=chr14_KI270724v1_random,length=39555>
##contig=<ID=chr14_KI270725v1_random,length=172810>
##contig=<ID=chr14_KI270726v1_random,length=43739>
##contig=<ID=chr15_KI270727v1_random,length=448248>
##contig=<ID=chr16_KI270728v1_random,length=1872759>
##contig=<ID=chr17_GL000205v2_random,length=185591>
##contig=<ID=chr17_KI270729v1_random,length=280839>
##contig=<ID=chr17_KI270730v1_random,length=112551>
##contig=<ID=chr22_KI270731v1_random,length=150754>
##contig=<ID=chr22_KI270732v1_random,length=41543>
##contig=<ID=chr22_KI270733v1_random,length=179772>
##contig=<ID=chr22_KI270734v1_random,length=165050>
##contig=<ID=chr22_KI270735v1_random,length=42811>
##contig=<ID=chr22_KI270736v1_random,length=181920>
##contig=<ID=chr22_KI270737v1_random,length=103838>
##contig=<ID=chr22_KI270738v1_random,length=99375>
##contig=<ID=chr22_KI270739v1_random,length=73985>
##contig=<ID=chrY_KI270740v1_random,length=37240>
##contig=<ID=chrUn_KI270302v1,length=2274>
##contig=<ID=chrUn_KI270304v1,length=2165>
##contig=<ID=chrUn_KI270303v1,length=1942>
##contig=<ID=chrUn_KI270305v1,length=1472>
##contig=<ID=chrUn_KI270322v1,length=21476>
##contig=<ID=chrUn_KI270320v1,length=4416>
##contig=<ID=chrUn_KI270310v1,length=1201>
##contig=<ID=chrUn_KI270316v1,length=1444>
##contig=<ID=chrUn_KI270315v1,length=2276>
##contig=<ID=chrUn_KI270312v1,length=998>
##contig=<ID=chrUn_KI270311v1,length=12399>
##contig=<ID=chrUn_KI270317v1,length=37690>
##contig=<ID=chrUn_KI270412v1,length=1179>
##contig=<ID=chrUn_KI270411v1,length=2646>
##contig=<ID=chrUn_KI270414v1,length=2489>
##contig=<ID=chrUn_KI270419v1,length=1029>
##contig=<ID=chrUn_KI270418v1,length=2145>
##contig=<ID=chrUn_KI270420v1,length=2321>
##contig=<ID=chrUn_KI270424v1,length=2140>
##contig=<ID=chrUn_KI270417v1,length=2043>
##contig=<ID=chrUn_KI270422v1,length=1445>
##contig=<ID=chrUn_KI270423v1,length=981>
##contig=<ID=chrUn_KI270425v1,length=1884>
##contig=<ID=chrUn_KI270429v1,length=1361>
##contig=<ID=chrUn_KI270442v1,length=392061>
##contig=<ID=chrUn_KI270466v1,length=1233>
##contig=<ID=chrUn_KI270465v1,length=1774>
##contig=<ID=chrUn_KI270467v1,length=3920>
##contig=<ID=chrUn_KI270435v1,length=92983>
##contig=<ID=chrUn_KI270438v1,length=112505>
##contig=<ID=chrUn_KI270468v1,length=4055>
##contig=<ID=chrUn_KI270510v1,length=2415>
##contig=<ID=chrUn_KI270509v1,length=2318>
##contig=<ID=chrUn_KI270518v1,length=2186>
##contig=<ID=chrUn_KI270508v1,length=1951>
##contig=<ID=chrUn_KI270516v1,length=1300>
##contig=<ID=chrUn_KI270512v1,length=22689>
##contig=<ID=chrUn_KI270519v1,length=138126>
##contig=<ID=chrUn_KI270522v1,length=5674>
##contig=<ID=chrUn_KI270511v1,length=8127>
##contig=<ID=chrUn_KI270515v1,length=6361>
##contig=<ID=chrUn_KI270507v1,length=5353>
##contig=<ID=chrUn_KI270517v1,length=3253>
##contig=<ID=chrUn_KI270529v1,length=1899>
##contig=<ID=chrUn_KI270528v1,length=2983>
##contig=<ID=chrUn_KI270530v1,length=2168>
##contig=<ID=chrUn_KI270539v1,length=993>
##contig=<ID=chrUn_KI270538v1,length=91309>
##contig=<ID=chrUn_KI270544v1,length=1202>
##contig=<ID=chrUn_KI270548v1,length=1599>
##contig=<ID=chrUn_KI270583v1,length=1400>
##contig=<ID=chrUn_KI270587v1,length=2969>
##contig=<ID=chrUn_KI270580v1,length=1553>
##contig=<ID=chrUn_KI270581v1,length=7046>
##contig=<ID=chrUn_KI270579v1,length=31033>
##contig=<ID=chrUn_KI270589v1,length=44474>
##contig=<ID=chrUn_KI270590v1,length=4685>
##contig=<ID=chrUn_KI270584v1,length=4513>
##contig=<ID=chrUn_KI270582v1,length=6504>
##contig=<ID=chrUn_KI270588v1,length=6158>
##contig=<ID=chrUn_KI270593v1,length=3041>
##contig=<ID=chrUn_KI270591v1,length=5796>
##contig=<ID=chrUn_KI270330v1,length=1652>
##contig=<ID=chrUn_KI270329v1,length=1040>
##contig=<ID=chrUn_KI270334v1,length=1368>
##contig=<ID=chrUn_KI270333v1,length=2699>
##contig=<ID=chrUn_KI270335v1,length=1048>
##contig=<ID=chrUn_KI270338v1,length=1428>
##contig=<ID=chrUn_KI270340v1,length=1428>
##contig=<ID=chrUn_KI270336v1,length=1026>
##contig=<ID=chrUn_KI270337v1,length=1121>
##contig=<ID=chrUn_KI270363v1,length=1803>
##contig=<ID=chrUn_KI270364v1,length=2855>
##contig=<ID=chrUn_KI270362v1,length=3530>
##contig=<ID=chrUn_KI270366v1,length=8320>
##contig=<ID=chrUn_KI270378v1,length=1048>
##contig=<ID=chrUn_KI270379v1,length=1045>
##contig=<ID=chrUn_KI270389v1,length=1298>
##contig=<ID=chrUn_KI270390v1,length=2387>
##contig=<ID=chrUn_KI270387v1,length=1537>
##contig=<ID=chrUn_KI270395v1,length=1143>
##contig=<ID=chrUn_KI270396v1,length=1880>
##contig=<ID=chrUn_KI270388v1,length=1216>
##contig=<ID=chrUn_KI270394v1,length=970>
##contig=<ID=chrUn_KI270386v1,length=1788>
##contig=<ID=chrUn_KI270391v1,length=1484>
##contig=<ID=chrUn_KI270383v1,length=1750>
##contig=<ID=chrUn_KI270393v1,length=1308>
##contig=<ID=chrUn_KI270384v1,length=1658>
##contig=<ID=chrUn_KI270392v1,length=971>
##contig=<ID=chrUn_KI270381v1,length=1930>
##contig=<ID=chrUn_KI270385v1,length=990>
##contig=<ID=chrUn_KI270382v1,length=4215>
##contig=<ID=chrUn_KI270376v1,length=1136>
##contig=<ID=chrUn_KI270374v1,length=2656>
##contig=<ID=chrUn_KI270372v1,length=1650>
##contig=<ID=chrUn_KI270373v1,length=1451>
##contig=<ID=chrUn_KI270375v1,length=2378>
##contig=<ID=chrUn_KI270371v1,length=2805>
##contig=<ID=chrUn_KI270448v1,length=7992>
##contig=<ID=chrUn_KI270521v1,length=7642>
##contig=<ID=chrUn_GL000195v1,length=182896>
##contig=<ID=chrUn_GL000219v1,length=179198>
##contig=<ID=chrUn_GL000220v1,length=161802>
##contig=<ID=chrUn_GL000224v1,length=179693>
##contig=<ID=chrUn_KI270741v1,length=157432>
##contig=<ID=chrUn_GL000226v1,length=15008>
##contig=<ID=chrUn_GL000213v1,length=164239>
##contig=<ID=chrUn_KI270743v1,length=210658>
##contig=<ID=chrUn_KI270744v1,length=168472>
##contig=<ID=chrUn_KI270745v1,length=41891>
##contig=<ID=chrUn_KI270746v1,length=66486>
##contig=<ID=chrUn_KI270747v1,length=198735>
##contig=<ID=chrUn_KI270748v1,length=93321>
##contig=<ID=chrUn_KI270749v1,length=158759>
##contig=<ID=chrUn_KI270750v1,length=148850>
##contig=<ID=chrUn_KI270751v1,length=150742>
##contig=<ID=chrUn_KI270752v1,length=27745>
##contig=<ID=chrUn_KI270753v1,length=62944>
##contig=<ID=chrUn_KI270754v1,length=40191>
##contig=<ID=chrUn_KI270755v1,length=36723>
##contig=<ID=chrUn_KI270756v1,length=79590>
##contig=<ID=chrUn_KI270757v1,length=71251>
##contig=<ID=chrUn_GL000214v1,length=137718>
##contig=<ID=chrUn_KI270742v1,length=186739>
##contig=<ID=chrUn_GL000216v2,length=176608>
##contig=<ID=chrUn_GL000218v1,length=161147>
##contig=<ID=chrEBV,length=171823>
##bcftools_normVersion=1.17+htslib-1.17
##bcftools_normCommand=norm --fasta-ref genome.fa --output BM42.vcf.gz --output-type z --multiallelics -both --threads 6 BM42.bcf; Date=Sat Sep 23 10:02:59 2023
##bcftools_normCommand=norm --fasta-ref genome.fa --output BM42_split_rmdup.vcf.gz --output-type z --rm-dup none --threads 6 BM42.vcf.gz; Date=Sat Sep 23 10:04:28 2023
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  CASE1   TrioMor TrioFar CASE2
chr1    10352   chr1_10352_T_TA T   TA  6   .   AF=0.25;AQ=6    GT:DP:AD:GQ:PL:RNC  ./.:82:47,14:6:0,9,7:II 1/1:120:65,22:5:6,4,0:..    ./.:79:39,14:10:0,15,10:II  ./.:52:32,6:5:0,8,4:II
chr1    10623   chr1_10623_T_C  T   C   19  .   AF=0.75;AQ=19   GT:DP:AD:GQ:PL:RNC  ./.:1:1,0:0:29,3,0:II   1/1:4:0,4:10:9,21,0:..  1/1:4:0,4:18:19,22,0:..1/1:2:0,2:4:4,5,0:..
chr1    10719   chr1_10719_G_C  G   C   5   .   AF=0.125;AQ=5   GT:DP:AD:GQ:PL:RNC  0/1:15:12,3:5:5,0,10:.. ./.:20:17,3:11:0,10,25:II   0/0:21:21,0:4:0,3,509:..    0/0:8:8,0:24:0,24,239:..
chr1    10787   chr1_10787_G_A  G   A   11  .   AF=0.25;AQ=11   GT:DP:AD:GQ:PL:RNC  0/1:9:6,3:9:11,0,11:..  0/0:11:11,0:33:0,33,329:..  0/1:12:9,3:9:8,0,27:..  0/0:2:2,0:6:0,9,89:..

System information

Pipeline version 1.1.1.

jemten commented 1 year ago

Rhocall annotate should work with the pedigree you mentioned. Are you running with a "full size" data set and are you usually able to run it for trios?

fa2k commented 1 year ago

Yes it's a full size dataset, and I've analysed two trios successfully so far. Thanks for confirming, I will also dig a bit deeper then.

fa2k commented 1 year ago

The rhocall annotate seems going off the rails at chr12. I can't explain exactly why, but it seems incompatible with multiple probands. Debug output:

[2023-09-26 15:03:46,581] rhocall.run_annotate_bcfroh DEBUG    looking for roh window chr12 678412-730124 nmarkers 47 qual 42.600000
[2023-09-26 15:03:46,581] rhocall.run_annotate_bcfroh DEBUG    Win chr chr12 not same as var chr chr13: keep drawing new vars (end 16000173).
[2023-09-26 15:03:46,581] rhocall.run_annotate_bcfroh DEBUG    Win chr chr12 not same as var chr chr13: keep drawing new vars (end 16000614).

[...]

[2023-09-26 15:04:33,706] rhocall.run_annotate_bcfroh DEBUG    Win chr chr12 not same as var chr chrM: keep drawing new vars (end 16309).
[2023-09-26 15:04:33,706] rhocall.run_annotate_bcfroh DEBUG    Win chr chr12 not same as var chr chrM: keep drawing new vars (end 16399).

The .roh file from bcftools roh contains lines sorted by (chromosome, sample, position). Only the RG lines are considered by rhocall annotate.

RG CASE1 chr1 ...
RG CASE1 chr1 ...
...
RG CASE2 chr1 ...
RG CASE2 chr1 ...
...
RG CASE1 chr2 ...
RG CASE1 chr2 ...
...
RG CASE2 chr2 ...
RG CASE2 chr2 ...

In the annotation job (https://github.com/dnil/rhocall/blob/master/rhocall/run_annotate_bcfroh.py), it seems to loop through the lines of this .roh file, at the same time as iterating over the vcf.

The problem is it only iterates forward through the vcf, and doesn't return to the start of the choromosome when processing a new sample. It doesn't even use/check the sample information in the roh file.

The annotations added by rhocall annotate are INFO fields and don't seem compatible with annotation of multiple samples. At least I would expect that we want a separate AZ etc. for each sample, but since they are INFO fields, there can be one for each variant (caveat: I'm not an expert on vcf).

Do you agree with this, and if so can you think of a way to fix this? I think it would be quite complicated to split it and perform the analyses once for each affected patient, and somehow transfer the annotations to per-sample annotations.

fa2k commented 1 year ago

I want to make a full disclosure that I had made a change to run with mulitple fastq files per sample. I don't think this makes a difference to my results though. I had set:

    withName: '.*ANNOTATE_SNVS:BCFTOOLS_ROH' {
        ext.args = { "--samples ${meta.probands.unique().join(",")} --skip-indels " }
    }

When the original pipeline has

    withName: '.*ANNOTATE_SNVS:BCFTOOLS_ROH' {
        ext.args = { "--samples ${meta.probands.join(",")} --skip-indels " }
        [...]
    }
jemten commented 1 year ago

I don't think that is the issue here either. Side note: you should be able to run with multiple fastq pairs per sample out of the box. The issue you where having with that should be resolved in dev with PR #425. Let me know if it still persist.

I was taking to @dnil, the creator of rhocall annotate, and we agree that the case of family with two affected individuals is not really supported at the moment by rhocall. It's fixable but we need to find a good way of representing the data on a sample level, preferably in INFO field. What you can do for now is to use (untested):

    withName: '.*ANNOTATE_SNVS:BCFTOOLS_ROH' {
        ext.args = { "--samples ${meta.probands.unique().first()} --skip-indels " }
    }

This should restrict the annotation to only one affected individual. I know it's less than ideal and we will work on solving this so that you will be able to get aoutzygosity calls for both affected individuals.

fa2k commented 1 year ago

Thanks for the reply. I went in and manually "fixed" the .roh file by removing one sample, and the pipeline is chugging along again. I will add this config line instead for next time, it's much better.

jemten commented 11 months ago

we have a temporary solution in #445, which at least should prevent the pipeline from crashing. Hopefully we can get an update of rhocall to properly adress the issue

dnil commented 11 months ago

Any year now! 😁