Closed fa2k closed 11 months ago
Rhocall annotate should work with the pedigree you mentioned. Are you running with a "full size" data set and are you usually able to run it for trios?
Yes it's a full size dataset, and I've analysed two trios successfully so far. Thanks for confirming, I will also dig a bit deeper then.
The rhocall annotate step seems to go off the rails at chr12. I can't explain exactly why, but it seems incompatible with multiple probands. Debug output:
[2023-09-26 15:03:46,581] rhocall.run_annotate_bcfroh DEBUG looking for roh window chr12 678412-730124 nmarkers 47 qual 42.600000
[2023-09-26 15:03:46,581] rhocall.run_annotate_bcfroh DEBUG Win chr chr12 not same as var chr chr13: keep drawing new vars (end 16000173).
[2023-09-26 15:03:46,581] rhocall.run_annotate_bcfroh DEBUG Win chr chr12 not same as var chr chr13: keep drawing new vars (end 16000614).
[...]
[2023-09-26 15:04:33,706] rhocall.run_annotate_bcfroh DEBUG Win chr chr12 not same as var chr chrM: keep drawing new vars (end 16309).
[2023-09-26 15:04:33,706] rhocall.run_annotate_bcfroh DEBUG Win chr chr12 not same as var chr chrM: keep drawing new vars (end 16399).
The .roh file from bcftools roh contains lines sorted by (chromosome, sample, position). Only the RG lines are considered by rhocall annotate.
RG CASE1 chr1 ...
RG CASE1 chr1 ...
...
RG CASE2 chr1 ...
RG CASE2 chr1 ...
...
RG CASE1 chr2 ...
RG CASE1 chr2 ...
...
RG CASE2 chr2 ...
RG CASE2 chr2 ...
In the annotation job (https://github.com/dnil/rhocall/blob/master/rhocall/run_annotate_bcfroh.py), it seems to loop through the lines of this .roh file at the same time as iterating over the vcf.
The problem is that it only iterates forward through the vcf and doesn't return to the start of the chromosome when processing a new sample. It doesn't even use or check the sample information in the roh file.
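To make the ordering problem concrete, here is a minimal Python sketch (not rhocall's actual code). It assumes tab-separated RG lines with the sample in column 2, the chromosome in column 3, and start/end in columns 4 and 5, as in the bcftools roh output header. The function names and the example coordinates are made up for illustration. It checks whether the RG lines ever step backwards within a chromosome, which is what a forward-only reader of the vcf cannot follow.

```python
from collections import defaultdict


def parse_rg_lines(lines):
    """Yield (sample, chrom, start, end) for each RG line; skip everything else."""
    for line in lines:
        if not line.startswith("RG"):
            continue  # header comments and per-site ST lines are ignored
        fields = line.rstrip("\n").split("\t")
        # Assumed columns: RG, sample, chromosome, start, end, length, markers, quality
        yield fields[1], fields[2], int(fields[3]), int(fields[4])


def regions_by_sample(lines):
    """Group RG regions per sample so each sample could be handled in its own pass."""
    grouped = defaultdict(list)
    for sample, chrom, start, end in parse_rg_lines(lines):
        grouped[sample].append((chrom, start, end))
    return dict(grouped)


def has_backward_jump(lines):
    """Return True if an RG region starts before the previous one on the same chromosome."""
    prev = None
    for _sample, chrom, start, _end in parse_rg_lines(lines):
        if prev is not None and prev[0] == chrom and start < prev[1]:
            return True
        prev = (chrom, start)
    return False


# Hypothetical two-proband input, sorted by (chromosome, sample, position).
example = [
    "RG\tCASE1\tchr1\t100\t200\t101\t10\t40.0\n",
    "RG\tCASE1\tchr1\t500\t600\t101\t12\t42.0\n",
    "RG\tCASE2\tchr1\t150\t250\t101\t11\t41.0\n",
]
print(has_backward_jump(example))  # True: CASE2 restarts at 150 after CASE1 reached 500
```

With a single proband the check would return False for a sorted file, which would fit with the trio runs working.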
The annotations added by rhocall annotate are INFO fields and don't seem compatible with annotation of multiple samples. At least I would expect that we want a separate AZ etc. for each sample, but since they are INFO fields, there can be only one for each variant (caveat: I'm not an expert on vcf).
Do you agree with this, and if so can you think of a way to fix this? I think it would be quite complicated to split it and perform the analyses once for each affected patient, and somehow transfer the annotations to per-sample annotations.
In the interest of full disclosure, I had made a change to run with multiple fastq files per sample. I don't think this makes a difference to my results though. I had set:
withName: '.*ANNOTATE_SNVS:BCFTOOLS_ROH' {
ext.args = { "--samples ${meta.probands.unique().join(",")} --skip-indels " }
}
whereas the original pipeline has
withName: '.*ANNOTATE_SNVS:BCFTOOLS_ROH' {
ext.args = { "--samples ${meta.probands.join(",")} --skip-indels " }
[...]
}
I don't think that is the issue here either. Side note: you should be able to run with multiple fastq pairs per sample out of the box. The issue you were having with that should be resolved in dev with PR #425. Let me know if it still persists.
I was talking to @dnil, the creator of rhocall annotate, and we agree that the case of a family with two affected individuals is not really supported by rhocall at the moment. It's fixable, but we need to find a good way of representing the data on a sample level, preferably in an INFO field. What you can do for now is to use (untested):
withName: '.*ANNOTATE_SNVS:BCFTOOLS_ROH' {
ext.args = { "--samples ${meta.probands.unique().first()} --skip-indels " }
}
This should restrict the annotation to only one affected individual. I know it's less than ideal, and we will work on solving this so that you will be able to get autozygosity calls for both affected individuals.
Thanks for the reply. I went in and manually "fixed" the .roh file by removing one sample, and the pipeline is chugging along again. I will add this config line instead for next time, it's much better.
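For anyone who needs the same stopgap on an existing .roh file, the manual edit can be sketched in Python roughly as below. This is an illustration only: it assumes the file is tab-separated, that header lines start with "#", and that both RG and ST lines carry the sample name in column 2. The function name `keep_single_sample` is made up.

```python
def keep_single_sample(lines, sample):
    """Yield header lines plus the data lines that belong to the given sample."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep the bcftools roh header comments as they are
            continue
        fields = line.rstrip("\n").split("\t")
        # Assumption: column 2 holds the sample name on both RG and ST lines.
        if len(fields) > 1 and fields[1] == sample:
            yield line


# Hypothetical input with two probands.
roh_lines = [
    "# RG\t[2]Sample\t[3]Chromosome\t[4]Start\t[5]End\n",
    "RG\tCASE1\tchr1\t100\t200\t101\t10\t40.0\n",
    "RG\tCASE2\tchr1\t150\t250\t101\t11\t41.0\n",
    "RG\tCASE1\tchr2\t300\t400\t101\t9\t39.0\n",
]
filtered = list(keep_single_sample(roh_lines, "CASE1"))
```

To apply it to a real file, read the lines from the original .roh and write the filtered lines to a new file rather than overwriting the input.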
We have a temporary solution in #445, which should at least prevent the pipeline from crashing. Hopefully we can get an update of rhocall to properly address the issue.
Any year now! 😁
Description of the bug
I get the listed error when analysing a family with two affected children, and an unaffected mother and a father.
I had to censor the sample names in the supporting files with search / replace. I hope I didn't mess up the format.
I don't know if this is supposed to work. Would also be very helpful to know if this is not a supported configuration.
Command used and terminal output
Relevant files
Relevant files - overview:
System information
Pipeline version 1.1.1.