hartwigmedical / hmftools

Various algorithms for analysing genomics data
GNU General Public License v3.0

GRIPSS incorrect name matching #367

Closed: micoli98 closed this issue 1 year ago

micoli98 commented 1 year ago

I'm running into an issue with GRIPSS when it scans the GRIDSS VCF file for the tumor/reference sample names. Say I have one patient with 4 samples named:

If I run GRIPSS like this:

java -jar folder/folder/gripss.jar \
        -sample  AAA \
        -reference REFERENCE \
        -ref_genome /path/GRCh38.d1.vd1.fa \
        -pon_sgl_file gridss_pon_single_breakend.bed \
        -pon_sv_file gridss_pon_breakpoint.bedpe \
        -vcf /path/REFERENCE_calls.vcf \
        -output_dir .

It turns out that GRIPSS picks the correct REFERENCE sample from the VCF, but for the tumor it picks AAA_novaseq instead of AAA, so the filtering is done on the wrong sample. It seems that GRIPSS looks for sample names in the GRIDSS VCF that contain the name I give (AAA in this case), and if there is more than one match (like AAA and AAA_novaseq), it takes the last one in the list (AAA_novaseq). Is this intended behaviour, or can it be changed? If it took the first match instead of the last, the issue would be solved.
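To make the reported behaviour concrete, here is a minimal sketch of a "contains, last match wins" lookup. It is an illustration, not the actual GRIPSS source. The class name, method name and sample list are hypothetical, and the sample order is assumed.

```java
import java.util.List;

// Hypothetical sketch of the behaviour described in the issue (not GRIPSS code):
// any VCF sample name that CONTAINS the configured name counts as a match,
// and because the loop keeps overwriting the result, the LAST match wins.
public class SubstringSampleMatch {

    static String findSample(List<String> vcfSampleNames, String configuredName) {
        String match = null;
        for (String name : vcfSampleNames) {
            if (name.contains(configuredName)) {
                match = name; // later matches overwrite earlier ones
            }
        }
        return match; // null if no sample name contains configuredName
    }

    public static void main(String[] args) {
        // Hypothetical VCF header sample order
        List<String> samples = List.of("REFERENCE", "AAA", "AAA_novaseq");
        System.out.println(findSample(samples, "AAA"));       // AAA_novaseq
        System.out.println(findSample(samples, "REFERENCE")); // REFERENCE
    }
}
```

Taking the first match, as suggested above, would fix this particular ordering. However, it would still pick AAA_novaseq if that column happened to come before AAA in the VCF header. Checking for an exact name first does not depend on column order, so it is more robust.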

charlesshale commented 1 year ago

Fixed in: https://github.com/hartwigmedical/hmftools/releases/tag/gripss-v2.3.4

GRIPSS will now take an exact name match first, and then ignore entries that merely contain the configured reference/tumor name.
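The fixed lookup can be sketched roughly as follows, under stated assumptions. This is not the code from the gripss-v2.3.4 release, and the class and method names are hypothetical. An exact name match is returned immediately, so entries such as AAA_novaseq are ignored when AAA itself is present. The substring fallback used when no exact match exists is an assumption made only for illustration. The reply does not say what GRIPSS does in that case.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of "exact match first" sample lookup (not GRIPSS code).
public class ExactFirstSampleMatch {

    static Optional<String> findSample(List<String> vcfSampleNames, String configuredName) {
        // 1. An exact match wins outright, so names that merely contain
        //    the configured value (e.g. AAA_novaseq for AAA) are ignored.
        for (String name : vcfSampleNames) {
            if (name.equals(configuredName)) {
                return Optional.of(name);
            }
        }
        // 2. ASSUMED fallback, not confirmed by the fix: the first name
        //    that contains the configured value.
        for (String name : vcfSampleNames) {
            if (name.contains(configuredName)) {
                return Optional.of(name);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical VCF header sample order
        List<String> samples = List.of("REFERENCE", "AAA", "AAA_novaseq");
        System.out.println(findSample(samples, "AAA").orElse("<none>")); // AAA
        System.out.println(findSample(samples, "REF").orElse("<none>")); // REFERENCE (fallback)
        System.out.println(findSample(samples, "BBB").orElse("<none>")); // <none>
    }
}
```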