Open jkbonfield opened 6 years ago
Edit: not quite correct, but it's clearer anyway to simply rearrange it and have || nPLs < ngts
as it's more descriptive.
Do you want a PR with a few simple fixes in (I have 3 I think so far)?
See https://github.com/jkbonfield/bcftools/tree/afl_fix for my work in progress. Only a few tiny tweaks, but without these AFL has had over 200k crashes on mpileup | call so far! (I haven't had time to refuzz with the fix in place yet.)
This code assumes it is processing mpileup
output, so this case should never happen. But I am happy to accept the pull request. Thank you
I guess it's fair that people use bcftools mpileup | bcftools call
essentially as a single stage, so the call part should expect input from mpileup.
What's more of an issue is where mpileup crashes, as that's the sort of thing I could imagine a web based submission framework being open to abuse (depending on the nature of the problem). Crashing due to abort() calls isn't friendly, but there's not really anything you can exploit - and we have lots of those in htslib's vcf parser. Crashing due to reading beyond an array can rarely expose things it shouldn't, while crashing due to writing beyond an array will often permit remote code execution exploits.
Everything I've found so far though has either been in the bcftools call part or just assert(0) / abort() issues, so harmless I think. The only reason I fixed those was simply to make it easier to see any genuine crashes - I'm just spamming tests at the CIGAR-P patch basically.
I'm fuzz testing things and found a bug which I don't know how to fix.
In mcall we have:
So it explicitly checks nPLs against one of two possible values. If neither are true, then it calls
error
, but if either are true then it's fine. Given later onngts = nals*(nals+1)/2
the first check is essentiallycall->nPLs!=nsmpl*ngts
.In my data we have nals=3 and nsmpl=1, meaning the check is
nPLs != 6 && nPLs != 3
withngts = 6
.Then set_pdg does this:
So
PLs[j]
accesses from 0 to 5 (ngts-1), but we had a check on nPLs being 3 or 6. My file with 3 therefore causes it to crash.Obviously this is wrong data, but that's what the fuzzer is there to generate. My question is what should the correct check be? I'm leaning towards this:
That appears to match the case of more genotypes than the number of samples implies is possible. Is this the correct check?