lh3 / bwa

Burrow-Wheeler Aligner for short-read alignment (see minimap2 for long-read alignment)
GNU General Public License v3.0

bwakit/run-gen-ref not checking NCBI checksums #257

Open coreymhudson opened 5 years ago

coreymhudson commented 5 years ago

BWA indices are being transferred to the user over FTP. The significance of this was reported in CVE-2019-10269. Because these indices are difficult for users to produce themselves, bwakit downloads them directly from NCBI at:

url38="ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_full_analysis_set.fna.gz"

run-gen-ref downloads from this URL here:

    if [ $1 == "hs38DH" ]; then
        (wget -O- $url38 | gzip -dc; cat $root/resource-GRCh38/hs38DH-extra.fa) > $1.fa
        [ ! -f $1.fa.alt ] && cp $root/resource-GRCh38/hs38DH.fa.alt $1.fa.alt
    elif [ $1 == "hs38a" ]; then
        wget -O- $url38 | gzip -dc > $1.fa
        [ ! -f $1.fa.alt ] && grep _alt $root/resource-GRCh38/hs38DH.fa.alt > $1.fa.alt
    elif [ $1 == "hs38" ]; then
        wget -O- $url38 | gzip -dc | awk '/^>/{f=/_alt/?0:1}f' > $1.fa
    elif [ $1 == "hs37d5" ]; then
        wget -O- $url37d5 | gzip -dc > $1.fa 2>/dev/null
    elif [ $1 == "hs37" ]; then
        wget -O- $url37d5 | gzip -dc 2>/dev/null | awk '/^>/{f=/>hs37d5/?0:1}f' > $1.fa
    else
        echo "ERROR: unknown genome build"
    fi

Without checksum verification, there is no guarantee that the indices were delivered correctly. NCBI provides the checksums here: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/md5checksums.txt

run-gen-ref should check the downloaded indices against these checksums, or the software should at least notify the user that the indices have not been checked.
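
As a rough illustration of what such a check could look like, here is a minimal sketch. It is not the bwakit implementation: `expected_md5` and `verify_md5` are hypothetical helper names, the checksum list is assumed to use md5sum-style lines (`<md5>  ./<file name>`), and `md5sum` from GNU coreutils is assumed to be installed. It also assumes the archive is saved to disk before decompression, whereas the current script streams it straight into gzip.

```shell
# Sketch only: expected_md5 and verify_md5 are hypothetical helpers,
# not functions that exist in bwakit.

# Print the expected MD5 for file name $2 from checksum list $1.
# Assumes each line looks like "<md5>  ./<file name>".
expected_md5() {
    awk -v name="$2" '{
        f = $2
        sub(/^\.\//, "", f)              # drop a leading "./" if present
        if (f == name) { print $1; exit }
    }' "$1"
}

# Return 0 if the MD5 of file $1 matches the entry for name $3 in list $2,
# 1 on a mismatch, and 2 if the list has no entry for that name.
verify_md5() {
    local file="$1" list="$2" name="$3" want got
    want=$(expected_md5 "$list" "$name")
    if [ -z "$want" ]; then
        echo "ERROR: no checksum entry for $name" >&2
        return 2
    fi
    got=$(md5sum "$file" | awk '{ print $1 }')
    if [ "$got" != "$want" ]; then
        echo "ERROR: MD5 mismatch for $name (expected $want, got $got)" >&2
        return 1
    fi
}

# Possible use inside run-gen-ref (download first, verify, then decompress):
#   wget -O ref.fna.gz "$url38"
#   wget -O md5checksums.txt "${url38%/*}/md5checksums.txt"
#   verify_md5 ref.fna.gz md5checksums.txt "${url38##*/}" || exit 1
#   gzip -dc ref.fna.gz > "$1.fa"
```

Note that a checksum list fetched over the same unauthenticated FTP channel only guards against accidental corruption. Detecting deliberate tampering would require obtaining the list over an authenticated channel, for example HTTPS if the server offers it.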