parklab / bamsnap

MIT License

support for reference sequences with lowercase characters (a, c, g, t) #19

Open tkonopka opened 3 years ago

tkonopka commented 3 years ago

This addresses issue #18. It fixes problems rendering reference sequences with lowercase characters.

The fix comes in two parts. The first expands the COLOR dictionary to associate a hex code with each of the lowercase bases. Each color is defined separately. This makes the code longer, but leaves the option to tune the colors later.
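As a rough illustration of the first part, here is a minimal sketch assuming a dictionary keyed by base character. The hex values are placeholders chosen for the example and are not the palette used in bamsnap; only the name COLOR comes from the description above.

```python
# Illustrative sketch only: the hex codes are placeholder values,
# not the colors defined in bamsnap.
COLOR = {
    # uppercase bases
    "A": "#00AA00",
    "C": "#0000CC",
    "G": "#CC8800",
    "T": "#CC0000",
    # lowercase bases, listed explicitly so each can be tuned on its own later
    "a": "#00AA00",
    "c": "#0000CC",
    "g": "#CC8800",
    "t": "#CC0000",
}

# A lookup with a lowercase reference base now succeeds instead of raising KeyError.
lowercase_color = COLOR["g"]
```

Listing the lowercase entries explicitly, rather than deriving them from the uppercase ones, is what keeps the option of giving soft-masked bases their own shades.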

The second part adjusts how the coverage track is drawn. Without this edit, the coverage track signals a discrepancy wherever a base in a read (uppercase) is compared with a lowercase base in the reference, even when the two are the same base. The images below show a random sequence with made-up reads, before and after implementing the edit.
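The idea behind the second part can be sketched as a case-insensitive comparison. The helper names `is_mismatch` and `mismatch_positions` are hypothetical and do not refer to functions in bamsnap; the example only shows why a case-sensitive check reports false discrepancies.

```python
def is_mismatch(read_base: str, ref_base: str) -> bool:
    """Return True only if the bases differ after ignoring letter case."""
    return read_base.upper() != ref_base.upper()


def mismatch_positions(read_seq: str, ref_seq: str) -> list:
    """Return 0-based offsets where the read disagrees with the reference."""
    return [
        i
        for i, (read_base, ref_base) in enumerate(zip(read_seq, ref_seq))
        if is_mismatch(read_base, ref_base)
    ]


# A short reference window with mixed case and a read that matches it exactly.
ref_window = "AaagGgTc"
read = "AAAGGGTC"

# A case-sensitive comparison flags every lowercase reference base.
naive = [i for i, (r, f) in enumerate(zip(read, ref_window)) if r != f]

# The case-insensitive comparison reports no discrepancies.
fixed = mismatch_positions(read, ref_window)
```

Here `naive` contains the offsets of the lowercase reference bases, while `fixed` is empty, which corresponds to the difference between the before and after images.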

before

after

To reproduce, here are a random reference sequence in FASTA format and two reads in SAM format.

>random_with_lowercase
caAagTGtgCACCAtTTCtCactaacaGGCGgtaTaAAgtgatAGTtagaaTTTgaagaT
gATTAAggcTgctGGtGtGTGaTAtcGgtcaGaTatgaTAaCagagAaCCcCTcgatgtt
aAgAGgTTtCaaCatGGgTtgaaAcTaccgtgCTtTgTtTGgtGaTTTcatttctcaGAC
aTTaCAgTcGGggtGtGaTCGgtGgtGCcgGTcATAgATAACGCtaAgTgGcGacGGtGa
gAaAAcGCtgCAattgaaaGtgGcacgacGACgCTGacCtccCaGtGTGgACgGgTaCtg
ttgcaaacGCagtTtACcgGGATaTAAaTTgGGGCcccaGAggAaAGaTgcTCCagATCg
aGTGcgGAGCtTGCGttaaGCcATTgCcaTaGGTAggcttcGAGTcaCTTCcAAcCGATT
taAcGgGtaTAGAtGtTgcTGAGgGaCaTtTCATtcCAcACTCatGcttTAGtaacCcTt
cGGCaacatCcagtAtagtTtcTgAgcTgctaCttCatAgaTTgAAcTGACCGgACaAcA
ctACTGCGTattCatgcCGCctCCCcccCaAGtCCTGacAccgggtaggtataTGTacGA
aagGgTcAcaTTtTcCgcAtcaggcgCCGCgTTATGaCGtCTTcGGAcaTCTtGcGTAgG
agtcCacaaTAagccCCtaAgCTttAcTtAGccGgagtTcAaAGAGaATGCcGCGGcGCA
AgCTgggtccTTgtttAGAaATgTGCAAATtGGAGGtTtAtCACTTTGgTtaGTcGaTaT
gTTcgGgAtTAccttTaCGctgataGGaGcaTcccgAcTccAtAtctaGGgGCcCtGgcA
CTATGaAGTCTAatAgTaGTggTTCtCaaaCGCTccgtGTccagTTGCacATCcTACTCT
CTGAcTGctGagcCcgaTGCcctGtCgtAAatGACgcaGCcAgGGGTGtcTActggGCtg
CcgaCCgacccttAcGTTtacgaacggGATctacGtTTgAcGCATcAgCAGCAAaGaTAA
CTGaATGacgCtgTCAgtcTaGTActCcAACAaAAGTtcGtGCTTtGGCCGAGAggGgcC
tAcgtgGcGAcAaCaAaCtaGAATAacATAaaaAtCaagGtCgGcgtgggtgcAtCcgtT
gTtGCGAtCttttTttAAgtGGActaCgTtgcTttCgAcaATccCGtccgAgTCCActgC
aCaTgGGAGtTgTtaGTCTtccacAagTcCaCtgctGtcTAaAcTacTAGatgAaAGTCC
ACaggacTaaatctcaAAcAAGcTACtGCCAaaCAtTggTtATGTtAgaAGttGtGGATC
CCACtGCggtaTaataggCgccCgtAGGGGCggAAcctGCtcgCtgtgGGTcTTgtaGGa
GcAcgGAcACCgaGcGAcGGcgGttTctGtgccAgCACaCgttCTtTgcgaGaTagGttC
@SQ SN:random_with_lowercase    LN:1440
@PG ID:bwa  PN:bwa  VN:0.7.17-r1188 CL:bwa mem -o example.sam large.fa example.fastq
random_with_lowercase:600-630   0   random_with_lowercase   600 60  31M *   0   0   AAAGGGTCACATTTTCCGCATCAGGCGCCGC sssssssssssssssssssssssssssssss NM:i:0  MD:Z:31 AS:i:31 XS:i:0
random_with_lowercase:620-650   0   random_with_lowercase   620 60  31M *   0   0   TCAGGCGCCGCGTTATGACGTCTTCGGACAT sssssssssssssssssssssssssssssss NM:i:0  MD:Z:31 AS:i:31 XS:i:0