mhguo1 / TRAPD

Burden testing against public controls
MIT License

Shell in the README is wrong #11

Open CancerGenome opened 4 years ago

CancerGenome commented 4 years ago

Hi Guo,

From your description, I found a bug that may affect all downstream results:

The count from your awk command is the number of samples with DP > 10, which is always an integer, so count>0.9 is true as soon as a single sample passes. My guess is that you should divide by your total number of samples here. For example: if(count/333>0.9).

"In analogous fashion to above, we then created a bed file for the cases containing only positions with > 90% of samples with DP > 10: zcat cases.counts.txt.gz | tail -n+2 | awk '{count=0} {for(i=4; i<1000; i++) if($i>10) count++} {if(count>0.9) print $1}' | awk -F":" '{print $1"\t"($2-1)"\t"$2}' | bedtools merge -i stdin > cases.dp10.bed"
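The effect can be checked on a toy line. This is only a sketch: the input row is made up, and it assumes whitespace-delimited columns with per-sample depths starting at column 4, as in the quoted loop. It compares the raw test (count>0.9) with a fraction-based test (count divided by the number of sample columns).

```shell
# Toy input (hypothetical): one position, three samples, only one with DP > 10.
result=$(printf 'chr1:100 A G 20 3 5\n' |
  awk '{count=0} {for(i=4; i<=NF; i++) if($i>10) count++}
       {print "count=" count, "raw=" (count>0.9), "frac=" (count/(NF-3)>0.9)}')
echo "$result"
```

This prints count=1 raw=1 frac=0: the position passes the raw test even though only one of three samples has DP > 10, while the fraction-based test rejects it.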

misrak commented 2 years ago

Hi,

I do not think it is wrong but it was not explained properly.

$sample_count = 0.90*(total # of samples) (for example: 90% of 110 samples is 99, so $sample_count = 99). Note that the shell does not expand $sample_count inside the single-quoted awk program, so the value has to be passed in with -v:

zcat cases_coverage.txt.gz | tail -n+2 | awk -F"," -v sample_count=99 '{count=0} {for(i=4; i<1000; i++) if($i>10) count++} {if(count>sample_count) print $1}' | awk -F":" '{print $1"\t"($2-1)"\t"$2}' | bedtools merge -i stdin > cases.dp10.bed
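Hard-coding the sample count can be avoided by deriving it from the number of columns on each line. The following is a minimal sketch, not the README's method: the function name well_covered_positions and the variables min_dp and min_frac are hypothetical, and it assumes whitespace-delimited input with per-sample depths starting at column 4 (add -F"," for comma-separated files).

```shell
# Hypothetical helper: print column 1 for positions where more than
# min_frac of the samples have depth greater than min_dp.
well_covered_positions() {
  awk -v min_dp=10 -v min_frac=0.9 '
    {
      count = 0
      n = NF - 3                       # number of sample columns on this line
      for (i = 4; i <= NF; i++)
        if ($i + 0 > min_dp) count++   # +0 forces a numeric comparison
      if (n > 0 && count / n > min_frac) print $1
    }'
}

# Toy input (made up): the first position has 2 of 3 samples above DP 10,
# the second has 3 of 3.
out=$(printf 'chr1:100 A G 20 30 5\nchr1:101 A G 20 30 40\n' | well_covered_positions)
echo "$out"
```

Only chr1:101 is printed here. In the pipeline from this thread, the function would take the place of the first awk stage, with its output piped into the existing awk -F":" and bedtools merge steps.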