Open CancerGenome opened 4 years ago
Hi,
I do not think it is wrong, but it was not explained properly.
$sample_count = 0.90 * (total number of samples). For example, 90% of 110 samples is 99, so $sample_count = 99.
zcat cases_coverage.txt.gz | tail -n+2 | awk -F"," -v sample_count="$sample_count" '{count=0} {for(i=4; i<1000; i++) if($i>10) count++} {if(count>sample_count) print $1}' | awk -F":" '{print $1"\t"($2-1)"\t"$2}' | bedtools merge -i stdin > cases.dp10.bed
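The same filter can also be written so the threshold is derived from the number of sample columns instead of being hard-coded. This is only a minimal sketch: it assumes the file is comma-separated, that the first line is a header, and that columns 4 onward hold one depth value per sample (as the loop starting at i=4 suggests). The function name positions_passing and its fraction argument are hypothetical, not part of the original pipeline.

```shell
# Hypothetical helper: print column 1 (chr:pos) for every row where more than
# min_frac of the samples have depth > 10.
# Assumptions: comma-separated input on stdin, header on line 1,
# per-sample depths in columns 4..NF.
positions_passing() {
  awk -F"," -v min_frac="$1" '
    NR > 1 {
      count = 0
      # count the samples whose depth exceeds 10
      for (i = 4; i <= NF; i++) if ($i + 0 > 10) count++
      # NF - 3 is the number of sample columns on this row
      if (count > min_frac * (NF - 3)) print $1
    }'
}

# Possible use in place of the tail/awk steps above:
# zcat cases_coverage.txt.gz | positions_passing 0.9 \
#   | awk -F":" '{print $1"\t"($2-1)"\t"$2}' \
#   | bedtools merge -i stdin > cases.dp10.bed
```

Comparing the count against a fraction of the sample columns avoids having to recompute $sample_count by hand when the number of samples changes.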
Hi Guo,
From your description, I found a bug which may affect all following results:
The count from your awk command is actually the number of samples passing the depth filter, which will be an integer. My guess is that you should include your total number of samples here. For example: if(count/333>0.9), where 333 is the total number of samples.
"In analogous fashion to above, we then created a bed file for the cases containing only positions with > 90% of samples with DP > 10: zcat cases.counts.txt.gz | tail -n+2 | awk '{count=0} {for(i=4; i<1000; i++) if($i>10) count++} {if(count>0.9) print $1}' | awk -F":" '{print $1"\t"($2-1)"\t"$2}' | bedtools merge -i stdin > cases.dp10.bed"