shunliubio / eTAM-seq_workflow

A workflow for eTAM-seq data processing.
GNU General Public License v3.0

Count table #1

llecompte closed this issue 1 year ago

llecompte commented 1 year ago

Hello Shun,

Thank you for developing the eTAM-seq workflow. I have a question regarding the count table. Once we've obtained the pileup2var.flt.txt file with the second script, 2_map_and_count.sh, how do you generate the .count.table.txt file?

Best wishes, Lolita

shunliubio commented 1 year ago

Hi Lolita,

Here is one way to generate the table, using awk together with bedtools (slopBed, fastaFromBed, intersectBed):

# BED files for ftom, ftop and ivt (rep1); the same pipeline runs once per sample.
# Step 1: skip the header, keep sites with A+G coverage >= 10, and pack counts and A percentage into the name field.
# Step 2: extend each site by 2 nt on both sides and fetch the strand-aware 5-mer.
# Step 3: label the 5-mer as DRACH or nonDRACH and restore the single-base coordinates.
for s in ftom ftop ivt; do
  awk -F '\t' 'BEGIN {OFS="\t"} FNR>1 && ($6+$9)>=10 {m=sprintf("%.4f",$6/($6+$9)*100);print $1,$2-1,$2,$1"_"$2"_"$3"="$9"="$6"="m,m,$3}' "${s}.rep1.pileup2var.flt.txt" |
    slopBed -b 2 -i - -g hg38.chrom.sizes |
    fastaFromBed -s -bedOut -fi hg38.genome.fa -bed - |
    awk -F '\t' 'BEGIN {OFS="\t"} length($7)==5 {a=toupper($7);gsub(/T/,"U",a);if(a~/[GAU][AG]AC[ACU]/) {m="DRACH"} else {m="nonDRACH"};print $1,$2+2,$3-2,$4"="a"="m,$5,$6}' > "${s}.rep1.pileup2var.flt.bed"
done
# ftom vs ftop count table: strand-matched sites present in both BED files
intersectBed -wo -s -a ftom.rep1.pileup2var.flt.bed -b ftop.rep1.pileup2var.flt.bed |
  awk -F '\t' 'BEGIN {OFS="\t";print "pos","motif","type","ftom_G_count","ftom_A_count","ftop_G_count","ftop_A_count"} {split($4,a,"=");split($10,b,"=");print a[1],a[5],a[6],a[2],a[3],b[2],b[3]}' > ftom.ftop.rep1n1.pileup2var.flt.count.table.txt
# ftom vs ivt count table: strand-matched sites present in both BED files
intersectBed -wo -s -a ftom.rep1.pileup2var.flt.bed -b ivt.rep1.pileup2var.flt.bed |
  awk -F '\t' 'BEGIN {OFS="\t";print "pos","motif","type","ftom_G_count","ftom_A_count","ivt_G_count","ivt_A_count"} {split($4,a,"=");split($10,b,"=");print a[1],a[5],a[6],a[2],a[3],b[2],b[3]}' > ftom.ivt.rep1n1.pileup2var.flt.count.table.txt
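The motif-labelling stage of the commands above can be tried on its own, without bedtools or a genome FASTA. This is a minimal sketch that assumes the 5-mer has already been extracted; the function name `classify_5mer` and the toy sequences are hypothetical, while the regular expression is the one used above.

```shell
#!/usr/bin/env bash
# classify_5mer is a hypothetical helper for illustration, not part of the workflow.
# Reads one sequence per line on stdin and prints: RNA_5mer <TAB> DRACH|nonDRACH
classify_5mer() {
  awk 'BEGIN {OFS="\t"}
       length($1)==5 {
         a=toupper($1); gsub(/T/,"U",a)   # upper-case, then DNA -> RNA alphabet
         if (a ~ /[GAU][AG]AC[ACU]/) {m="DRACH"} else {m="nonDRACH"}
         print a, m
       }'
}

# Toy input (made up); the 4-mer on the last line is dropped by the length check.
printf '%s\n' ggact TTACA CGACT GACT | classify_5mer
```

With this toy input, GGACU is labelled DRACH, while UUACA and CGACU are labelled nonDRACH. Sequences shorter than 5 nt, such as those clipped at a chromosome end, produce no output line.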

You can also use any programming language you like to produce a count table in the same format.
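As a quick check on the table format, the sketch below reads a count table with the seven columns written by the commands above and appends, for each sample, the same A/(A+G) percentage that the first awk step computes. The helper name `add_a_fraction`, the two new column names and the toy row are assumptions made for illustration; this is only a sanity check, not the workflow's site-calling step.

```shell
#!/usr/bin/env bash
# add_a_fraction is a hypothetical helper for illustration, not part of the workflow.
# Reads a 7-column count table on stdin and appends A/(A+G)*100 for both samples.
# Assumes A+G > 0 in every row, which the coverage filter (>= 10) upstream guarantees.
add_a_fraction() {
  awk -F '\t' 'BEGIN {OFS="\t"}
       FNR==1 {print $0, "sample1_A_pct", "sample2_A_pct"; next}   # extend the header
       {
         p1 = sprintf("%.4f", $5/($4+$5)*100)   # columns 4,5 = G,A counts of sample 1
         p2 = sprintf("%.4f", $7/($6+$7)*100)   # columns 6,7 = G,A counts of sample 2
         print $0, p1, p2
       }'
}

# Toy table in the same column layout (values are made up).
{
  printf 'pos\tmotif\ttype\tftom_G_count\tftom_A_count\tftop_G_count\tftop_A_count\n'
  printf 'chr1_1000_+\tGGACU\tDRACH\t5\t15\t18\t2\n'
} | add_a_fraction
```

For the toy row, the two appended values are 75.0000 and 10.0000. Any tool that writes the same seven tab-separated columns in the same order can be checked this way.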

llecompte commented 1 year ago

Thank you, Shun! I have another question, regarding mESC M3cko vs. mESC Ctrl: can I use either model, ftop or ivt, to process M3cko vs. Ctrl?

Best, Lolita

shunliubio commented 1 year ago

Since we have only FTO controls for mESC M3cko, you should use the ftop model.