Closed bioinformatica closed 7 months ago
Hi,
the axis ranges look roughly comparable to me, where do you see the 100-fold increase?
The minimum value of t>c cpm I drew is only 10^-1, but in the article it can reach 10^-3.
Did you download the original count files from the paper and repeat the plots using those tables, as sanity check?
I just tried running tsv on GSE. The following is my code and results. Can you help me figure out what's wrong?I
setwd("/home/mpiage/deseq/slamseq") count_table<-read.table("./Slamdunk_45/count/GSM2666849_47760_mESC_45min_pulse_Rep_1.tsv",sep = "\t",header=1) count_table<-count_table[count_table$TcReadCount != 0,] count_table<-count_table[count_table$ReadCount >= 100,]
TcReadCount<-count_table$TcReadCount ReadCount<-count_table$ReadCount sum(ReadCount) TcReadCount_CPM = TcReadCount/sum(ReadCount)1000000 ReadCount_CPM =ReadCount/sum(ReadCount)1000000 data <- data.frame(SteadyState_CPM = (ReadCount_CPM), TcReads_CPM = (TcReadCount_CPM),Name=count_table$Name )
p <- ggplot(data, aes(x = SteadyState_CPM, y = TcReads_CPM)) + theme_minimal()+
geom_point(alpha = 0.04) +
labs(x = "Steady state (cpm)",y = "T > C reads (cpm)") +
scale_x_log10() + # Set x-axis to log scale with base 10
scale_y_log10() + # Set y-axis to log scale with base 10
theme_minimal() #
print(p)
I see you use only one replicate instead of all three. Maybe directly use the data from Supplementary Table S1 from the paper?
Thank you so much. I want to repeat it from the beginning. How to calculate T>C and all reads cpm of 3 tsv files?
I added them up and divided them by 3, and still the cpm did not reach 10^-3
count_table_1<-read.table("./Slamdunk_45/count/GSM2666849_47760_mESC_45min_pulse_Rep_1.tsv",sep = "\t",header=1) count_table_2<-read.table("./Slamdunk_45/count/GSM2666850_47761_mESC_45min_pulse_Rep_2.tsv",sep = "\t",header=1) count_table_3<-read.table("./Slamdunk_45/count/GSM2666851_47762_mESC_45min_pulse_Rep_3.tsv",sep = "\t",header=1)
ReadCount<-(count_table_1$ReadCount +count_table_2$ReadCount + count_table_3$ReadCount)/3 TcReadCount<-(count_table_1$TcReadCount + count_table_2$TcReadCount +count_table_3$TcReadCount )/3
total<- sum(ReadCount)
ReadCount<-ReadCount[TcReadCount != 0] TcReadCount<-TcReadCount[TcReadCount != 0]
TcReadCount<-TcReadCount[ReadCount >= 100] ReadCount<-ReadCount[ReadCount >= 100]
TcReadCount_CPM = TcReadCount/total1000000 ReadCount_CPM =ReadCount/total1000000
data <- data.frame(SteadyState_CPM = (ReadCount_CPM), TcReads_CPM = (TcReadCount_CPM) )
p <- ggplot(data, aes(x = SteadyState_CPM, y = TcReads_CPM)) +
geom_point(alpha = 0.04) +
labs(x = "Steady state (cpm)",y = "T > C reads (cpm)") +
scale_x_log10() + # Set x-axis to log scale with base 10
scale_y_log10() + # Set y-axis to log scale with base 10
theme_minimal() #
print(p)
count_table<-read.table("./Slamdunk_45/count/GSM2666849_47760_mESC_45min_pulse_Rep_1.tsv",sep = "\t",header=1) ReadCount<-count_table$ReadCount TcReadCount<-count_table$TcReadCount TcReadCount_CPM_1 = TcReadCount/sum(ReadCount)1000000 ReadCount_CPM_1 =ReadCount/sum(ReadCount)1000000 index_1 <- ReadCount>100
count_table<-read.table("./Slamdunk_45/count/GSM2666850_47761_mESC_45min_pulse_Rep_2.tsv",sep = "\t",header=1) ReadCount<-count_table$ReadCount TcReadCount<-count_table$TcReadCount TcReadCount_CPM_2 = TcReadCount/sum(ReadCount)1000000 ReadCount_CPM_2 =ReadCount/sum(ReadCount)1000000 index_2 <- ReadCount>100
count_table<-read.table("./Slamdunk_45/count/GSM2666851_47762_mESC_45min_pulse_Rep_3.tsv",sep = "\t",header=1) ReadCount<-count_table$ReadCount TcReadCount<-count_table$TcReadCount TcReadCount_CPM_3 = TcReadCount/sum(ReadCount)1000000 ReadCount_CPM_3 =ReadCount/sum(ReadCount)1000000 index_3 <- ReadCount>100
ReadCount_CPM = (ReadCount_CPM_1+ReadCount_CPM_2+ReadCount_CPM_3)/3 TcReadCount_CPM = (TcReadCount_CPM_1+TcReadCount_CPM_2+TcReadCount_CPM_3)/3
ReadCount_CPM<-ReadCount_CPM[TcReadCount_CPM != 0] TcReadCount_CPM<-TcReadCount_CPM[TcReadCount_CPM != 0]
TcReadCount_CPM<-TcReadCount_CPM[ReadCount_CPM > 10^(0.5)] ReadCount_CPM<-ReadCount_CPM[ReadCount_CPM > 10^(0.5)]
data <- data.frame(SteadyState_CPM = (ReadCount_CPM), TcReads_CPM = (TcReadCount_CPM) )
p <- ggplot(data, aes(x = SteadyState_CPM, y = TcReads_CPM)) +
geom_point(alpha = 0.04) +
labs(x = "Steady state (cpm)",y = "T > C reads (cpm)") +
scale_x_log10() + # Set x-axis to log scale with base 10
scale_y_log10() + # Set y-axis to log scale with base 10
theme_minimal() #
print(p)
Did you compare the x and y axis with the ones from the supplemental table? I have the suspicion that you do not correct for no4SU treated samples as stated in there (Mean of all T>C Reads (CPM) (45min4SU - 0h 4SU)):
Thank you.
Dear Neumann:
I tried to repeat Figure 3 and calculate the CPM value of T>C, which is about 100 times larger than the figure in the paper. I use R script for tcount.tsv, the calculation formula is TcReadCount_CPM = TcReadCount/sum(ReadCount)*1000000. Is there anything wrong?
please kindly help.