novoalab / EpiNano

Detection of RNA modifications from Oxford Nanopore direct RNA sequencing reads (Liu*, Begik* et al., Nature Comm 2019)
GNU General Public License v2.0

I felt confused about the result of sample Epinanov1.2. #78

Closed solaryx closed 3 years ago

solaryx commented 3 years ago

I am confused by the results from EpiNano v1.2. I swapped the two input files (-k wt and -w ko) in Epinano_DiffErr.R and got similar results. I also got similar results when I used the model from the sample training to predict the modifications; in total I tried four ways of predicting them. The similar results mean ... and the results in delta-mis.prediction are even identical.

Another question, about EpiNano v1.1: how does the step-by-step feature building (from TSV_to_Variants_Freq.py3 to the final step) produce the sample.csv files ($EpiNano-epinano1.1.1/examples/svm_input)? Did I miss something?

Huanle commented 3 years ago

Hi @solaryx ,

The reason you still see the same results when the -ko and -wt designations are swapped is that the prediction is based on the absolute deviance between the paired samples, as you can see in the source code: https://github.com/enovoa/EpiNano/blob/771ad716ded2bc340edf148e02255ef2f463f494/Epinano_DiffErr.R#L140
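To make the symmetry concrete, here is a minimal Python sketch. It is not the code from Epinano_DiffErr.R; it only assumes that the per-site statistic is an absolute difference between the two samples' error frequencies. The function name `abs_delta` and all numbers are made up for illustration.

```python
def abs_delta(freqs_a, freqs_b):
    """Per-site absolute difference between two samples' error frequencies."""
    return [abs(a - b) for a, b in zip(freqs_a, freqs_b)]

# Made-up mismatch frequencies at three sites (illustrative only).
wt = [0.30, 0.05, 0.12]
ko = [0.10, 0.06, 0.12]

forward = abs_delta(wt, ko)  # wt given first, ko second
swapped = abs_delta(ko, wt)  # inputs reversed

# |a - b| equals |b - a|, so the two lists are identical.
print(forward == swapped)
```

Because the sign of the difference is discarded, any score derived only from these values cannot tell which sample was designated wt and which ko.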

The sample[12].csv files are simple examples that correspond to the *.per.site.var.csv files from v1.2, with an extra column of sample labels. Hope this answers your question.

solaryx commented 3 years ago

Thanks for your reply.

I have two samples (-ko and -wt), neither of which has known modification sites. The Epinano_DiffErr results show the modifications in wt. How can I get the m6A modification profile of both samples?

solaryx commented 3 years ago

I tried to predict with the SVM method (command: Epinano_LabelSamples.sh -m wt.plus_strand.per_site.5mer.csv -u ko.plus_strand.per_site.5mer.csv -o combine)

and used Epinano_Predict.py to train the model (command: Epinano_Predict.py -t combine -p combine -cl 8,13,23 -mc 26 -o train -a).

I got 4 models (*.dump) and used the resulting model to predict on both wt(ko).plus_strand.per_site.5mer.csv files (command: Epinano_Predict.py -o result -cl 8,13,23 -M *.dump -p wt(ko).plus_strand.per_site.5mer.csv).

I also used the provided models ($Epinano/models/rrach*dump) to predict on wt(ko).plus_strand.per_site.5mer.csv.

All the SVM results are very different from those of Epinano_DiffErr. Shouldn't they be similar?

Looking forward to your reply.

Huanle commented 3 years ago

The results simply indicate which sites are likely to be modified.

Huanle commented 3 years ago

When you did the training, did you know which sites are modified and label them accordingly? It seems to me that you just labeled all sites in ko as 'unm' and all those in wt sample as 'mod'. Are you sure this is the actual situation?
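A small Python sketch of why this matters. It is an illustration only, not what Epinano_LabelSamples.sh does internally: the site identifiers, the `truly_modified` set and the `label_by_sample` helper are all hypothetical.

```python
def label_by_sample(wt_sites, ko_sites):
    """Tag every wt site 'mod' and every ko site 'unm' (whole-sample labelling)."""
    labelled = [(site, "mod") for site in wt_sites]
    labelled += [(site, "unm") for site in ko_sites]
    return labelled

# Hypothetical sites; only "chr1:105" is imagined to be truly modified.
truly_modified = {"chr1:105"}
wt_sites = ["chr1:100", "chr1:105", "chr1:110"]
ko_sites = ["chr1:100", "chr1:105", "chr1:110"]

training = label_by_sample(wt_sites, ko_sites)

# wt sites that received a 'mod' label although they are not modified.
mislabelled = [site for site, label in training
               if label == "mod" and site not in truly_modified]
print(len(mislabelled), "of", len(wt_sites), "wt sites are labelled 'mod' without being modified")
```

With whole-sample labels, every unmodified wt site enters the training set as a positive example, so the resulting model learns to separate samples rather than modified from unmodified sites.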

solaryx commented 3 years ago

I didn't use my own data but the files in $Epinano/test_data/make_predictions, following the command in run.sh (python ../../Epinano_Predict.py -o wt(ko)_Predict -M ../models/rrach.q3.mis3.del3.linear.dump -p wt(ko).plus_strand.per_site.5mer.csv -cl 8,13,23).

I counted the 'mod' entries in the prediction column: more than 1400 bases are called as modified, whereas Epinano_DiffErr reports fewer than 10.

The Epinano_DiffErr count is based on the rows showing mod in the z_score_prediction column of result.delta-mis.prediction.csv.

Why are they so different?
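The tally described above can be reproduced with a few lines of Python. This is a sketch under assumptions: the column names `prediction` and `z_score_prediction` come from this thread, while the `site` column, the toy rows and the `count_label` helper are made up; the real output files contain more columns.

```python
import csv
import io

def count_label(handle, column, label="mod"):
    """Count the rows whose value in `column` equals `label`."""
    reader = csv.DictReader(handle)
    return sum(1 for row in reader if row[column].strip() == label)

# Toy stand-ins for the SVM output and the DiffErr output (illustrative only).
svm_csv = io.StringIO("site,prediction\ns1,mod\ns2,unm\ns3,mod\n")
differr_csv = io.StringIO("site,z_score_prediction\ns1,unm\ns2,unm\ns3,mod\n")

svm_mod = count_label(svm_csv, "prediction")
differr_mod = count_label(differr_csv, "z_score_prediction")
print("SVM mod calls:", svm_mod, "| DiffErr mod calls:", differr_mod)
```

For real files, replace the `io.StringIO` objects with `open(path, newline="")` handles.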

enovoa commented 3 years ago

Hi @solaryx. The test data you used is the pseudouridine WT and KO test data, as explained in the readme.txt: https://github.com/enovoa/EpiNano/blob/master/test_data/make_predictions/readme.txt This test data is not meant to be used with the SVM model that predicts m6A. The test data in this folder is ribosomal RNA, not mRNA.

solaryx commented 3 years ago

How can I label my sample?

I ran Epinano_LabelSamples.sh, filtered the dataset to keep only RRACH, and got RRACH.csv.

Then I used Epinano_Predict -o result -cl 8,13,23 -mc 26 -t RRACH.csv -p wt(ko).csv -k linear.

I don't have any information about known sites because the ko sample was simply treated with a demethylase.

What I want is to see the modifications in both samples using EpiNano v1.2.

If I have missed something, I would be grateful if you could point it out.
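For reference, the RRACH filtering step mentioned above could look like the following Python sketch. It assumes the standard motif definition (R = A or G, H = A, C or U) and that the 5-mers are available as plain strings; `is_rrach` and the example k-mers are hypothetical, and T is accepted alongside U in case the k-mers are written in DNA letters.

```python
import re

# R = A/G, H = A/C/U (T also accepted for DNA-style sequences).
RRACH = re.compile(r"[AG][AG]AC[ACTU]")

def is_rrach(kmer):
    """Return True if the whole 5-mer matches the RRACH motif."""
    return RRACH.fullmatch(kmer.upper()) is not None

# Made-up 5-mers for illustration.
kmers = ["GGACT", "AAACA", "GGACG", "TGACT", "ggacu"]
kept = [k for k in kmers if is_rrach(k)]
print(kept)
```

Here "GGACG" is dropped because G is not allowed in the H position, and "TGACT" because T is not allowed in the first R position.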

enovoa commented 3 years ago

Hi @solaryx, please open separate issues for different questions; otherwise they cannot help others who have the same questions in the future. Thanks!