tleonardi / nanocompore

RNA modifications detection from Nanopore dRNA-Seq data
https://nanocompore.rna.rocks
GNU General Public License v3.0

Script for generating the "sharkfin plot of the p-values compared with Log Odds Ratio" #206

Closed sbridgett closed 1 year ago

sbridgett commented 1 year ago

Script for generating the "Sharkfin plot (Figure 4A) of the p-values compared with Log Odds Ratio"

Thank you very much for developing and publishing the nanocompore pipeline and its python wrapper module.

I've run the steps given in the Data preparation and usage documentation to generate the database and the plots for individual genes and regions.

I would like to generate a plot, or plots, representing the modifications found overall, and identify which genes are most modified. I think that I can extract that information from the report file produced by the "SampCompDB.save_report()" command, then plot it. I noticed the "Sharkfin plot" in Figure 4A of your paper and was wondering how to generate it. I noticed in your paper that the sharkfin plot was included in the python wrapper, but I haven't found that function. I may have overlooked it or misunderstood:

"Finally, we provide a convenient python wrapper over the GDBM database, allowing users to interactively access simple high level functions to plot and export the results (https://nanocompore.rna.rocks/demo/SampCompDB_usage/). ........ At the time of publication the wrapper allows to generate 6 different types of publication ready plots for a given transcript including ............ and (6) the sharkfin plot of the p-values compared with Log Odds Ratio (for the GMM method)."

Thank you for your help with this.

rezarahman12 commented 1 year ago

Following this thread, as I am also wondering how to generate the sharkfin plot!

lmulroney commented 1 year ago

Dear sbridgett and rezarahman12,

The sharkfin plots in the paper were made in R by extracting the GMM adjusted p-values and the Logit_LOR values and labelling the points that are above the desired thresholds. This can be replicated in Python by reading the TSV file into a pandas dataframe and plotting those two columns using matplotlib or seaborn.

Logan
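For readers following along, the approach described above might be sketched roughly as follows. This is only an illustration, not the script used for the paper (which was written in R). The column names `GMM_logit_pvalue` and `Logit_LOR`, the file name in the usage note, and the axis layout (absolute LOR on the x-axis, -log10 p-value on the y-axis) are assumptions; check them against the header of your own report file.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Assumed column names; verify against the header of your own report file.
PVAL_COL = "GMM_logit_pvalue"
LOR_COL = "Logit_LOR"


def sharkfin_plot(df, pval_col=PVAL_COL, lor_col=LOR_COL, p_thr=0.01, lor_thr=0.5):
    """Scatter |LOR| against -log10(p-value), highlighting points past both thresholds."""
    # Coerce non-numeric entries to NaN and drop those rows.
    data = df[[pval_col, lor_col]].apply(pd.to_numeric, errors="coerce").dropna()
    # Clip p-values of exactly 0 so that -log10 stays finite.
    neg_log_p = -np.log10(data[pval_col].clip(lower=1e-300))
    abs_lor = data[lor_col].abs()
    hit = (data[pval_col] < p_thr) & (abs_lor > lor_thr)

    fig, ax = plt.subplots(figsize=(6, 5))
    ax.scatter(abs_lor[~hit], neg_log_p[~hit], s=8, color="grey", label="below thresholds")
    ax.scatter(abs_lor[hit], neg_log_p[hit], s=12, color="red", label="passes both thresholds")
    ax.axhline(-np.log10(p_thr), linestyle="--", linewidth=0.8, color="black")
    ax.axvline(lor_thr, linestyle="--", linewidth=0.8, color="black")
    ax.set_xlabel("|Logit LOR|")
    ax.set_ylabel("-log10(GMM logit p-value)")
    ax.legend()
    return ax, hit
```

A possible usage, with a hypothetical file name: `df = pd.read_csv("nanocompore_results.tsv", sep="\t")`, then `ax, hit = sharkfin_plot(df)` and `ax.figure.savefig("sharkfin.png", dpi=300)`.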

sbridgett commented 1 year ago

Dear Logan,

Thank you very much for explaining how to generate the Sharkfin plots. I've used Python and a pandas dataframe as you explained, and then plotted using seaborn.

For the labelling threshold (and choosing transcripts to study more), I've used: GMM logit p-value <0.01 and Logit LOR score >0.5 or <−0.5, as was used in the manuscript quoted below for an adenovirus. Hopefully these thresholds are also suitable for human transcripts.

Putative m6A modified sites were identified on the basis of a GMM logit p-value (context 2) <0.01 and an Logit LOR score >0.5 or <−0.5. … -- From: "Direct RNA sequencing reveals m6A modifications on adenovirus RNA are necessary for efficient splicing", Price et al, https://www.nature.com/articles/s41467-020-19787-6

Thank you, Stephen.
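As a rough illustration of using these thresholds to choose transcripts for further study, the sketch below counts the sites passing both thresholds per transcript. It assumes the report has a transcript identifier column named `ref_id`, along with `GMM_logit_pvalue` and `Logit_LOR`; these names are assumptions and should be checked against your own file.

```python
import pandas as pd


def count_significant_sites(df, p_thr=0.01, lor_thr=0.5, id_col="ref_id",
                            pval_col="GMM_logit_pvalue", lor_col="Logit_LOR"):
    """Return the number of sites passing both thresholds per transcript, most hits first."""
    # Non-numeric entries become NaN, and comparisons with NaN evaluate to False.
    pvals = pd.to_numeric(df[pval_col], errors="coerce")
    lors = pd.to_numeric(df[lor_col], errors="coerce")
    passing = (pvals < p_thr) & (lors.abs() > lor_thr)
    # value_counts sorts in descending order by default.
    return df.loc[passing, id_col].value_counts()
```

Transcripts with no passing sites do not appear in the result at all, so the top of the returned Series gives a first shortlist of candidates.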

lmulroney commented 1 year ago

Dear Stephen,

Those thresholds are our default thresholds for any analysis done with Nanocompore. They seem to work well for calling modified nucleotides with high specificity, but at the cost of some sensitivity.

You can relax those thresholds or make them more stringent, depending on how you want to balance false positive calls against false negative calls. That said, using GMM pvalue < 0.01 and abs(LOR) > 0.5 has been working for us in all systems that we have tried ourselves thus far (oligos, human, yeast, and SARS-CoV-2).
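One way to see the effect of relaxing or tightening the thresholds is to count how many sites are called at each combination. The helper below is a hypothetical sketch and not part of Nanocompore; the column names `GMM_logit_pvalue` and `Logit_LOR` are assumptions about the report file.

```python
import pandas as pd


def sweep_thresholds(df, p_thresholds, lor_thresholds,
                     pval_col="GMM_logit_pvalue", lor_col="Logit_LOR"):
    """Count the sites called at every combination of p-value and |LOR| threshold."""
    pvals = pd.to_numeric(df[pval_col], errors="coerce")
    abs_lor = pd.to_numeric(df[lor_col], errors="coerce").abs()
    rows = []
    for p_thr in p_thresholds:
        for lor_thr in lor_thresholds:
            n_sites = int(((pvals < p_thr) & (abs_lor > lor_thr)).sum())
            rows.append({"p_thr": p_thr, "lor_thr": lor_thr, "n_sites": n_sites})
    return pd.DataFrame(rows)
```

For example, `sweep_thresholds(df, [0.01, 0.05], [0.5, 0.25])` returns one row per threshold pair. The counts alone do not say which calls are true positives, so they are best read alongside known positive or negative control sites where those are available.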

sbridgett commented 1 year ago

Dear Logan,

Thank you very much for confirming that these threshold values have worked well in all the species that you have tried. That's good. I'll also try a GMM pvalue < 0.05.

Thank you, Stephen.

sbridgett commented 1 year ago

Thank you very much for confirming these thresholds and that they have worked in the species you have tried.

Stephen.
