Comut plot for 10 genes

varsh1090 commented 7 years ago

Hi Victor, I need some help with making a comut plot for these 10 genes -

RB1
NONO 0.520
RNF8 0.604
FAM75A6 0.607
ZNF385D 0.631
POP7 0.669
MYADM 0.695
ADCY10 0.803
NGLY1 0.830
HIST2H2AB 0.925

varsh1090 commented 7 years ago

I am going to be out of town, with limited access to my laptop. Could you please make two comut plots with these 10 genes: one without and one with the numbers listed next to each gene (except RB1)? Thank you!

victorlin commented 7 years ago

None of these genes except RB1 and P53 (which is labeled as TP53) are in data/SigGenes_001.txt or data/SigGenes_005.txt. Is there another file that contains all the genes?

varsh1090 commented 7 years ago

They are from the 4datasetnonsilent file, gene column. Please make sure you count each gene:patient combination only once.
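A minimal sketch of counting each gene:patient combination only once, assuming the dataset can be read as (gene, patient) pairs. The record layout and the name `gene_frequencies` are illustrative and are not taken from the actual script:

```python
from collections import Counter


def gene_frequencies(records):
    """Count, per gene, the number of distinct patients with a mutation.

    `records` is an iterable of (gene, patient) pairs. A pair that appears
    several times (several mutations of one gene in one patient) is
    counted once.
    """
    unique_pairs = set(records)  # collapses repeated gene:patient pairs
    return Counter(gene for gene, _patient in unique_pairs)


# Hypothetical rows for illustration only, not real data.
rows = [
    ("TP53", "Sample1"),
    ("RB1", "Sample1"),
    ("TP53", "Sample1"),  # repeat of the first pair
    ("TP53", "Sample2"),
]
print(gene_frequencies(rows))
```

The resulting per-gene counts could also serve as the "frequency" when deciding the row order of the plot.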


victorlin commented 7 years ago

Previously the genes were sorted based on p-value from data/SigGenes_*.txt. Should it just use a pre-defined order so that p-value will not be needed?

I am using this as an input file:

$ cat genes.txt
P53
RB1
NONO    0.520
RNF8    0.604
FAM75A6 0.607
ZNF385D 0.631
POP7    0.669
MYADM   0.695
ADCY10  0.803
NGLY1   0.830
HIST2H2AB   0.925
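For reference, a file in this layout could be read with something like the sketch below. It assumes whitespace-separated lines of the form `GENE` or `GENE VALUE`; the function name `parse_gene_list` is hypothetical and this is not the script's actual parser:

```python
def parse_gene_list(lines):
    """Parse 'GENE' or 'GENE VALUE' lines into (gene, value) pairs.

    The value is None when a line carries only a gene name.
    The order of the input lines is preserved.
    """
    genes = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        value = float(fields[1]) if len(fields) > 1 else None
        genes.append((fields[0], value))
    return genes


# A shortened excerpt of the gene list shown above.
example = """P53
RB1
NONO    0.520
HIST2H2AB   0.925
""".splitlines()
print(parse_gene_list(example))
```

Because the file order is preserved, a pre-defined order only requires listing the genes in the desired sequence.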

varsh1090 commented 7 years ago

The order can be by gene frequency, or by the numbers I sent you next to the genes, with P53 and RB1 at the top. Sorry, I am traveling and might respond late.

Thanks, Varsha


leizhou69 commented 7 years ago

I agree with Varsha.

The order could be based either on frequency or on the value (from the highest, 0.925, to the lowest, 0.520).
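The two ordering options can be sketched as follows, with P53 and RB1 pinned at the top. The function name `order_genes` and the frequency counts in the usage example are assumptions made for illustration; the values are the ones posted in this thread:

```python
PINNED = ["P53", "RB1"]  # always placed first, in this order


def order_genes(values, frequencies=None):
    """Return the genes with PINNED first, then the rest sorted.

    values: dict mapping gene -> number from the gene list
        (None is allowed for pinned genes).
    frequencies: optional dict mapping gene -> mutated-sample count.
        When given, the remaining genes are sorted by descending
        frequency; otherwise by descending value.
    """
    rest = [g for g in values if g not in PINNED]
    if frequencies is not None:
        rest.sort(key=lambda g: frequencies.get(g, 0), reverse=True)
    else:
        rest.sort(key=lambda g: values[g], reverse=True)
    pinned = [g for g in PINNED if g in values]
    return pinned + rest


values = {
    "P53": None, "RB1": None,
    "NONO": 0.520, "RNF8": 0.604, "FAM75A6": 0.607, "ZNF385D": 0.631,
    "POP7": 0.669, "MYADM": 0.695, "ADCY10": 0.803, "NGLY1": 0.830,
    "HIST2H2AB": 0.925,
}
print(order_genes(values))  # ordered by value, highest first

# Hypothetical frequency counts, for illustration only.
made_up_counts = {"NONO": 7, "RNF8": 9, "POP7": 3}
print(order_genes(values, made_up_counts)[:4])
```

Python's sort is stable, so genes with equal frequency keep their original relative order from the gene list.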



victorlin commented 7 years ago

The plot is generated. Here are the relevant files:

Base directory: /ufrc/zhou/share/projects/bioinformatics/SCLC/sclc-scripts/

Gene list file: data/gene-lists/genes.txt
Comutation plot: results/SCLC_comut_plot_040617.pdf
Sample IDs: results/SampleIDs_040617.txt

Also, @varsh1090, you mentioned counting each gene:patient combination only once. Each combination is counted once; however, the script currently keeps only the last mutation type encountered in the dataset file and discards any earlier ones.

For example, if this was part of the file:

TP53  Sample1 4
RB1   Sample1 3
TP53  Sample1 6

The only mutation type information stored for (TP53, Sample1) would be 6. Is this the desired behavior?
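To make the behavior concrete, here is a small sketch of the "last one wins" logic next to one possible alternative that keeps every mutation type. This is an illustration written for this thread, not the code in the script, and the names `last_wins` and `keep_all` are hypothetical:

```python
def last_wins(rows):
    """Keep only the last mutation type seen for each (gene, sample)."""
    types = {}
    for gene, sample, mut_type in rows:
        types[(gene, sample)] = mut_type  # overwrites any earlier entry
    return types


def keep_all(rows):
    """Alternative: keep every mutation type seen, in file order."""
    types = {}
    for gene, sample, mut_type in rows:
        types.setdefault((gene, sample), []).append(mut_type)
    return types


# The three example rows from above.
rows = [
    ("TP53", "Sample1", 4),
    ("RB1", "Sample1", 3),
    ("TP53", "Sample1", 6),
]
print(last_wins(rows))
print(keep_all(rows))
```

In both cases (TP53, Sample1) is a single key, so the combination is still counted once; the two functions differ only in which mutation type information survives.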

varsh1090 commented 7 years ago

Thanks Victor, I'll take a look when I get a chance. I think for this figure, we might not need to mention the mutation type info.


victorlin commented 7 years ago

Sounds good. The mutation type concern also applies to all the previous comutation plots that were generated by this script.

varsh1090 commented 7 years ago

Can we also make the plot without the mutation types colored, both for this figure and for the figure with the top 8 most significant genes? Thanks

victorlin commented 7 years ago

Generated the plots. The files are in /ufrc/zhou/share/projects/bioinformatics/SCLC/sclc-scripts/results/comutation-plot_041117/.

You can see the commands I used to generate the plot in the file comutations/examples.txt.

varsh1090 commented 7 years ago

Thanks! Can we also make one for the p < 0.001 genes? The top 8 in the list.

victorlin commented 7 years ago

Here is a list of all the most recent plots and sample IDs:

custom list of 10 genes: /ufrc/zhou/share/projects/bioinformatics/SCLC/sclc-scripts/results/comutation-plot_041117
15 genes (p < 0.001): /ufrc/zhou/share/projects/bioinformatics/SCLC/sclc-scripts/results/comutation-plot_041117
8 genes: /ufrc/zhou/share/projects/bioinformatics/SCLC/sclc-scripts/results/comutation-plot_041717
56 genes (p < 0.01): /ufrc/zhou/share/projects/bioinformatics/SCLC/sclc-scripts/results/comutation-plot_041917

varsh1090 commented 7 years ago

Thanks Victor!

zhoulab / sclc-scripts

Comut plot for 10 genes #11
