Genes mutated in samples with RB1 gene mutation

varsh1090 commented 7 years ago

Using the mutation file 4dataset_nonsilent.txt, extract the lines containing significant genes (the list of MutSig p<0.05 genes).
Separate this list into samples with and without a mutation in the RB1 gene (matching 'RB1' in column 1).
Output files, each with the columns Gene, Patient, Categ, effect: a. samples with RB1 mutated; b. samples without RB1 mutated.

victorlin commented 7 years ago

Script: /ufrc/zhou/share/projects/bioinformatics/SCLC/sclc-scripts/subset_rb1_sig_gene_mutations.py

RB1 mutations: /ufrc/zhou/share/projects/bioinformatics/SCLC/sclc-scripts/results/RB1_4dataset_nonsilent.tsv
non-RB1 significant gene mutations: /ufrc/zhou/share/projects/bioinformatics/SCLC/sclc-scripts/results/SigGenes_005_notRB1_4dataset_nonsilent.tsv

varsh1090 commented 7 years ago

Here, we don't just want the list of RB1 presence or absence; we want the list of other genes mutated in samples that have an RB1 mutation. E.g., if a sample has an RB1 mutation, we want to extract all the significant genes mutated in that sample.

varsh1090 commented 7 years ago

Let's say we have:

gene    patient    effect       categ
RB1     SBM_T37    nonsilent    4

This shows that the sample SBM_T37 has a mutation in RB1. Now we want to see which other significant genes (MutSig p<0.05) are mutated in this sample, SBM_T37, in 4dataset_nonsilent.txt, and to do this for each of the 179 samples with mutations in RB1.
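The request above can be sketched roughly as follows. This is only an illustration, not the contents of subset_rb1_sig_gene_mutations.py: the column names (`gene`, `patient`, `effect`, `categ`) are taken from the example row, the function name `split_by_rb1` is hypothetical, and every toy row other than the RB1 / SBM_T37 one is made up.

```python
import csv
import io

# Toy stand-in for 4dataset_nonsilent.txt. Header names are assumed from the
# example row in this thread; GENE_A/B/C and SAMPLE_X are invented.
TOY_TSV = (
    "gene\tpatient\teffect\tcateg\n"
    "RB1\tSBM_T37\tnonsilent\t4\n"
    "GENE_A\tSBM_T37\tnonsilent\t2\n"
    "GENE_B\tSBM_T37\tnonsilent\t1\n"
    "GENE_A\tSAMPLE_X\tnonsilent\t3\n"
    "GENE_C\tSAMPLE_X\tnonsilent\t1\n"
)
# Hypothetical significant-gene list (stands in for the MutSig p<0.05 genes).
SIG_GENES = {"RB1", "GENE_A", "GENE_B"}


def split_by_rb1(rows, sig_genes, marker_gene="RB1"):
    """Split significant-gene rows by whether their sample carries the marker gene."""
    rows = list(rows)
    # Samples with at least one mutation in the marker gene, found on the
    # full data so the result does not depend on RB1 being in sig_genes.
    marked = {r["patient"] for r in rows if r["gene"] == marker_gene}
    # Keep only rows for significant genes.
    sig_rows = [r for r in rows if r["gene"] in sig_genes]
    with_marker = [r for r in sig_rows if r["patient"] in marked]
    without_marker = [r for r in sig_rows if r["patient"] not in marked]
    return with_marker, without_marker


reader = csv.DictReader(io.StringIO(TOY_TSV), delimiter="\t")
with_rb1, without_rb1 = split_by_rb1(reader, SIG_GENES)
for row in with_rb1:
    print(row["gene"], row["patient"], row["effect"], row["categ"])
```

Both outputs keep the original row layout, so they can be written back out unchanged with `csv.DictWriter`.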

victorlin commented 7 years ago

I'm unsure what the desired output format is.

Is it an entry for each sample with a comma-delimited list of that sample's significant genes? For example:

sample  sig_genes
A       gene1, gene2, gene3
B       gene2, gene4, gene5
C       gene1, gene3, gene4
...

varsh1090 commented 7 years ago

I thought that just separating it into samples with (179) and without (272 - 179 = 93) an RB1 mutation, in the format we already have ('gene patient categ effect'), should work, so that they can convert it to whatever format works for them. The patient here is the sample name. Using the lists of samples with and without an RB1 mutation, we can extract the lines for each of these samples from the original data.

The format you suggested might work too. Maybe we can give them both and see what works best for them, but it's essentially the same data.
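Since the two formats carry the same data, converting the row format into the one-line-per-sample format is a simple grouping step. A minimal sketch, assuming rows are dicts with `gene` and `patient` keys; the function name `genes_per_sample` and all sample and gene names in the toy data apart from SBM_T37 are hypothetical:

```python
from collections import defaultdict


def genes_per_sample(rows):
    """Collapse (gene, patient) rows into one comma-delimited gene list per sample."""
    grouped = defaultdict(set)
    for row in rows:
        # A set drops repeats when a gene is mutated more than once in a sample.
        grouped[row["patient"]].add(row["gene"])
    # Sort samples and genes so the output is reproducible.
    return {sample: ", ".join(sorted(genes)) for sample, genes in sorted(grouped.items())}


# Invented toy rows in the 'gene patient ...' layout.
toy_rows = [
    {"gene": "GENE_B", "patient": "SBM_T37"},
    {"gene": "GENE_A", "patient": "SBM_T37"},
    {"gene": "GENE_A", "patient": "SBM_T37"},
    {"gene": "GENE_C", "patient": "SAMPLE_X"},
]

per_sample = genes_per_sample(toy_rows)
print("sample\tsig_genes")
for sample, genes in per_sample.items():
    print(f"{sample}\t{genes}")
```

Note that this direction loses the `effect` and `categ` columns, which is one reason to hand over the row format as well.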


victorlin commented 7 years ago

Sorry, I misinterpreted your initial description. It's clear now, and I think the initial format is better.

Note: I found that 156 out of 272 samples have an RB1 mutation, not 179.
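One way to sanity-check a count like this is to compare the number of RB1 rows against the number of distinct samples with an RB1 row, since a sample with several RB1 mutations contributes several rows. Whether that explains the 156 vs. 179 difference here is not known; the sketch below only shows the check, with a hypothetical function name and invented toy rows:

```python
def count_marker(rows, marker_gene="RB1"):
    """Return (marker rows, distinct samples with the marker, distinct samples overall)."""
    rows = list(rows)
    marker_rows = [r for r in rows if r["gene"] == marker_gene]
    marker_samples = {r["patient"] for r in marker_rows}
    all_samples = {r["patient"] for r in rows}
    return len(marker_rows), len(marker_samples), len(all_samples)


# Invented toy rows: sample S1 has two RB1 mutations, S3 has none.
count_rows = [
    {"gene": "RB1", "patient": "S1"},
    {"gene": "RB1", "patient": "S1"},
    {"gene": "RB1", "patient": "S2"},
    {"gene": "GENE_A", "patient": "S3"},
]

n_rows, n_samples, n_total = count_marker(count_rows)
print(f"{n_rows} RB1 rows across {n_samples} of {n_total} samples")
```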

Significant mutations from RB1-mutated samples: /ufrc/zhou/share/projects/bioinformatics/SCLC/sclc-scripts/results/SigGenes_005_RB1_samples_4dataset_nonsilent.tsv
Significant mutations from RB1-wild samples: /ufrc/zhou/share/projects/bioinformatics/SCLC/sclc-scripts/results/SigGenes_005_not_RB1_samples_4dataset_nonsilent.tsv

varsh1090 commented 7 years ago

Thanks, I will take a look and get back to you.

varsh1090 commented 7 years ago

Just to confirm, the lists only include genes from the list of MutSig p<0.05 genes, right?

varsh1090 commented 7 years ago

Could you please send me the path to the file you used for the sig genes? Thanks

victorlin commented 7 years ago

Yes, if you look at the script, it first subsets the data for significant genes from data/SigGenes_005.txt.
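For readers without access to the script, the subsetting step described above might look something like the following. This is a sketch under assumptions, not the actual script: it assumes data/SigGenes_005.txt holds one gene symbol per line and that the mutation file is tab-delimited with a `gene` header column. In-memory strings stand in for both files, and the helper names and GENE_A / GENE_C / SAMPLE_X entries are hypothetical.

```python
import csv
import io


def read_gene_list(handle):
    """Read one gene symbol per line, skipping blank lines."""
    return {line.strip() for line in handle if line.strip()}


def subset_sig_rows(handle, sig_genes):
    """Keep only the mutation rows whose gene is in the significant-gene set."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if row["gene"] in sig_genes]


# Stand-ins for data/SigGenes_005.txt and 4dataset_nonsilent.txt (assumed layouts).
sig_file = io.StringIO("RB1\nGENE_A\n\n")
mut_file = io.StringIO(
    "gene\tpatient\teffect\tcateg\n"
    "RB1\tSBM_T37\tnonsilent\t4\n"
    "GENE_C\tSBM_T37\tnonsilent\t1\n"
    "GENE_A\tSAMPLE_X\tnonsilent\t2\n"
)

sig_gene_set = read_gene_list(sig_file)
sig_subset = subset_sig_rows(mut_file, sig_gene_set)
print([row["gene"] for row in sig_subset])
```

With real files, the two `io.StringIO` objects would be replaced by `open(path, newline="")` handles.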

varsh1090 commented 7 years ago

Got it, thanks

zhoulab / sclc-scripts

Genes mutated in samples with RB1 gene mutation #8