Closed varsh1090 closed 7 years ago
Script: /ufrc/zhou/share/projects/bioinformatics/SCLC/sclc-scripts/subset_rb1_sig_gene_mutations.py
/ufrc/zhou/share/projects/bioinformatics/SCLC/sclc-scripts/results/RB1_4dataset_nonsilent.tsv
/ufrc/zhou/share/projects/bioinformatics/SCLC/sclc-scripts/results/SigGenes_005_notRB1_4dataset_nonsilent.tsv
Here, we want more than the presence or absence of RB1 mutations: we want the list of other genes mutated in samples that carry an RB1 mutation. E.g., if a sample has an RB1 mutation, we want to extract all the significant genes mutated in that sample.
Let's say:
gene patient effect categ
RB1 SBM_T37 nonsilent 4
This shows that sample SBM_T37 has a mutation in RB1. Now we want to see which other significant genes (MutSig p<0.05) are mutated in SBM_T37 according to 4dataset_nonsilent.txt, and repeat this for each of the 179 samples with mutations in RB1.
I'm unsure what the desired output format is.
Is it an entry for each sample with a comma-delimited list of the respective significant genes? ex.
sample sig_genes
A gene1, gene2, gene3
B gene2, gene4, gene5
C gene1, gene3, gene4
...
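For reference, a minimal sketch of how that per-sample layout could be built from (gene, patient) pairs. The function name `genes_per_sample` and the example rows are hypothetical, and this is not taken from subset_rb1_sig_gene_mutations.py.

```python
from collections import defaultdict


def genes_per_sample(rows):
    """Collapse (gene, patient) pairs into one comma-delimited gene list per sample."""
    genes = defaultdict(set)
    for gene, patient in rows:
        genes[patient].add(gene)  # a set drops repeated gene/sample pairs
    return {
        patient: ", ".join(sorted(found))
        for patient, found in sorted(genes.items())
    }


# Hypothetical rows; gene and sample names are placeholders.
example_rows = [("gene1", "A"), ("gene2", "A"), ("gene2", "B"), ("gene1", "A")]

print("sample\tsig_genes")
for sample, sig_genes in genes_per_sample(example_rows).items():
    print(f"{sample}\t{sig_genes}")
```

Using a set per sample means a gene with several mutation rows in the same sample is listed only once.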
I thought it would be enough to separate the data into samples with an RB1 mutation (179) and samples without one (272 - 179 = 93), keeping the format we already have, 'gene patient categ effect'. That way they can convert it to whatever format works for them. The patient column holds the sample name, so using the lists of samples with and without an RB1 mutation, we can extract the matching lines from the original data into the two separate lists.
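A rough sketch of that split, assuming each mutation row is a dict with `gene` and `patient` keys and that the RB1 rows themselves are left out of both outputs (an assumption, since we want the *other* genes). `split_by_rb1`, GENE_A, GENE_B and SAMPLE_X are made-up names for illustration; the actual script may do this differently.

```python
def split_by_rb1(rows, sig_genes, anchor="RB1"):
    """Split significant-gene rows by whether their sample carries an anchor-gene mutation."""
    # Samples (patient column) that have at least one row for the anchor gene.
    anchor_samples = {row["patient"] for row in rows if row["gene"] == anchor}
    with_anchor, without_anchor = [], []
    for row in rows:
        # Assumption: skip the anchor gene itself and anything not significant.
        if row["gene"] == anchor or row["gene"] not in sig_genes:
            continue
        if row["patient"] in anchor_samples:
            with_anchor.append(row)
        else:
            without_anchor.append(row)
    return with_anchor, without_anchor


# Hypothetical data; only the RB1 / SBM_T37 row comes from the example above.
rows = [
    {"gene": "RB1", "patient": "SBM_T37", "effect": "nonsilent", "categ": "4"},
    {"gene": "GENE_A", "patient": "SBM_T37", "effect": "nonsilent", "categ": "1"},
    {"gene": "GENE_B", "patient": "SBM_T37", "effect": "nonsilent", "categ": "2"},
    {"gene": "GENE_A", "patient": "SAMPLE_X", "effect": "nonsilent", "categ": "3"},
]
with_rb1, without_rb1 = split_by_rb1(rows, sig_genes={"RB1", "GENE_A"})
print(len(with_rb1), len(without_rb1))
```

Each output keeps the original row format, so either list can still be converted to another layout later.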
The format you suggested might work too. Maybe we can give them both and see what works best for them, but it's essentially the same data.
On Mar 3, 2017, at 9:02 PM, Victor Lin notifications@github.com wrote:
I'm unsure what the desired output format is.
Is it an entry for each sample with a comma-delimited list of the respective significant genes? ex.
sample sig_genes
A gene1, gene2, gene3
B gene2, gene4, gene5
C gene1, gene3, gene4
...
Sorry, I misinterpreted your initial description. It's clear now, and I think the initial format is better.
Note: I found that 156 out of 272 samples have an RB1 mutation, not 179.
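One way to cross-check a count like this is to compare the number of RB1 rows with the number of distinct samples among them, since a sample can appear on more than one row. This is only a sketch with made-up rows and a hypothetical `count_gene_hits` helper; it does not claim to explain the difference between the two numbers.

```python
def count_gene_hits(rows, gene="RB1"):
    """Return (number of rows, number of distinct samples) for the given gene."""
    hits = [patient for row_gene, patient in rows if row_gene == gene]
    return len(hits), len(set(hits))


# Hypothetical rows; SBM_T37 is listed twice to mimic two mutations in one sample.
rows = [
    ("RB1", "SBM_T37"),
    ("RB1", "SBM_T37"),
    ("RB1", "SAMPLE_X"),
    ("GENE_A", "SAMPLE_Y"),
]
n_rows, n_samples = count_gene_hits(rows)
print(f"RB1 rows: {n_rows}, distinct samples: {n_samples}")
```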
/ufrc/zhou/share/projects/bioinformatics/SCLC/sclc-scripts/results/SigGenes_005_RB1_samples_4dataset_nonsilent.tsv
/ufrc/zhou/share/projects/bioinformatics/SCLC/sclc-scripts/results/SigGenes_005_not_RB1_samples_4dataset_nonsilent.tsv
Thanks, I will take a look and get back to you.
Just to confirm, the lists only include genes from the list of MutSig p<0.05 genes, right?
Could you please send me the path to the file you used for the sig genes? Thanks
Yes, if you look at the script, it first subsets the data for significant genes from data/SigGenes_005.txt.
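Roughly, that first step amounts to something like the sketch below. It assumes the gene list has one gene symbol per line and that the mutation table is tab-delimited with a `gene` column; both are assumptions for illustration, not confirmed details of data/SigGenes_005.txt or the script. In-memory text stands in for the real files, and GENE_A and GENE_B are placeholder names.

```python
import csv
import io


def load_gene_set(handle):
    """Read one gene symbol per line (assumed layout), ignoring blank lines."""
    return {line.strip() for line in handle if line.strip()}


def subset_sig_genes(mutations_handle, sig_genes):
    """Keep only mutation rows whose gene is in the significant-gene set."""
    reader = csv.DictReader(mutations_handle, delimiter="\t")
    return [row for row in reader if row["gene"] in sig_genes]


# Hypothetical stand-ins for the gene list and the mutation table.
sig_file = io.StringIO("RB1\nGENE_A\n\n")
mut_file = io.StringIO(
    "gene\tpatient\teffect\tcateg\n"
    "RB1\tSBM_T37\tnonsilent\t4\n"
    "GENE_B\tSBM_T37\tnonsilent\t2\n"
)
sig_genes = load_gene_set(sig_file)
kept = subset_sig_genes(mut_file, sig_genes)
print([row["gene"] for row in kept])
```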
Got it, thanks