Open yasirniazi opened 3 years ago
Hello,
You can download the data from my shared drive. https://drive.google.com/file/d/1DWmKMHkZtmu7S-DuPF3IRTdnWR9gfyWj/view?usp=sharing
Hope this helps.
Thank you.
On Thu, Nov 12, 2020 at 5:02 AM yasirniazi notifications@github.com wrote:
Hi, I hope you are well. I have just started working with this project, and I want to run the code to understand the complete workflow. For that, I need the all_data.txt file. I could not make sense of the dataset link that is provided, so could you please share the complete dataset needed to run the source code? Thanks.
-- Md. Abid Hasan Ph.D. Algorithms and Computational Biology Lab Department of Computer Science and Engineering Bourns College of Engineering University of California Riverside, CA 92521
Principal Scientist I Bioinformatics Roche Sequencing Solutions, Inc. Pleasanton, CA 94588
Thanks @mdahasan. Can I please also get a copy of the `gene_snp_frequency.txt` file that is missing from the repo? Thanks!
Hi @scchess, I apologize; it's been many years, and this project isn't actively maintained.
I looked for the file you requested, but I can't find it. (Also, this is poor Python code from my early work, and not my best.)
However, looking at the code, I think the `gene_snp_frequency.txt` file is a product of `1_data_preprocess.py`. If you check this line https://github.com/mdahasan/mClass---Multiple-cancer-classification/blob/387ead12ac307b85b1a8e585ba027d681622fb6d/1_data_preprocess.py#L71 it should be the "per gene snp count". I'm not sure why it isn't stored in a file called `gene_snp_frequency.txt`, but maybe you can just write `all_sample_cancer_snp_data` to a file named `gene_snp_frequency.txt`, and that should work.
Again, I apologize for the inconvenience. As I said, it's old work from an ignorant Python coder.
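What about this?

```python
import sys

import pandas as pd

# Input: tab-separated table with one column per gene plus a "Cancer_type" label column.
df = pd.read_csv(sys.argv[1], sep="\t")

# Sum each gene column across all samples; the label column and any
# other non-numeric columns are left out of the totals.
genes = df.drop(columns=["Cancer_type"], errors="ignore")
sums = genes.sum(axis=0, numeric_only=True)

# Write one "gene<TAB>count" line per gene.
with open("gene_snp_frequency.txt", "w") as w:
    for gene, count in sums.items():
        w.write(f"{gene}\t{count}\n")

print("Generated: gene_snp_frequency.txt")
```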
I can't say for sure whether it'll work or not, but it seems like it should. The `gene_snp_frequency.txt` file is simply the gene name and the corresponding SNP count for that gene across all samples. It should be pretty straightforward.
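To sanity-check a generated file against the format described above (gene name, then SNP count), something like the following could be used. This is only a sketch: the function name `load_gene_snp_frequency` is hypothetical, and the tab separator, the absence of a header row, and the integer counts are assumptions that the downstream scripts may or may not share.

```python
import csv
import tempfile
from pathlib import Path


def load_gene_snp_frequency(path):
    """Read a two-column, tab-separated file of gene name and SNP count.

    Hypothetical helper: assumes no header row and one gene per line.
    """
    counts = {}
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        for line_no, row in enumerate(reader, start=1):
            if not row:
                continue  # tolerate blank lines
            if len(row) != 2:
                raise ValueError(
                    f"line {line_no}: expected 2 columns, got {len(row)}"
                )
            gene, count = row
            # Accept counts written as "3" as well as "3.0".
            counts[gene] = int(float(count))
    return counts


if __name__ == "__main__":
    # Toy data only; these gene names and counts are made up for illustration.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "gene_snp_frequency.txt"
        path.write_text("TP53\t12\nBRCA1\t7\n")
        print(load_gene_snp_frequency(path))
```

If the loader raises on a real file, the line number in the error shows where the layout differs from the two-column format assumed here.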
Thanks. It looks like the `gene_snp_frequency.txt` file is working. However, running `6_feature_selection_with_mi.py` fails with a missing `All_Class_feature_MI_down.txt` error. I'm not sure how to generate this file.