oschakoory opened this issue 3 years ago
You may want to write your own script (e.g., in Python or R) to convert your data files into the desired format. I also recommend choosing a normalization method that suits your needs. Here are some relevant articles: 1) https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-018-4637-6 2) https://www.biostars.org/p/392811/
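As a minimal sketch of the normalization step, assuming your abundances are already in a samples-by-taxa table (rows = samples, columns = taxa, as described for UserDataExample.csv), total-sum scaling turns raw counts into per-sample relative abundances, and an optional log transform with a small pseudocount compresses the range. The function names and the pseudocount value are illustrative choices, not part of DeepMicro, and which normalization is appropriate depends on your data (see the articles above).

```python
import numpy as np
import pandas as pd


def relative_abundance(counts: pd.DataFrame) -> pd.DataFrame:
    """Total-sum scaling: divide each sample (row) by its total so rows sum to 1."""
    totals = counts.sum(axis=1)
    # Samples with no counts would divide by zero; map them to all-zero rows instead
    totals = totals.replace(0, np.nan)
    return counts.div(totals, axis=0).fillna(0.0)


def log_transform(rel: pd.DataFrame, pseudocount: float = 1e-6) -> pd.DataFrame:
    """Log10 transform with a small (illustrative) pseudocount so zeros stay finite."""
    return np.log10(rel + pseudocount)


# Hypothetical toy counts: 3 samples (rows) x 4 taxa (columns)
counts = pd.DataFrame(
    [[10, 0, 30, 60], [5, 5, 0, 0], [0, 0, 0, 0]],
    index=["sample1", "sample2", "sample3"],
    columns=["taxonA", "taxonB", "taxonC", "taxonD"],
)
rel = relative_abundance(counts)
print(rel)
print(log_transform(rel))
```

Total-sum scaling is only one option; the linked articles discuss alternatives, and the same transformation must be applied to any sample you later want to predict.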
You could also consider using taxonomic profilers such as MetaPhlAn3 (https://huttenhower.sph.harvard.edu/metaphlan/) and mOTUs2 (https://motu-tool.org/).
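If you use MetaPhlAn, the per-sample profiles are usually merged into one taxa-by-sample table that then has to be filtered and transposed. The sketch below assumes a tab-separated merged table whose comment lines start with `#`, whose first column holds `|`-delimited clade names (e.g. `k__...|s__...`), and whose numeric columns are per-sample abundances; it also drops a column named `NCBI_tax_id` if present, which is an assumption about the layout. Layouts differ between MetaPhlAn versions, so check your own file. The toy input is hypothetical.

```python
import io

import pandas as pd


def species_matrix(merged_tsv: str) -> pd.DataFrame:
    """Convert a merged taxa-by-sample table into a samples-by-species matrix.

    Assumptions (check against your MetaPhlAn version): tab-separated text,
    comment lines start with '#', the first column holds '|'-delimited clade
    names, and the numeric columns are per-sample relative abundances.
    """
    table = pd.read_csv(io.StringIO(merged_tsv), sep="\t", comment="#")
    table = table.set_index(table.columns[0])
    # Drop a taxonomy-ID column if present (the column name is an assumption),
    # then keep only numeric columns, i.e. the samples
    table = table.drop(columns=["NCBI_tax_id"], errors="ignore")
    table = table.select_dtypes(include="number")
    # Keep rows whose deepest rank is species ('s__'); this excludes higher
    # ranks and strain-level ('t__') rows
    is_species = [name.split("|")[-1].startswith("s__") for name in table.index]
    species = table.loc[is_species]
    species = species.rename(index=lambda name: name.split("|")[-1])
    # Transpose so rows are samples and columns are species
    return species.T


# Hypothetical toy input with a shortened lineage
demo = (
    "# hypothetical comment line\n"
    "clade_name\tS1\tS2\n"
    "k__Bacteria\t100.0\t100.0\n"
    "k__Bacteria|s__Species_a\t70.0\t20.0\n"
    "k__Bacteria|s__Species_a|t__Strain_1\t70.0\t20.0\n"
    "k__Bacteria|s__Species_b\t30.0\t80.0\n"
)
matrix = species_matrix(demo)
print(matrix)
```

Filtering to a single rank avoids counting the same reads several times (once per kingdom, phylum, ..., species row).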
Thank you for your reply. I have another question about the label file. What is the binary value, 0 (absent) or 1 (present), based on? If I have 5 controls and taxon 1 is present in 3 of the 5 controls, how do I determine whether it is 0 or 1?
Thank you for your help. I would also like to mention that DeepMicro is a very good algorithm and simpler to understand than the other DL tools I have used so far. Keep up the excellent work!
Presence (1) or absence (0) of a given strain is an independent observation for each sample. It does not depend on other samples; it only indicates whether that strain was found in that single sample. Please refer to MetaPhlAn2 strain-level profiling.
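To illustrate that point, a presence/absence matrix can be derived cell by cell from a samples-by-taxa abundance table: each entry is 1 if that sample's abundance exceeds a threshold and 0 otherwise. So taxon 1 being present in 3 of 5 controls simply yields three 1s and two 0s, not a single value for the group. The zero threshold and the toy values below are assumptions for illustration; a profiler may apply its own detection cutoff.

```python
import pandas as pd


def presence_absence(abundance: pd.DataFrame, threshold: float = 0.0) -> pd.DataFrame:
    """Return 1 where a taxon's abundance in a sample exceeds threshold, else 0.

    Each cell depends only on that sample's own value, so the other samples
    (e.g. the other controls) never change it.
    """
    return (abundance > threshold).astype(int)


# Hypothetical abundances of taxon 1 in 5 controls (present in 3 of them)
abundance = pd.DataFrame(
    {"taxon1": [0.12, 0.0, 0.03, 0.0, 0.40]},
    index=[f"control{i}" for i in range(1, 6)],
)
binary = presence_absence(abundance)
print(binary)
```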
I trained DeepMicro (svm) on 48 samples (control + diseased) provided as UserDataExample.csv. Then, on purpose, I used the information from one of the diseased samples for prediction, provided as LabelDataExample.csv.
python DM.py -r 1 -cd UserDataExample.csv -cl LabelDataExample.csv -m svm
I got the following output:
Accuracy metrics
AUC, ACC, Recall, Precision, F1_score, time-end, runtime(sec), classfication time(sec), best hyper-parameter
[0.8565, 0.8929, 0.4, 1.0, 0.5714, '2021-07-19 09:26:12.410477', 0.35, 0.35, "{'C': 32, 'gamma': 0.00048828125, 'kernel': 'rbf'}"]
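As a quick sanity check on the output above, the reported F1_score is the harmonic mean of the reported Precision and Recall. A Recall of 0.4 with a Precision of 1.0 would mean that every sample predicted as positive was truly positive, but only 40% of the truly positive samples were found; which class DeepMicro treats as positive is an assumption here (presumably the diseased class).

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and Recall taken from the DeepMicro output above
f1 = f1_score(precision=1.0, recall=0.4)
print(round(f1, 4))  # 0.5714
```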
Can you help me identify which parameter(s) I need to use to predict whether the sample in LabelDataExample.csv is more likely from a control or a diseased patient?
Thank you for your valuable help.
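The thread does not say how to obtain a prediction for a single new sample from DM.py itself. As a generic alternative outside DeepMicro, a minimal sketch with scikit-learn's `SVC` can reuse the best hyper-parameters reported above, fit on the training samples, and call `predict` on the new sample. This assumes the SVM was trained directly on your feature table (if a DeepMicro representation-learning option was used, the new sample would need the same transformation first), that labels are coded 0 = control and 1 = diseased, and that the sample to be predicted is kept out of the training data. The random arrays are placeholders, not real data.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical placeholder data: 48 samples x 100 features,
# labels assumed to be 0 = control, 1 = diseased
X_train = rng.random((48, 100))
y_train = np.array([0] * 24 + [1] * 24)

# Best hyper-parameters reported by DeepMicro in the run above
model = SVC(C=32, gamma=0.00048828125, kernel="rbf")
model.fit(X_train, y_train)

# One new sample with the same feature columns, order, and normalization
# as the training data (and not included in X_train)
x_new = rng.random((1, 100))
prediction = model.predict(x_new)
print(prediction)
```

For a continuous score instead of a hard label, `model.decision_function(x_new)` returns the signed distance to the decision boundary.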
Hi, I would like to use DeepMicro for disease prediction.
But I can't figure out how to generate correctly formatted input data for DeepMicro. I see that UserDataExample.csv contains many numeric values, where each row represents a sample and each column represents a microbe, but how did you create that table?
The datasets I have are: (i) paired-end FASTQ files; (ii) 16S sequences reconstructed from those paired-end FASTQ files (in FASTA format); and (iii) a taxonomy file with the abundance of each microorganism (in CSV format).
How do I convert this information into a table similar to UserDataExample.csv?
Thank you for your help.
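For the taxonomy + abundance CSV in (iii), one common conversion is a pivot from long format (one row per sample and taxon) to a wide samples-by-taxa table. The column names `sample`, `taxon`, and `abundance` below are hypothetical, so rename them to match your file. Writing without header and index is an assumption based on the example file being described as holding only numbers; check the DeepMicro README for the exact expectation. The FASTQ and FASTA files would first need to go through a taxonomic profiling step to produce such a table.

```python
import io

import pandas as pd

# Hypothetical long-format export: one row per (sample, taxon) pair
long_csv = io.StringIO(
    "sample,taxon,abundance\n"
    "S1,Bacteroides,120\n"
    "S1,Prevotella,30\n"
    "S2,Bacteroides,80\n"
    "S2,Faecalibacterium,45\n"
)
long_df = pd.read_csv(long_csv)

# Rows = samples, columns = taxa; taxa not observed in a sample become 0
wide = long_df.pivot_table(
    index="sample", columns="taxon", values="abundance", aggfunc="sum", fill_value=0
)
print(wide)

# Numbers only, one row per sample (header/index choice is an assumption);
# pass a file path as the first argument to write to disk instead
csv_text = wide.to_csv(header=False, index=False)
print(csv_text)
```

Keep a separate record of `wide.index` (sample order) and `wide.columns` (taxon order) so the label file and any future samples can be aligned with the same rows and columns.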