rx32940 / Lepto-Metagenomics


CLARK-S #4

Open rx32940 opened 5 years ago

rx32940 commented 5 years ago

The following code should:

https://github.com/rx32940/Lepto-Metagenomics/blob/238eddc00ddfe555f8816001c330b7e503fa547a/run_clark.sh

rx32940 commented 5 years ago

The spaced (CLARK-S) databases are being built on Sapelo2 for both the prebuilt CLARK database and the custom database.

Results for prebuilt regular database:

/scratch/rx32940/CLARK/output/prebuilt/genus(phylum/species)/regular

Results for prebuilt spaced database:

/scratch/rx32940/CLARK/output/prebuilt/genus(phylum/species)/spaced

Results for custom regular database:

/scratch/rx32940/CLARK/output/custom/genus(phylum/species)/regular

Results for custom spaced database:

/scratch/rx32940/CLARK/output/custom/genus(phylum/species)/spaced
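
The four path patterns above can be expanded into the full list of result directories. This is only a sketch: it assumes that `genus(phylum/species)` is shorthand for one directory per rank (`phylum`, `genus`, `species`), which is my reading of the layout and not something checked on the cluster. The names `DATABASES`, `RANKS`, `MODES` and `result_dirs` are made up for illustration.

```python
from itertools import product

# Base directory taken from the paths listed above.
BASE = "/scratch/rx32940/CLARK/output"

# Assumption: one sub-directory per database, rank and k-mer mode.
DATABASES = ("prebuilt", "custom")
RANKS = ("phylum", "genus", "species")
MODES = ("regular", "spaced")


def result_dirs(base=BASE):
    """Return every <base>/<database>/<rank>/<mode> combination."""
    return [
        f"{base}/{db}/{rank}/{mode}"
        for db, rank, mode in product(DATABASES, RANKS, MODES)
    ]


for path in result_dirs():
    print(path)
```

This gives 2 x 3 x 2 = 12 directories, one per database, rank and mode.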
rx32940 commented 5 years ago
  • [x] Phylum level analysis (regular and spaced with custom database)
  • [x] R visualization: barplot for phylum level analysis
  • [ ] genus level analysis (regular and spaced with custom database)
  • [ ] top 10 most abundant genus
  • [ ] species level analysis (regular and spaced with custom database)
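
For the "top 10 most abundant genus" item in the list above, a minimal sketch of the selection step could look like this. The counts are placeholders generated in the code, not real results, and `top_n` is a hypothetical helper name. It assumes the genus counts have already been read into a plain dictionary.

```python
def top_n(counts, n=10):
    """Return the n (name, count) pairs with the highest counts, largest first."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]


# Placeholder data only: 15 made-up genera with made-up read counts.
example_counts = {f"genus_{i:02d}": i * 100 for i in range(1, 16)}

top10 = top_n(example_counts)
for name, count in top10:
    print(name, count)
```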

Clark-s custom database results

/Users/rx32940/Dropbox/5.Rachel-projects/Metagenomic_Analysis/Clark-s/phylum/classifiedOnly_Count_clarks_phylum_custom.csv

Clark custom database results

/Users/rx32940/Dropbox/5.Rachel-projects/Metagenomic_Analysis/Clark/phylum/results/classifiedOnly_percentage_clark_phylum_custom.csv
Sample ID Classified (%) Unclassified (%)
R22.K 82.4363 17.5637
R22.L 28.7164 71.2836
R22.S 73.2934 26.7066
R26.K 77.8431 22.1569
R26.L 48.262 51.738
R26.S 72.1112 27.8888
R27.K 72.597 27.403
R27.L 31.2137 68.7863
R27.S 71.7787 28.2213
R28.K 90.76 9.24
R28.L 88.6129 11.3871
R28.S 89.0807 10.9193

dir for the table above:

/Users/rx32940/Dropbox/5.Rachel-projects/Metagenomic_Analysis/Clark-s/phylum/clarks_custom_phylum_%_classified.xlsx

code to plot graphs above

dir with the exact number of classified reads and percentage of the reads excluding the UNKNOWN reads:

/Users/rx32940/Dropbox/5.Rachel-projects/Metagenomic_Analysis/Clark-s/phylum/classifiedOnly_Count_clarks_phylum_custom.csv
/Users/rx32940/Dropbox/5.Rachel-projects/Metagenomic_Analysis/Clark-s/phylum/classifiedOnly_percentage_clarks_phylum_custom.csv
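
As a rough illustration of how the "classified only" percentages relate to the counts: drop the UNKNOWN reads and renormalise what is left to 100%. The sketch below uses placeholder numbers, not values from the CSV files, and it assumes the unassigned reads are labelled `UNKNOWN` in the count table. `classified_only_percentages` is a hypothetical helper name.

```python
def classified_only_percentages(counts, unknown_label="UNKNOWN"):
    """Percentage per taxon after removing the unknown/unassigned reads."""
    classified = {k: v for k, v in counts.items() if k != unknown_label}
    total = sum(classified.values())
    if total == 0:
        # Nothing was classified, so every share is zero.
        return {k: 0.0 for k in classified}
    return {k: 100.0 * v / total for k, v in classified.items()}


# Placeholder counts for one sample (not real data).
example = {"phylum_A": 600, "phylum_B": 200, "UNKNOWN": 1200}
pct = classified_only_percentages(example)
print(pct)
```

Here 800 reads remain after removing UNKNOWN, so `phylum_A` is 600/800 and `phylum_B` is 200/800 of the classified reads.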
rx32940 commented 5 years ago

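Classification with the prebuilt database leaves most reads unclassified (only about 4.5% to 15.6% of reads are classified at the phylum level, see the table below), so the prebuilt database will not be considered further.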

CLARK-s prebuilt database phylum level classification

Sample ID Classified (%) Unclassified (%)
R22.K 10.4762 89.5238
R22.L 6.2897 93.7103
R22.S 12.3613 87.6387
R26.K 12.786 87.214
R26.L 7.2184 92.7816
R26.S 8.9735 91.0265
R27.K 15.6412 84.3588
R27.L 6.2805 93.7195
R27.S 8.3282 91.6718
R28.K 6.7079 93.2921
R28.L 5.1649 94.8351
R28.S 4.5048 95.4952
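
A quick sanity check on the prebuilt table above, as a sketch: the values are copied from the table, and the code only checks that each row adds up to 100% and summarises the classified column. The variable names are made up for illustration.

```python
# (sample, classified %, unclassified %) copied from the prebuilt table above.
prebuilt = [
    ("R22.K", 10.4762, 89.5238), ("R22.L", 6.2897, 93.7103),
    ("R22.S", 12.3613, 87.6387), ("R26.K", 12.786, 87.214),
    ("R26.L", 7.2184, 92.7816), ("R26.S", 8.9735, 91.0265),
    ("R27.K", 15.6412, 84.3588), ("R27.L", 6.2805, 93.7195),
    ("R27.S", 8.3282, 91.6718), ("R28.K", 6.7079, 93.2921),
    ("R28.L", 5.1649, 94.8351), ("R28.S", 4.5048, 95.4952),
]

# Each row should account for all reads of the sample.
rows_ok = all(abs(c + u - 100.0) < 1e-6 for _, c, u in prebuilt)

classified = [c for _, c, _ in prebuilt]
lowest, highest = min(classified), max(classified)
mean = sum(classified) / len(classified)

print("rows sum to 100:", rows_ok)
print(f"classified range: {lowest} to {highest}, mean {mean:.2f}")
```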
rx32940 commented 5 years ago

From the percent of classified reads file, I concluded that:

dir to the classified percentage file:

/Users/rx32940/Dropbox/5.Rachel-projects/Metagenomic_Analysis/Clark-s/phylum/clarks_custom_phylum_%_classified.xlsx
rx32940 commented 5 years ago

Clark Custom Database Phylum Level Classification

Sample ID Classified (%) Unclassified (%)
R22.K 76.8498 23.1502
R22.L 25.789 74.211
R22.S 66.2754 33.7246
R26.K 73.5044 26.4956
R26.L 42.8965 57.1035
R26.S 64.2384 35.7616
R27.K 70.1686 29.8314
R27.L 27.828 72.172
R27.S 64.6651 35.3349
R28.K 80.2579 19.7421
R28.L 78.6874 21.3126
R28.S 79.2095 20.7905
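
To compare the two custom database runs at the phylum level, the per-sample difference in classified reads can be computed as sketched below. The numbers are copied from the two custom tables in this thread. Treating the earlier table as the CLARK-S run is an assumption based on the `Clark-s/phylum/` directory given for it.

```python
# Classified (%) per sample, custom database, phylum level.
# Assumption: the earlier table in this thread is the CLARK-S run.
clark_s = {
    "R22.K": 82.4363, "R22.L": 28.7164, "R22.S": 73.2934,
    "R26.K": 77.8431, "R26.L": 48.262, "R26.S": 72.1112,
    "R27.K": 72.597, "R27.L": 31.2137, "R27.S": 71.7787,
    "R28.K": 90.76, "R28.L": 88.6129, "R28.S": 89.0807,
}
clark = {
    "R22.K": 76.8498, "R22.L": 25.789, "R22.S": 66.2754,
    "R26.K": 73.5044, "R26.L": 42.8965, "R26.S": 64.2384,
    "R27.K": 70.1686, "R27.L": 27.828, "R27.S": 64.6651,
    "R28.K": 80.2579, "R28.L": 78.6874, "R28.S": 79.2095,
}

# Difference in percentage points (CLARK-S minus CLARK) for each sample.
diff = {sample: round(clark_s[sample] - clark[sample], 4) for sample in clark}

for sample, delta in diff.items():
    print(f"{sample}\t{delta:+.4f}")
```

With these values every difference is positive, i.e. the spaced run classifies a larger share of reads than the regular run in all twelve samples.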

[Figures: clark_phylum_custom_relative, clark_phylum_custom_absolute]

rx32940 commented 5 years ago

Clark-s genus results with custom database

dir for the results:

/Users/rx32940/Dropbox/5.Rachel-projects/Metagenomic_Analysis/Clark-s/genus/results

Code for genus level CLARK-S classification R visualization

[Figures: clark-s_genus_custom_Relative, clark-s_genus_custom_absolute]

rx32940 commented 5 years ago

Clark genus results with custom database

rx32940 commented 5 years ago

Kraken2+Bracken genus level classification

https://github.com/rx32940/Lepto-Metagenomics/issues/3#issuecomment-547548115