jinhao94 / binning_script

An integrated binning pipeline

how to use the pipeline #1

Open chaoyanggu opened 1 year ago

chaoyanggu commented 1 year ago

Hi, professor: I read your paper "A high-quality genome compendium of the human gut microbiome of Inner Mongolians" and learned a lot from it about the analysis methods. I would now like to use your pipeline to analyze my own data, but the README.md is empty, so I can't find detailed information on how to use it. I also don't understand some of the inputs mentioned in the pipeline. For example, SGC_hmm_path = "/mnt1/database/SGC_db/bacteria_139_CSCG.hmm": what does SGC_hmm_path mean? My samples contain multiple species, and I want to combine my binning results based on GC content, abundance, and k-mer composition of the metagenomes. What input files do I need? I hope you can provide some examples or detailed usage instructions. Thank you.

jinhao94 commented 1 year ago

Thank you for your support. I have uploaded the missing file. The pipeline also uses a database in GTDB format, which you need to prepare yourself. The method is as follows:

Download the sequences and the annotation file (named gtdb_taxonomy.tsv) of the representative genomes of the GTDB database, then predict the protein sequences and build a DIAMOND database from them.
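The steps above could be scripted roughly as follows. This is only a sketch: the reply does not say which gene predictor was used, so Prodigal is an assumption here, and the function names and file paths are hypothetical. The functions only build the command lines; nothing is executed until you pass them to `subprocess.run`.

```python
import os
import shlex

def prodigal_cmd(genome_fasta, proteins_faa):
    """Command to predict protein sequences for one genome.

    Assumption: Prodigal is the gene predictor (not confirmed by the author).
    -a writes the protein translations, -o discards the coordinate output,
    -q suppresses progress messages.
    """
    return ["prodigal", "-i", genome_fasta, "-a", proteins_faa,
            "-o", os.devnull, "-q"]

def diamond_makedb_cmd(proteins_faa, db_prefix):
    """Command to build a DIAMOND database from a protein FASTA file."""
    return ["diamond", "makedb", "--in", proteins_faa, "--db", db_prefix]

if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for cmd in (prodigal_cmd("genome.fna", "genome.faa"),
                diamond_makedb_cmd("all_proteins.faa", "gtdb_proteins")):
        print(shlex.join(cmd))
        # To actually run a step: subprocess.run(cmd, check=True)
```

In practice you would run the Prodigal step once per representative genome, concatenate the resulting protein files, and run the DIAMOND step once on the combined file.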

You can also use the following command to generate gtdb_taxonomy.tsv.mo, the file that the script diamond_report_modified_contig.pl reads:

less gtdb_taxonomy.tsv | perl -a -F"\t" -lne '@F[0]=~/.*?_(.*.\d+)/; print "$1\t@F[1]" ' > gtdb_taxonomy.tsv.mo
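For readers who prefer Python, the conversion can be sketched like this. It assumes the one-liner is meant to drop the source prefix that GTDB puts in front of each genome accession (for example `RS_` or `GB_`) and to keep the first two tab-separated columns. The function names are hypothetical, and unlike the Perl version, an ID without a prefix is passed through unchanged.

```python
import re

# Assumed intent: skip everything up to the first underscore, then keep the
# accession, which ends in a version number.
ACCESSION_RE = re.compile(r".*?_(.*.\d+)")

def strip_prefix(genome_id):
    """Return the accession without its leading source prefix."""
    match = ACCESSION_RE.match(genome_id)
    return match.group(1) if match else genome_id

def convert_line(line):
    """Convert one line of gtdb_taxonomy.tsv to the .mo format."""
    fields = line.rstrip("\n").split("\t")
    return f"{strip_prefix(fields[0])}\t{fields[1]}"

def convert_file(in_path, out_path):
    """Write the converted table, one output line per input line."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(convert_line(line) + "\n")
```

Calling `convert_file("gtdb_taxonomy.tsv", "gtdb_taxonomy.tsv.mo")` would then produce the file in one step.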

chaoyanggu commented 1 year ago

Thank you for your reply, but I have another question. The HMM protein domain models in bacteria_139_CSCG.hmm were used when you integrated the multiple binning results, but how do I obtain bacteria_139_CSCG.hmm? I can't find it in the InterPro or Pfam databases. Besides, a metagenome contains many types of bacteria; does bacteria_139_CSCG.hmm represent all species?

jinhao94 commented 1 year ago

You can refer to the following references: Campbell JH, O’Donoghue P, Campbell AG, Schwientek P, Sczyrba A, Woyke T, et al. UGA is an additional glycine codon in uncultured SR1 bacteria from the human microbiota. Proc Natl Acad Sci USA. 2013;110:5540–5.

Alneberg, J., Karlsson, C.M.G., Divne, AM. et al. Genomes from uncultivated prokaryotes: a comparison of metagenome-assembled and single-amplified genomes. Microbiome 6, 173 (2018).