sipbs-compbiol / BM327-microbiology

Computing support for BM327 microbiology
https://sipbs-compbiol.github.io/BM327-microbiology/

design genome analysis workflow #3

Open mafeeney opened 2 years ago

mafeeney commented 2 years ago

something, MUMmer, bedtools, primer design

widdowquinn commented 2 years ago

Assuming we start with a set of genomes that can be divided into pathogens and non-pathogens, we need to:

  1. divide the genomes into the correct groups (e.g. using MLST/other markers, maybe presence/absence of effectors/toxins) (Galaxy)
  2. perform pairwise genome comparisons of pathogens against each other, and of pathogens against non-pathogens (or even of a single reference pathogen genome against all non-pathogen genomes, which is enough for the set arithmetic in steps 3 and 4), with MUMmer (Galaxy)
  3. use bedtools or similar to identify regions common to all pathogens, i.e. the intersection of the regions of a reference pathogen genome that align to every other pathogen genome (Galaxy)
  4. use bedtools or similar to identify regions common to all pathogens that are also present in at least one non-pathogen; these will be discarded as they are not diagnostic of the pathogens (Galaxy)
  5. use a primer design tool to design primers to the reference pathogen genome, and keep only those that amplify a region unique to/diagnostic of pathogens (Galaxy)
  6. test the designed primers in silico to ensure they amplify all the known pathogen genomes (Galaxy)
  7. test the designed primers in silico to ensure they do not amplify any known non-pathogens (Galaxy)
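
The set arithmetic in steps 3 and 4 can be sketched in plain Python, as a way of checking the logic before building it in Galaxy. This is only an illustration, not the bedtools workflow itself: it assumes the MUMmer alignments have already been converted to half-open `(start, end)` intervals in reference pathogen coordinates, and the function names (`merge`, `intersect`, `subtract`, `diagnostic_regions`) and the toy coordinates are made up for the example.

```python
from typing import List, Tuple

# Half-open [start, end) interval in reference-genome coordinates (BED-style).
Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Regions covered by both interval sets."""
    a, b = merge(a), merge(b)
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def subtract(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Regions covered by a but not by b."""
    a, b = merge(a), merge(b)
    out: List[Interval] = []
    for start, end in a:
        cursor = start
        for b_start, b_end in b:
            if b_end <= cursor:
                continue  # b interval lies entirely before the cursor
            if b_start >= end:
                break  # b interval lies entirely after this a interval
            if b_start > cursor:
                out.append((cursor, b_start))
            cursor = max(cursor, b_end)
        if cursor < end:
            out.append((cursor, end))
    return out


def diagnostic_regions(
    pathogen_alignments: List[List[Interval]],
    non_pathogen_alignments: List[List[Interval]],
    reference_length: int,
) -> List[Interval]:
    """Reference regions aligned in every pathogen and in no non-pathogen."""
    common = [(0, reference_length)]
    for aligned in pathogen_alignments:  # step 3: intersection over pathogens
        common = intersect(common, aligned)
    seen_in_non_pathogens = merge(
        [iv for aligned in non_pathogen_alignments for iv in aligned]
    )
    return subtract(common, seen_in_non_pathogens)  # step 4: discard shared regions


if __name__ == "__main__":
    # Toy coordinates on a hypothetical 1000 bp reference.
    pathogens = [[(0, 400), (600, 1000)], [(100, 900)]]
    non_pathogens = [[(0, 150)], [(700, 800)]]
    print(diagnostic_regions(pathogens, non_pathogens, 1000))
    # prints [(150, 400), (600, 700), (800, 900)]
```

Subtracting the union of the non-pathogen alignments in one pass is equivalent to finding the shared regions of step 4 and then discarding them, which is why only the reference pathogen needs to be aligned against the non-pathogens.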

The primer sets remaining after this process are candidate diagnostic primers that amplify pathogens but not non-pathogens. We can then…

  1. test the candidate primers against the RefSeq genome database at NCBI to ensure there is no wider off-target amplification (NCBI)

The last step might be a stretch goal.
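
For the in silico tests in steps 6 and 7 of the first list, a rough Python sketch of the check is below. It is a deliberate simplification and not a substitute for a proper in silico PCR tool: it assumes exact primer matches only (no mismatches, no melting temperature checks), linear sequences, and an arbitrary maximum product size. The function names and the toy sequences are hypothetical.

```python
from typing import List


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_all(seq: str, pattern: str) -> List[int]:
    """Start positions of every (possibly overlapping) exact match."""
    positions = []
    pos = seq.find(pattern)
    while pos != -1:
        positions.append(pos)
        pos = seq.find(pattern, pos + 1)
    return positions


def amplicons(genome: str, fwd: str, rev: str, max_len: int = 1000) -> List[str]:
    """Predicted products for a primer pair, exact matches only.

    Both orientations are checked by also scanning the reverse complement
    of the genome. max_len is an assumed upper bound on product size.
    """
    fwd = fwd.upper()
    rev_site = reverse_complement(rev)  # how the reverse primer appears on the same strand
    products = []
    genome = genome.upper()
    for strand in (genome, reverse_complement(genome)):
        for f_start in find_all(strand, fwd):
            for r_start in find_all(strand, rev_site):
                end = r_start + len(rev_site)
                # Reverse site must lie downstream of the forward primer.
                if r_start >= f_start + len(fwd) and end - f_start <= max_len:
                    products.append(strand[f_start:end])
    return products


def is_diagnostic(fwd: str, rev: str, pathogens: List[str], non_pathogens: List[str]) -> bool:
    """True if the pair amplifies every pathogen (step 6) and no non-pathogen (step 7)."""
    hits_all_pathogens = all(amplicons(g, fwd, rev) for g in pathogens)
    hits_any_non_pathogen = any(amplicons(g, fwd, rev) for g in non_pathogens)
    return hits_all_pathogens and not hits_any_non_pathogen


if __name__ == "__main__":
    # Toy sequences, made up for illustration.
    fwd, rev = "ATGCCGTA", "GGATCCTT"
    pathogen = "TTTT" + fwd + "ACGTACGTAC" + reverse_complement(rev) + "CCCC"
    non_pathogen = "TTTT" + fwd + "ACGTACGTAC" + "CCCC"  # lacks the reverse primer site
    print(is_diagnostic(fwd, rev, [pathogen], [non_pathogen]))
```

Real primers tolerate some mismatches, so exact matching will under-report off-target amplification; any primer pair that passes this check would still need testing with a dedicated tool.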