nickjcroucher / gubbins

Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins
http://nickjcroucher.github.io/gubbins/
GNU General Public License v2.0

Output not just polymorphic sites: #249

Closed EisenRa closed 10 months ago

EisenRa commented 5 years ago

Hiya,

I'm wondering if it's possible to output the whole filtered alignment (including invariant sites). I want to run BEAST after Gubbins, which has issues with only using polymorphic sites.

Cheers, Raphael

tseemann commented 5 years ago

@EisenRa giving BEAST all the invariant sites will slow it down unnecessarily. I think the right approach is to include the constant site counts in the BEAST XML file. This tool might be helpful: https://github.com/andersgs/beast2_constsites
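For anyone wondering what the "constant site counts" are: they are the number of alignment columns where every isolate has the same base, tallied separately for A, C, G and T. A minimal Python sketch of that tally follows. The function name and the toy alignment are made up for illustration, and treating any column containing a gap or N as non-constant is a simplifying assumption that may not match how a particular tool counts.

```python
def count_constant_sites(sequences):
    """Count alignment columns in which every sequence has the same A, C, G or T."""
    if not sequences:
        raise ValueError("alignment is empty")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must all have the same length")

    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    # zip(*sequences) yields one tuple per alignment column
    for column in zip(*sequences):
        bases = {base.upper() for base in column}
        if len(bases) == 1:
            (base,) = bases
            # Columns made up only of gaps, Ns or other symbols are skipped
            if base in counts:
                counts[base] += 1
    return counts


# Toy alignment (illustrative data only)
alignment = ["ACGTAC-A", "ACGTTCNA", "ACGAACGA"]
print(count_constant_sites(alignment))
```

These four counts are what then get added to the polymorphic-site alignment in the BEAST XML.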

EisenRa commented 5 years ago

@tseemann thanks for sending that through, I'll check it out!

EisenRa commented 4 years ago

@tseemann I've given beast2_constsites a spin, and it works great!

However, in order to use it with Gubbins, I may need another approach. I noticed that the .gff file output by Gubbins contains the start/end positions of the putatively recombinant regions. Does it seem reasonable to create a .bed file with these coordinates to feed into snippy-core?

tseemann commented 4 years ago

If you wanted to mask the recombinant regions that gubbins (or clonalframeml) computed, then yes, passing them to snippy-core --mask FILE.BED is the best solution. You should never mask recombination and then use that as a reference; always do it at the end.

It is possible that paftools gff2bed might work. Maybe I can support GFF too in snippy-core.

You may also be interested in the new snp-sites -C option.
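The GFF-to-BED conversion discussed above is mostly a coordinate shift: GFF intervals are 1-based and inclusive, BED intervals are 0-based and half-open, so the start drops by one and the end stays the same. A minimal Python sketch, under some assumptions: the function name and the optional `chrom` override are hypothetical, the example GFF line is made up, and it presumes the GFF coordinates map directly onto the reference used by snippy (e.g. a single-contig reference). The override is there in case the sequence name in the GFF does not match the contig name in the reference.

```python
def gff_to_bed(gff_lines, chrom=None):
    """Convert GFF feature lines to three-column BED lines (chrom, start, end)."""
    bed_lines = []
    for line in gff_lines:
        line = line.rstrip("\n")
        # Skip blank lines and GFF header/comment lines
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            continue
        seqid = chrom if chrom is not None else fields[0]
        start, end = int(fields[3]), int(fields[4])
        # GFF: 1-based inclusive; BED: 0-based half-open
        bed_lines.append(f"{seqid}\t{start - 1}\t{end}")
    return bed_lines


# Illustrative input only, not real Gubbins output
example = [
    "##gff-version 3",
    "SEQUENCE\tGUBBINS\tCDS\t101\t250\t0.000\t.\t0\t.",
]
print("\n".join(gff_to_bed(example, chrom="chr1")))
```

The resulting lines can be written to a file and passed to snippy-core --mask as described above.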

EisenRa commented 4 years ago

Thanks for the advice Torsten. For other people who may be interested, I ended up going with this approach:

tseemann commented 4 years ago

You can't run gubbins on a single genome - it needs a full genome alignment of all your isolates. It has to be done after the snippy-core step. You can use a tree generated from the core.aln as the starting tree for gubbins.

  1. run snippy on each isolate (or use snippy-multi)
  2. run snippy-clean_full_aln on core.full.aln
  3. give that to gubbins
  4. extract your regions to BED
     4b. consider masking phage and plasmids too
  5. give them to snippy-core
  6. build a tree

I will ask @danielleingle to confirm this

Do you need BEAST2? Consider http://www.atgc-montpellier.fr/LSD/ ? See https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6006949
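As a rough illustration of the order of operations in the list above, here is a Python sketch that only builds the command lines as strings and prints them; it does not run anything. The function name, the output prefixes (core, gubbins, masked) and the BED file name are placeholders I have assumed, and the exact options should be checked against the help text of the installed snippy and Gubbins versions. Step 1 (snippy per isolate), step 4 (GFF to BED) and step 6 (tree building) are left out.

```python
import shlex


def pipeline_commands(ref, snippy_dirs, mask_bed="recombination.bed"):
    """Return the command lines for the alignment and masking steps, without running them."""
    quoted_ref = shlex.quote(ref)
    dirs = " ".join(shlex.quote(d) for d in snippy_dirs)
    return [
        # Initial core alignment; Gubbins has to come after this step
        f"snippy-core --ref {quoted_ref} --prefix core {dirs}",
        # Step 2: clean the full alignment
        "snippy-clean_full_aln core.full.aln > clean.full.aln",
        # Step 3: recombination detection on the cleaned alignment
        "run_gubbins.py --prefix gubbins clean.full.aln",
        # Step 5: re-run snippy-core with the recombinant regions masked
        f"snippy-core --ref {quoted_ref} --mask {shlex.quote(mask_bed)} --prefix masked {dirs}",
    ]


for command in pipeline_commands("ref.fa", ["isolate1", "isolate2"]):
    print(command)
```

Printing the commands first makes it easy to review the order before handing them to a shell or a workflow manager.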

EisenRa commented 4 years ago

Yes, whoops, you're correct!

4b. consider masking phage and plasmids too

I'm using PHASTER for this, which seems to be working well.

tseemann commented 4 years ago

Yep, it's good if you can get it to work when you have > 10 contigs.

fengyuchengdu commented 4 years ago

About masking phage using predictions from PHASTER: if I'm interested in the phylogeny, should I mask only the intact phage, or all phage regions detected by PHASTER even if they are not intact (e.g. regions that contain only 20% of an intact phage, highlighted in red in the PHASTER results)?

harlesscw commented 3 years ago

You can't run gubbins on a single genome - it needs a full genome alignment of all your isolates. It has to be done after the snippy-core step. You can use a tree generated from the core.aln as the starting tree for gubbins.

1. run snippy on each isolate (or use snippy-multi)

2. run snippy-clean_full_aln on core.full.aln

3. give that to gubbins

4. extract your regions to BED
   4b. consider masking phage and plasmids too

5. give them to snippy-core

6. build a tree

I will ask @danielleingle to confirm this

Do you need BEAST2? Consider http://www.atgc-montpellier.fr/LSD/ ? See https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6006949

@tseemann did you ever hear back from @danielleingle? We (https://gitlab.com/bcorey and the MRSN) are polishing up our pipeline for variant analysis and would love some input on the best order of operations to produce an alignment for beast2_constsites -> BEAST2