taborrr / virus-wgs-comparison

Izabella and Tabor's qbio repo
0 stars 0 forks source link

First Checkpoint - feedback #1

Open nicmoya opened 3 weeks ago

nicmoya commented 3 weeks ago

I think your objective is clear, but I have some confusion about the execution.

You mention that you intend on visualizing your alignments with progressiveMauve. Despite the problems associated with using this tool, mentioned by me and the instructors, I don't think you can visualize the nucmer alignments using this tool. progressiveMauve is not just a visualization tool, but also an alignment tool. The visualization scripts are designed to only work with the alignment outputs of the tool itself. The nucmer outputs are not standard (it's a table with aligned segments between two genomes), and will NOT work with progressiveMauve.

Luckily the phage genomes are small, and have large blocks of synteny. These can be rather easily plotted in R using ggplot. I highly recommend you move away from progressiveMauve, as it's deprecated, difficult to use, and inter-incompatible.

I am also slightly confused about the rationale for comparing the genomes to three distinct references. Any genomic rearrangements (like shown in the all-to-one figure from your paper) will likely be uninterpretable if you subset the accessions to three groups (one for each reference). I would either pick one reference, and focus on mapping the coordinates of alignments in that one coordinate system, or use a pangenome approach if you really care about all-to-all comparisons (although this is likely to be much more challenging - https://github.com/pangenome/pggb).

One minor last thing: minimap2 is mentioned on your next steps, but not anywhere else in your goals. Is this vestigial from a previous plan?

Nonetheless, I think you meet all the criteria for this first checkpoint. I hope by next time you are able to map your accessions to a reference genome, and have began thinking about how to visualize these alignments.

nicmoya commented 3 days ago

Check-in.MD comments:

Thanks for the updates. geom_rect() is your best bet, where you use the xmin and xmax aesthetics to plot the segments from the minimap2 output, and ymin and ymax can be categorical, or an arbitrary variable you assign to each genome. Say you have 10 genomes, then your REF will be y_pos=10, and ymin=y_pos-0.5 and ymax=y_pos+0.5. The other genomes will be y_pos between 1 and 9. This will create a stack of segments, one for each genome.

Hope this helps, Nic