jmonlong / Hippocamplus

A place to gather useful information I keep on forgetting.
http://jmonlong.github.io/Hippocamplus/
GNU General Public License v3.0
30 stars 19 forks source link

Question about diagMum() #7

Open ericgonzalezs opened 9 months ago

ericgonzalezs commented 9 months ago

I am using your code from here:

https://jmonlong.github.io/Hippocamplus/2017/09/19/mummerplots-with-ggplot2/

After using your function diagMum() the values of qs and qe are changing

For example, by writing only the first 6 columns, I have this before diagMum()

                         qid                    rid                rs               re               qs             qe
   1  Genome2Chr1 Genome1Chr1 104542596 105406492  54605153  54067166
   2  Genome2Chr1  Genome1Chr1  117298324 118687262  56106135  54635047
   3  Genome2Chr1  Genome1Chr1  113999489 117118453  50422976  53820315
   4  Genome2Chr1  Genome1Chr1  101165515 103895087  78958951  82110636
   5  Genome2Chr1  Genome1Chr1 138117404 141985439  28895330  24733256
   6  Genome2Chr1  Genome1Chr1   87983213 100746631  66123919  78137692
   7  Genome2Chr1  Genome1Chr1     740790  17761451 183540978 161169240
   8  Genome2Chr1  Genome1Chr1  105783481 113879997  56470341  65670731
   9  Genome2Chr1  Genome1Chr1  159358953 166868827   8021131    232233
  10 Genome2Chr1  Genome1Chr1  118756427 138083078  50343199  28930119
  11 Genome2Chr1  Genome1Chr1  142148681 159338183  24675174   8069035
  12 Genome2Chr1  Genome1Chr1   18069060  85886334 154774175  85607099

And, I have this after diagMum

                       qid                   rid                 rs               re                qs               qe
  1  Genome2Chr1 Genome1Chr1 104542596 105406492 128935825 129473812
  2  Genome2Chr1 Genome1Chr1 117298324 118687262 127434843 128905931
  3  Genome2Chr1 Genome1Chr1 113999489 117118453 133118002 129720663
  4  Genome2Chr1 Genome1Chr1 101165515 103895087 104582027 101430342
  5  Genome2Chr1 Genome1Chr1 138117404 141985439 154645648 158807722
  6  Genome2Chr1 Genome1Chr1  87983213 100746631 117417059 105403286
  7  Genome2Chr1 Genome1Chr1    740790  17761451         0  22371738
  8  Genome2Chr1 Genome1Chr1 105783481 113879997 127070637 117870247
  9  Genome2Chr1 Genome1Chr1 159358953 166868827 175519847 183308745
 10 Genome2Chr1 Genome1Chr1 118756427 138083078 133197779 154610859
 11 Genome2Chr1 Genome1Chr1 142148681 159338183 158865804 175471943
 12 Genome2Chr1 Genome1Chr1  18069060  85886334  28766803  97933879

Sorry if the answer is simple, but I don't understand what the function is doing and why the coordinates in the query are changing,

Many thanks.

jmonlong commented 9 months ago

This function was used to "flip" the alignments, basically pretending that the alignment was done on the reverse complement. In my example, it was used to go from

to

When a contig looks like it should be "flipped" (most bases aligned on - strand), the coordinates are transformed from qs to L-qs, where L is the length of the contig. For example, if an alignment block spans [80-90] of a 100bp contig, it would correspond to [10-20] of its reverse complemented sequence.

Of note, I didn't have the contig length easily accessible in that function, so I ended up using the maximum value of all the qe/qs instead. Not exactly ideal but fine for visualization purpose, esp. as those graphs don't have numbers on the y-axis.