schneebergerlab / plotsr

Tool to plot synteny and structural rearrangements between genomes
MIT License

possible bug for tracks: all chromosomes need to be in the track list #53

Closed KewinOgink closed 1 year ago

KewinOgink commented 1 year ago

Hi, thanks for your great tool. I noticed the following when trying to load 2 tracks via 2 bed files:

region_to_mark_ref.bed
chr2    0   10000

region_to_mark_sample.bed
chr1    0   10000
chr2    0   10000

When I plot chr1 and chr2 without tracks it goes fine.

When I separately plot chr1 with only chr1 regions

region_to_mark_sample.bed
chr1    0   10000

or chr2 with only chr2 regions tracks, it works

region_to_mark_ref.bed
chr2    0   10000

region_to_mark_sample.bed
chr2    0   10000

But when I try to run the track for chr1 and chr2, I get this:

Traceback (most recent call last):
  File "/tools/eb/software/plotsr/0.5.3-GCCcore-10.2.0/bin/plotsr", line 6, in <module>
    main(sys.argv[1:])
  File "/tools/eb/software/plotsr/0.5.3-GCCcore-10.2.0/lib/python3.9/site-packages/plotsr/main.py", line 55, in main
    plotsr(args)
  File "/tools/eb/software/plotsr/0.5.3-GCCcore-10.2.0/lib/python3.9/site-packages/plotsr/plotsr.py", line 259, in plotsr
    ax = drawtracks(ax, tracks, S, chrgrps, chrlengths, V, ITX, cfg, minl=minl, maxl=maxl)
  File "/tools/eb/software/plotsr/0.5.3-GCCcore-10.2.0/lib/python3.9/site-packages/plotsr/func.py", line 1595, in drawtracks
    tposmax = max(tpos)
ValueError: max() arg is an empty sequence

I looked at this issue, https://github.com/schneebergerlab/plotsr/issues/28, because I get the same error, but I double-checked and my file names and chromosome names seem correct.

I was able to fix this by adding a chr1 row for region_to_mark_ref.bed:

region_to_mark_ref.bed
chr1    0   1
chr2    0   10000

mnshgl0110 commented 1 year ago

Hi. Thanks for testing this. You are correct: plotsr expects that tracks (when used) have information for all chromosomes being plotted. Your strategy of adding a chr1:0-1 row to the bed file is indeed the correct way to work around this.
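The workaround above can be scripted. The following is a minimal sketch, not part of plotsr: `pad_track_rows` is a hypothetical helper that adds a 1 bp placeholder row (0 to 1, as in the `chr1 0 1` row used in this thread) for every plotted chromosome that a track has no entry for.

```python
def pad_track_rows(rows, chromosomes):
    """Return BED rows plus a 1 bp placeholder for each chromosome with no entry.

    rows: list of (chrom, start, end) tuples read from a track BED file.
    chromosomes: names of every chromosome being plotted.
    Hypothetical helper for illustration; it is not a plotsr function.
    """
    present = {chrom for chrom, _, _ in rows}
    # Placeholder spans 0-1, mirroring the "chr1 0 1" row from this thread.
    padding = [(chrom, 0, 1) for chrom in chromosomes if chrom not in present]
    return sorted(rows + padding)


# region_to_mark_ref.bed from the report only had a chr2 row.
ref_rows = [("chr2", 0, 10000)]
padded = pad_track_rows(ref_rows, ["chr1", "chr2"])
for chrom, start, end in padded:
    print(f"{chrom}\t{start}\t{end}")
```

The printed rows can be written back to the track file before running plotsr; real regions are left untouched.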

I think there is some merit in this setting as tracks are "intended" for showing distribution along the genome. For annotating specific regions, the suggested way is to use --markers. Have you tried using that? Are there any specific reasons why tracks are better compared to markers for this?

Nevertheless, the error message should be better and explain this. I will add that.
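As an illustration of what a clearer message could look like, here is a sketch of a pre-flight check that can be run on the input files before calling plotsr. It is not plotsr's actual code; the function names and the message wording are assumptions made for this example.

```python
def missing_track_chromosomes(track_chroms, plotted_chroms):
    """Return plotted chromosomes that have no row in the track file, in order."""
    covered = set(track_chroms)
    return [c for c in plotted_chroms if c not in covered]


def check_track(track_name, track_chroms, plotted_chroms):
    """Raise a descriptive error up front rather than 'max() arg is an empty sequence'."""
    missing = missing_track_chromosomes(track_chroms, plotted_chroms)
    if missing:
        raise ValueError(
            f"Track '{track_name}' has no entries for: {', '.join(missing)}. "
            "Add at least one row per plotted chromosome."
        )


try:
    # The failing case from the report: the ref track only covers chr2.
    check_track("region_to_mark_ref.bed", ["chr2"], ["chr1", "chr2"])
except ValueError as err:
    print(err)
```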

KewinOgink commented 1 year ago

Hi. I tried markers too, but I would like to visualize regions of interest as blocks rather than just a dot in the middle. Ideally I could have this for both the ref and the query sample, but I read that this is not possible because plotsr is made to work with more than two genomes (https://github.com/schneebergerlab/plotsr/issues/42).

By the way is it possible to have coordinate ticks in itx mode? I guess this is unfortunately for the same reason not possible for both ref and query(s) (it would be great to have both tracks and ticks for ref and query), but is it possible for only the ref perhaps?

And while I'm at it: in itx mode I only see chromosome labels for the ref side, but not on the query side. Because the chromosomes are not aligned horizontally, it is sometimes a bit difficult to see which chromosome is which (See pic). Could they be aligned?

bug

mnshgl0110 commented 1 year ago

By the way is it possible to have coordinate ticks in itx mode? I guess this is unfortunately for the same reason not possible for both ref and query(s) (it would be great to have both tracks and ticks for ref and query), but is it possible for only the ref perhaps?

Currently, it is not possible to get coordinates in ITX mode because (as you guessed) the coordinates would not be conserved across the genomes. One idea would be to align the starts of homologous chromosomes and then show coordinates. I will try to add this.

And while I'm at it: in itx mode I only see chromosome labels for the ref side, but not on the query side. Because the chromosomes are not aligned horizontally, it is sometimes a bit difficult to see which chromosome is which (See pic). Could they be aligned?

This is indeed sub-optimal. It arises because the chromosomes are of quite different sizes, and when they are stacked the size differences accumulate. I have added it to the to-do list. Thanks for reporting this.

KewinOgink commented 1 year ago

I think there is some merit in this setting as tracks are "intended" for showing distribution along the genome. For annotating specific regions, the suggested way is to use --markers. Have you tried using that? Are there any specific reasons why tracks are better compared to markers for this?

There is no other way to show regions of interest and their size, right? I'm trying to visualize some NOTAL regions, and I'm interested in their location but also their size. As I understand it, with --markers only the location can be given, not the size. But it seems that when there are many regions to plot, a frequency distribution is still visible, both in the whole-genome image and when I select only that chromosome (so these 2 pictures show the same data): image

mnshgl0110 commented 1 year ago

It is possible to show location and size using --markers. Something like this would draw a line over the chromosome:

Chr1 1 10000000 ler mt:_;mc:black;ms:1;tt:contig;tp:0.02;ts:8;tf:Arial;tc:black

For vertical chromosomes, you might need to use "|" (check the different available markers here: https://github.com/schneebergerlab/plotsr/blob/master/config/marker_point_type.txt)
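When many regions need markers, lines like the one above can be generated instead of typed by hand. This is a rough sketch only: `marker_line` is a hypothetical helper, the tag keys and default values are copied from the example in this thread, and the use of tabs between the five columns is an assumption.

```python
def marker_line(chrom, start, end, genome_id, label, color="red", size=10):
    """Build one markers-file line: chr, start, end, genome_id, tags.

    Hypothetical helper; tag keys (mt, mc, ms, tt, tp, ts, tf, tc) are taken
    from the example line in this thread, and tab separation is assumed.
    """
    tags = ";".join([
        "mt:_", f"mc:{color}", f"ms:{size}", f"tt:{label}",
        "tp:0.02", "ts:8", "tf:Arial", "tc:black",
    ])
    return "\t".join([chrom, str(start), str(end), genome_id, tags])


# Reproduces the example line given above.
print(marker_line("Chr1", 1, 10000000, "ler", "contig", color="black", size=1))
```

For vertical chromosomes, the `mt:_` entry would need to be swapped for `mt:|`, as noted above.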

KewinOgink commented 1 year ago

What I mean is to show the actual length given by the bed coordinates (i.e. 10 Mb in your example).

With ms:10 I get this: image, and with ms:1 I get this: image

So the drawn length is set by the marker size (ms), not by the bed region.

mnshgl0110 commented 1 year ago

Can you please share your markers.txt files?

KewinOgink commented 1 year ago

Sorry for the late reply! The markers file I used for a plot like this is:

#chr    start   end genome_id   tags
chr2    0   28000000    ref mt:_;mc:red;ms:10;tt:28Mb ROI;tp:0.02;ts:8;tf:DejaVu Sans;tc:black
chr1    0   5732065 sample1 mt:_;mc:red;ms:10;tt:5.7Mb ROI;tp:0.02;ts:8;tf:DejaVu Sans;tc:black
chr2    0   15998100    sample1 mt:_;mc:red;ms:10;tt:16Mb ROI;tp:0.02;ts:8;tf:DejaVu Sans;tc:black

This gives this picture (the dashed line is the border between the 2 chromosomes, which is not very clear because of the marker). As you can see, the markers do not span 28 Mb, 5.7 Mb, or 16 Mb; they just have the size set by ms:10. In yellow, I marked the approximate sizes the markers should have. Hope it is clear! image

command used:

plotsr \
  --sr syri.out \
  --genomes ref_and_sample_fai.txt \
  --chr chr1 --chr chr2 \
  --markers markers.bed \
  --itx \
  -S 0.7 -o R7 -W 7 -H 10 -f 8 -s 1000000 \
  -o plot_out.png

mnshgl0110 commented 1 year ago

Hi @KewinOgink. I tested plotsr in this context and it seems to be working fine. My command is:

plotsr --itx --sr col_lersyri.filtered.out --genomes tmp_genomes.txt -S 1 -o tmp.png -W 7 -H 4 -f 8 --markers tmp_markers.bed --chr Chr1 --chr Chr3

The genomes file (tmp_genomes.txt) is:

TAIR10_Filtered.chrlen  col-0   ft:cl;lw:1.5
ler.chrlen  ler ft:cl;lw:1.5

and the markers file (tmp_markers.bed) is:

#chr    start   end genome_id   tags
Chr1    0   5000000 col-0   mt:_;mc:red;ms:10;tt:Marker1;tp:0.02;ts:8;tf:Tinos;tc:black
Chr3    0   10000000    ler mt:_;mc:black;ms:1;tt:Marker2;tp:0.02;ts:8;tf:Tinos;tc:black

image

So, it is difficult for me to tell what exactly could be happening. I suggest that you try modifying the example files (https://github.com/schneebergerlab/plotsr/blob/master/example/). If they work, then the issue is probably in your plotsr input files.

KewinOgink commented 1 year ago

Thanks again for your quick and clear responses. I was not able to reproduce your example and it turns out I was not on the latest version of plotsr - will update and try again. Pretty sure this will solve the problem...
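A quick way to see which version is installed in the active Python environment is sketched below. It assumes the installed distribution is named `plotsr`; `installed_version` is a hypothetical helper that returns `None` when no such distribution is found.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string for dist_name, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "plotsr" as the distribution name is an assumption for this example.
print(installed_version("plotsr"))
```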

KewinOgink commented 1 year ago

Hi, updating plotsr fixed the problem - thanks for your support!