scale tidk plot --tsv x.tsv ?

tolkit / telomeric-identifier

Identify and find telomeres, or telomeric repeats in a genome.

MIT License


scale tidk plot --tsv x.tsv ? #16

Open colindaven opened 1 year ago

colindaven commented 1 year ago

Hi,

I've been trying tidk search and explore, and then plotting with tidk plot.

The data on the plots are barely visible. Perhaps a log scale would be more effective? I'm not sure whether my counts are simply too low relative to the putative telomere counts, or whether the whole graph is scaled so that I can't see much.

Example counts from a recent public genome, fairly typical for my genomes: telomeric windows have counts of roughly 1000, intrachromosomal windows 10 to 200.

This is a summary of the top 60 lines of the file sorted by Telomere_count, then re-sorted by Chr and Start. Of course, I'm using the full TSV for the plot step.

#Chr   Start     Stop      Telomere_count
Chr01  0         10000     1411
Chr01  10000     20000     777
Chr01  36960000  36970000  21
Chr01  37000000  37010000  93
Chr01  37010000  37020000  1424
Chr01  37020000  37030000  381
Chr02  0         10000     732
Chr02  38110000  38120000  357
Chr02  38120000  38130000  1264
Chr03  0         10000     1248
Chr03  21000000  21010000  11
Chr03  31730000  31740000  11
Chr03  32430000  32440000  660
Chr03  32440000  32450000  1424
Chr03  32450000  32460000  1423
Chr03  32460000  32470000  353
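The summary above (top 60 windows by count, then re-sorted by chromosome and start) can be reproduced with a few lines of Python. This is a minimal sketch, assuming a tab-separated file with the four columns shown (`#Chr`, `Start`, `Stop`, `Telomere_count`); the helper name `top_windows` is hypothetical, and the column layout of the raw `tidk search` output may differ.

```python
import csv
import io


def top_windows(tsv_text, n=60):
    """Return the n windows with the highest Telomere_count,
    re-sorted by chromosome and then by start coordinate.

    Assumes tab-separated rows of: Chr, Start, Stop, Telomere_count,
    with an optional header line starting with '#'.
    """
    rows = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for fields in reader:
        # skip blank lines and the '#'-prefixed header
        if not fields or fields[0].startswith("#"):
            continue
        chrom, start, stop, count = fields[:4]
        rows.append((chrom, int(start), int(stop), int(count)))
    # highest counts first, keep only the top n
    top = sorted(rows, key=lambda r: r[3], reverse=True)[:n]
    # then order by chromosome name and start position for reading
    return sorted(top, key=lambda r: (r[0], r[1]))
```

Sorting chromosome names as plain strings works here because the names are zero-padded (`Chr01`, `Chr02`, ...); unpadded names such as `Chr10` vs `Chr2` would need a natural-sort key.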

Euphrasiologist commented 1 year ago

Hi Colin,

Thanks for the report. Are you trying to plot from explore or search? Plot only really works for search output. Two more things:

- Which version of tidk are you using?
- Can I see the whole file? Or maybe just knowing whether this is search or explore output is enough!

Cheers, M

colindaven commented 1 year ago

Hi Max, thanks for the quick reply.

It's definitely tidk search; here's the Nextflow code. I'm using the canonical plant telomeric repeat, which works well for most assemblies.

This is the latest release AFAIK.


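    # count the telomeric repeat in windows along each sequence
    tidk_ubuntu_0.2.31 search --string $params.telomere --output ${prefix} --dir . --extension tsv $fasta
    # plot the windowed counts; tidk plot writes tidk-plot.svg
    tidk_ubuntu_0.2.31 plot --tsv ${prefix}_telomeric_repeat_windows.tsv
    mv tidk-plot.svg ${prefix}_plot.svg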

Typical screenshot of the plot:

[Screenshot from 2023-03-23 09-08-51: typical tidk plot output]

Here is example data for the start of Chr01 (in bedGraph format).

Chr01   0       10000   1411
Chr01   10000   20000   777
Chr01   20000   30000   0
Chr01   30000   40000   0
Chr01   40000   50000   0
Chr01   50000   60000   0
Chr01   60000   70000   0
Chr01   70000   80000   0
Chr01   80000   90000   0
Chr01   90000   100000  0
Chr01   100000  110000  2
Chr01   110000  120000  7
Chr01   120000  130000  3
Chr01   130000  140000  2
Chr01   140000  150000  2
Chr01   150000  160000  1
Chr01   160000  170000  0
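For reference, a windowed count table like this maps directly onto bedGraph: tab-separated chrom, start, end, value, with 0-based half-open coordinates (the contiguous 10 kb windows above appear to follow that convention). A minimal sketch, assuming the rows are already parsed into tuples; `to_bedgraph` and its optional track line are illustrative, not part of tidk.

```python
def to_bedgraph(rows, track_name=None):
    """Format (chrom, start, end, value) tuples as bedGraph text.

    Each output line is tab-separated: chrom, start, end, value.
    Coordinates are written unchanged, so they must already be
    0-based and half-open.
    """
    lines = []
    if track_name is not None:
        # optional UCSC-style track definition line
        lines.append(f'track type=bedGraph name="{track_name}"')
    for chrom, start, end, value in rows:
        lines.append(f"{chrom}\t{start}\t{end}\t{value}")
    return "\n".join(lines) + "\n"
```

Usage: `to_bedgraph([("Chr01", 0, 10000, 1411)], track_name="telomeres")` yields a track line followed by one data line, which genome browsers such as IGV can load directly.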

And the full TSV, renamed to .csv for GitHub:

sample_chromosomes_telomeric_repeat_windows.csv

Euphrasiologist commented 1 year ago

Thanks for this, super helpful. I'll check the file you've sent later, but I'm pretty sure this is the correct behaviour. I could implement a log y-scale if that would help!
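A log y-scale compresses the range between telomeric windows (around 1000 and above) and interstitial hits (10 to 200). A minimal sketch of the transform, assuming log10(1 + count) so that the many zero-count windows stay at 0 instead of minus infinity; this is a generic choice, not necessarily what tidk would implement.

```python
import math


def log_counts(counts):
    """Map raw repeat counts to log10(1 + count).

    The +1 keeps windows with zero repeats at 0 instead of -inf,
    which matters because most windows in a genome have no hits.
    """
    return [math.log10(1 + c) for c in counts]


# counts taken from the windows quoted earlier in this thread
raw = [1424, 381, 93, 21, 11, 2, 0]
for c, y in zip(raw, log_counts(raw)):
    print(f"{c:>5} -> {y:.2f}")
```

With these values, a window with 11 repeats sits at under 1% of the height of a 1424-count window on a linear axis, but at roughly a third of it after the transform. In matplotlib, `set_yscale("symlog")` gives a similar effect while handling zeros.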

colindaven commented 1 year ago

Maybe. I was also thinking about showing more of the chromosome ends using a distorted x-scale, since they are the interesting part here.
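The distorted x-scale idea can be sketched as a piecewise-linear mapping: the first and last `end_bp` of each chromosome are stretched to fill a fixed fraction of the axis, and the interior is compressed into the rest. This assumes known chromosome lengths; `distorted_x`, `end_bp` and `end_frac` are hypothetical names and defaults, not tidk options.

```python
def distorted_x(pos, chrom_len, end_bp=100_000, end_frac=0.35):
    """Map a genomic position to [0, 1] on a piecewise-linear axis.

    The first and last `end_bp` of the chromosome each get `end_frac`
    of the axis width; the interior is squeezed into what is left.
    Defaults are illustrative only.
    """
    if not 0 <= pos <= chrom_len:
        raise ValueError("position outside chromosome")
    if 2 * end_bp >= chrom_len:
        raise ValueError("end regions overlap; chromosome too short")
    if not 0 < end_frac < 0.5:
        raise ValueError("end_frac must be between 0 and 0.5")

    mid_frac = 1 - 2 * end_frac
    left_edge = end_bp
    right_edge = chrom_len - end_bp

    if pos <= left_edge:
        # left chromosome end: stretched
        return end_frac * pos / end_bp
    if pos >= right_edge:
        # right chromosome end: stretched
        return 1 - end_frac * (chrom_len - pos) / end_bp
    # interior: compressed linearly between the two end regions
    return end_frac + mid_frac * (pos - left_edge) / (right_edge - left_edge)
```

Because the mapping is continuous and monotonic, window order is preserved, and tick marks can still be labelled with true coordinates.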

Or creating a simple heatmap in Python of just the chromosome ends versus selected "background" regions from elsewhere in the chromosomes.
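For the heatmap approach, each chromosome can be reduced to one row: its first and last few windows plus a single background value summarising the interior. A minimal sketch, assuming the counts are already grouped per chromosome in window order; using the interior median as "background" is just one choice, and `end_vs_background` is a hypothetical helper.

```python
from statistics import median


def end_vs_background(windows, n_end=3):
    """Build heatmap rows: first n_end windows, interior median, last n_end windows.

    `windows` maps chromosome name -> list of per-window counts in
    genomic order. Every returned row has 2 * n_end + 1 values.
    """
    rows = {}
    for chrom, counts in windows.items():
        if len(counts) <= 2 * n_end:
            raise ValueError(f"{chrom}: too few windows for n_end={n_end}")
        left = counts[:n_end]
        right = counts[-n_end:]
        # one summary value for everything between the two ends
        background = median(counts[n_end:-n_end])
        rows[chrom] = left + [background] + right
    return rows
```

Since every row has the same length, `list(rows.values())` can be passed straight to matplotlib's `imshow`, ideally after a log transform so the telomeric columns do not swamp the background.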