Closed by subwaystation 4 months ago
The visualization is not made for 1000 haplotypes. I suggest you use your own; a simple line plot will do just fine! In fact, panacus itself is not yet optimized for such a pangenome size, but an implementation is underway...
Do you have the HEX of all these colors somewhere for me? So I can at least make it look like it came from panacus ;)
I suggest the following: Take the panacus-visualize script, dump it in a Jupyter Notebook, re-use the functions, and change those that you want to improve on. The script is simple and easy to understand. That's at least what I do when I need to customize panacus output. At some point, I should provide such a notebook in the repository.
I went for R, that's why I am asking :bowtie: I see you are using a Seaborn color palette. I will get it going somehow!
There you go:
const PCOLORS = ['#f77189', '#bb9832', '#50b131', '#36ada4', '#3ba3ec', '#e866f4'];
Source: https://github.com/marschall-lab/panacus/blob/f2a1ca8278ac4e087acfec5ea471aff072b1fa34/etc/lib.js#L5C1-L5C84
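For anyone who stays in Python rather than R, here is a minimal sketch of how the six colors quoted above could be reused for an arbitrary number of curves. The helper names `hex_to_rgb` and `colors_for` are made up for this illustration and are not part of panacus. R accepts the hex strings directly, so no conversion is needed there.

```python
from itertools import cycle, islice

# Palette copied from the PCOLORS constant quoted in this thread.
PCOLORS = ['#f77189', '#bb9832', '#50b131', '#36ada4', '#3ba3ec', '#e866f4']


def hex_to_rgb(hex_color):
    """Convert '#rrggbb' to an (r, g, b) tuple of floats in [0, 1]."""
    h = hex_color.lstrip('#')
    return tuple(int(h[i:i + 2], 16) / 255 for i in (0, 2, 4))


def colors_for(n_series):
    """Return n_series hex colors, repeating the palette when it runs out."""
    return list(islice(cycle(PCOLORS), n_series))


if __name__ == "__main__":
    # Eight curves: the seventh and eighth reuse the first two colors.
    for color in colors_for(8):
        print(color, hex_to_rgb(color))
```

Cycling is only one option; with many curves, varying line style as well as color would keep them distinguishable.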
Alright, now panacus itself seems to be overwhelmed:
RUST_LOG=info panacus histgrowth ecoli2146.pan.explode.0.og.crush.gfa -c bp -q 0,1,0.5,0.1 -t 28 > ecoli2146.pan.explode.0.og.crush.gfa.histgrowth.tsv
This results in a huge number of NaN values in the resulting TSV. Any ideas?
ecoli2146.pan.explode.0.og.crush.gfa.histgrowth.txt
The lengths of the paths vary a lot, maybe this is the problem?
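To see where the NaN values sit, one could count them per column. The sketch below is generic: it only assumes a tab-separated file in which lines starting with `#` are comments, and it makes no further assumption about the layout of the histgrowth output. The function name `count_nan_cells` and the toy input are invented for illustration.

```python
import csv
import io


def count_nan_cells(handle):
    """Count cells equal to 'NaN' (case-insensitive) per zero-based column index."""
    counts = {}
    for row in csv.reader(handle, delimiter='\t'):
        # Skip empty lines and comment lines.
        if not row or row[0].startswith('#'):
            continue
        for idx, cell in enumerate(row):
            if cell.strip().lower() == 'nan':
                counts[idx] = counts.get(idx, 0) + 1
    return counts


if __name__ == "__main__":
    # Toy input invented for this example; not real panacus output.
    toy = io.StringIO("# comment\n0\t10\tNaN\n1\t12\tNaN\n2\tnan\t3\n")
    print(count_nan_cells(toy))
```

For a real file, pass an open file handle instead of the `io.StringIO` object, for example `open("ecoli2146.pan.explode.0.og.crush.gfa.histgrowth.tsv", newline="")`.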
I'm surprised that this doesn't work, and I suspect it's a fixable bug. @lucaparmigiani what do you think?
Thanks Simon for letting us know about the NaN!
It was indeed a bug: your graph was causing an f64 overflow! The values are now handled better and the bug is fixed. You can run it on your graph :)
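For readers wondering how an f64 overflow turns into NaN, here is a small Python illustration. It is not the panacus code and not the actual fix; it assumes, purely for illustration, a computation involving ratios of binomial coefficients. The value n = 2146 merely echoes the file name above, and the function names are hypothetical. Once both coefficients overflow to infinity, their ratio is NaN, whereas working with logarithms stays finite.

```python
import math


def binom_float(n, k):
    """Binomial coefficient accumulated in f64; becomes inf once it overflows."""
    result = 1.0
    for i in range(1, k + 1):
        result *= (n - k + i) / i
    return result


def log_binom(n, k):
    """Natural log of the binomial coefficient, computed via lgamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


n, k = 2146, 1073

# Both coefficients exceed the f64 range, so this evaluates inf / inf.
naive_ratio = binom_float(n - 1, k) / binom_float(n, k)

# The same ratio in log space; analytically it equals (n - k) / n = 0.5.
stable_ratio = math.exp(log_binom(n - 1, k) - log_binom(n, k))

if __name__ == "__main__":
    print(math.isinf(binom_float(n, k)))  # the coefficient itself overflowed
    print(math.isnan(naive_ratio))        # and the ratio is NaN
    print(stable_ratio)
```

Log-space arithmetic is one common way to avoid this kind of overflow; whether panacus uses it is not stated in this thread.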
Thanks @lucaparmigiani
Indeed this solved the issue, thanks!
Hi there :) I applied panacus-visualize.py to the histgrowth output for 1000 haplotypes, but the PDF shows no colors and has some weird x-axis labels:
The TSV input is available for 10 days at https://fex.belwue.de/fop/rFYpUmCn/chr19.1000.fa.gz.gfaffix.unchop.Ygs.og.crush.gfa.histgrowth.tsv.
Next I want to try it on a data set with ~2k sequences. Thanks for any feedback :)