simonvh / fluff

Fluff is a Python package that contains several scripts to produce pretty, publication-quality figures for next-generation sequencing experiments.
MIT License

No heatmap generated #85

Open AlexBlais74 opened 5 years ago

AlexBlais74 commented 5 years ago

Hello, I am new to fluff. I am interested in it for its ability to do dynamic pattern clustering. I have two questions.

My first question: the heatmap command does not generate a heatmap. Instead, it tries to forward something to my display and fails. I installed fluff 3.0.3 using pip. On a small test run (a BED file with 100 rows and 2 bigWig files), the heatmap command does generate the expected BED file with clusters and the read counts file, but no actual image. My command is:

fluff heatmap \
-f test.bed \
-d ./bigwig_files/file1.bw \
./bigwig_files/file2.bw \
-C k -k 2 -g -M Pearson \
-o fluff_test_heatmap.pdf

I am getting this as output, in a matter of seconds:

pearson distance method
Loading data
K-means clustering
Loading data
MobaXterm X11 proxy: Authorisation not recognised
MobaXterm X11 proxy: Authorisation not recognised
Traceback (most recent call last):
  File "/home/ablai2/projects/def-ablai2/env_BIOFLUFF/bin/fluff", line 11, in <module>
    sys.exit(main())
  File "/home/ablai2/projects/def-ablai2/env_BIOFLUFF/lib/python3.7/site-packages/fluff/parse.py", line 353, in main
    heatmap(args)
  File "/home/ablai2/projects/def-ablai2/env_BIOFLUFF/lib/python3.7/site-packages/fluff/commands/heatmap.py", line 218, in heatmap
    heatmap_plot(data, ind[::-1], outfile, tracks, titles, colors, bgcolors, scale, tscale, labels, fontsize, colorbar)
  File "/home/ablai2/projects/def-ablai2/env_BIOFLUFF/lib/python3.7/site-packages/fluff/plot.py", line 86, in heatmap_plot
    fig = plt.figure(figsize=(plot_width, plot_height))
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/scipy-stack/2019a/lib/python3.7/site-packages/matplotlib/pyplot.py", line 525, in figure
    **kwargs)
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/scipy-stack/2019a/lib/python3.7/site-packages/matplotlib/backend_bases.py", line 3218, in new_figure_manager
    return cls.new_figure_manager_given_figure(num, fig)
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/scipy-stack/2019a/lib/python3.7/site-packages/matplotlib/backends/_backend_tk.py", line 1008, in new_figure_manager_given_figure
    window = Tk.Tk(className="matplotlib")
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.7.0/lib/python3.7/tkinter/__init__.py", line 2020, in __init__
    self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
_tkinter.TclError: couldn't connect to display "localhost:61.0"

I don't understand why X11 forwarding is even invoked. I thought the output would be written straight to a file, in the format given by the file extension I specified (as noted in the help for this command). I tried without specifying an extension but got the same result. For the record, I do use MobaXterm with X11 forwarding, and my interactive session on the computer cluster (started with salloc) has --x11 specified as well.

My second question is about running time, CPU and memory needs. I would like to run this with 5 bigWig files and a BED file of 70,000 rows. How much computing power do you think I would need for this? I tried it with 1 CPU and 20 GB of RAM, and no output has been generated after 6 hours.

Thanks in advance for your help.

Alex

simonvh commented 5 years ago

Hi Alex,

The X11 stuff is due to the default matplotlib configuration. Can you try the solutions mentioned here, in the section "I get 'RuntimeError: Invalid DISPLAY variable'"? Probably the best solution is to update the matplotlib config file and change to a backend that does not use X11.
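
For illustration, here is a minimal sketch of that config change, assuming a Linux account where matplotlib reads its per-user settings from `$MPLCONFIGDIR`, or else `$XDG_CONFIG_HOME/matplotlib`, or else `~/.config/matplotlib`. The helper name `set_agg_backend` is hypothetical and not part of fluff or matplotlib. It only writes a `backend: Agg` line to `matplotlibrc`, since Agg is a non-interactive backend that renders to files without needing a display.

```python
import os
from pathlib import Path


def set_agg_backend(config_dir=None):
    """Write (or update) a matplotlibrc so the default backend is Agg.

    `config_dir` is an optional override; when omitted, the usual
    per-user matplotlib config directory on Linux is assumed.
    """
    if config_dir is None:
        config_dir = os.environ.get("MPLCONFIGDIR")
    if config_dir is None:
        xdg = os.environ.get("XDG_CONFIG_HOME", str(Path.home() / ".config"))
        config_dir = os.path.join(xdg, "matplotlib")

    config_dir = Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)
    rc_path = config_dir / "matplotlibrc"

    # Keep every existing setting except a previous "backend" entry.
    lines = rc_path.read_text().splitlines() if rc_path.exists() else []
    kept = [ln for ln in lines if ln.split(":")[0].strip() != "backend"]
    kept.append("backend: Agg")

    rc_path.write_text("\n".join(kept) + "\n")
    return rc_path
```

Calling `set_agg_backend()` once before running fluff would be enough under these assumptions. A lighter alternative that leaves the config file alone is to set the `MPLBACKEND=Agg` environment variable in the shell session that runs the command.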

As for running time, I would not expect 70,000 rows to take such a long time, but I have to say that I'm not sure. It should certainly be possible though. Is it the clustering that is taking a long time, or reading the data? I usually run fluff with BAM files instead of bigWig files, although I think this should not be that much of a performance hit.