molbio-dresden / flexidot

Highly customizable, ambiguity-aware dotplots for visual sequence analyses
GNU Lesser General Public License v2.1

Use blast output as input? #5

Closed oushujun closed 6 years ago

oushujun commented 6 years ago

Hey, great tool! Looks very promising to combine it with RepeatMasker gff3 annotation, and I love the way of installation - so easy! Every python program should do this!

So I tried it on a 44-kb sequence region with heavily nested TE insertions. It runs quite slowly on my laptop with 4 threads. Immediately I thought: can the program take BLAST output (i.e., tabular format, -outfmt 6) as input, which may be faster than running the alignment procedure itself?

Another thought: I tried it on our CentOS 7 server, and the program seems to require a graphical display.

self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
_tkinter.TclError: no display name and no $DISPLAY environment variable

Can it be run on a server and only generate png/pdf files for viewing on a laptop?

Thanks again! Shujun

molbio-dresden commented 6 years ago

Hi @oushujun, thank you for your appreciation and your input!

Taking BLAST results as the basis for a dotplot is a great idea and definitely interesting for us. The tabular output format (-outfmt 6) contains all the necessary positional information, so we will give it a try when we work on the next update. Keep in mind that BLAST results give a rougher estimate of shared regions. More importantly, the chosen BLAST settings will affect the resulting hit table and thus the downstream dotplot visualization.
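To illustrate the positional information mentioned above, here is a minimal Python 3 sketch of how a tabular BLAST hit table could be turned into line segments for a dotplot. It is not a FlexiDot feature; the function name `read_outfmt6`, the `Segment` tuple, and the filter parameters are hypothetical. It assumes the default 12-column layout of the tabular format and treats a subject start greater than the subject end as a reverse-strand hit.

```python
import csv
import io
from collections import namedtuple

# Default column order of BLAST tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

# Hypothetical container for one dotplot line segment.
Segment = namedtuple(
    "Segment", "query subject x_start x_end y_start y_end reverse identity"
)


def read_outfmt6(handle, min_identity=0.0, min_length=0):
    """Parse tab-separated BLAST hits into dotplot segments."""
    segments = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(OUTFMT6_FIELDS, row))
        identity = float(rec["pident"])
        if identity < min_identity or int(rec["length"]) < min_length:
            continue
        qstart, qend = int(rec["qstart"]), int(rec["qend"])
        sstart, send = int(rec["sstart"]), int(rec["send"])
        # Hits on the reverse strand are reported with sstart > send.
        segments.append(Segment(rec["qseqid"], rec["sseqid"],
                                qstart, qend, sstart, send,
                                sstart > send, identity))
    return segments


# Two made-up hits: one forward, one reverse.
example = io.StringIO(
    "seqA\tseqB\t98.50\t200\t3\t0\t101\t300\t501\t700\t1e-90\t350\n"
    "seqA\tseqB\t91.00\t100\t9\t0\t401\t500\t900\t801\t2e-30\t130\n"
)
hits = read_outfmt6(example, min_identity=90.0)
for hit in hits:
    print(hit)
```

Each segment could then be drawn as a line from (x_start, y_start) to (x_end, y_end), with reverse hits in a second color. As noted above, the BLAST settings used to create the table decide which segments exist at all.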

We did not yet approach parallelization of the code as we'd like to stick to the cross platform applicability. Of course, this would speed up FlexiDot. However, we didn't find a suitable Python library for parallelization independent from the operating system. If you have any suggestion, please let us know.

Regarding your issue with the CentOS server installation: we suspect that your error is thrown by the Matplotlib module. We observed similar messages during development, which disappeared after updating this module. It may be worth trying to update Matplotlib and/or tkinter.

Could you clarify your last question?

 Can it be run in a server and only generate png/pdf files for laptop visualization?

Thank you for your feedback and we hope to hear from you.

Best wishes from Dresden, Tony & Kathrin

oushujun commented 6 years ago

Hi Tony & Kathrin,

Thank you for your reply. Note that BLAST also lets you adjust the word size for alignment, and changing this parameter in FlexiDot helps to control noise and speeds things up for longer input sequences (I tried a 140-kb region with --wordsize 20, which ran in a reasonable time). Anyway, I look forward to the next update.

The second point I raised was about running FlexiDot via an SSH login. I installed and updated Matplotlib on our server, but the plotting still fails:
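The effect of the word size on noise can be sketched with a simple exact word-match search in Python 3. This is only an illustration under the assumption of plain exact k-mer matching; it is not FlexiDot's actual implementation, and `word_matches` and the toy sequences are made up for this example.

```python
from collections import defaultdict


def word_matches(seq_a, seq_b, wordsize):
    """Return (i, j) start positions where seq_a and seq_b share an exact word."""
    # Index every word of seq_b by its start position.
    index = defaultdict(list)
    for j in range(len(seq_b) - wordsize + 1):
        index[seq_b[j:j + wordsize]].append(j)

    # Look up every word of seq_a in that index.
    matches = []
    for i in range(len(seq_a) - wordsize + 1):
        for j in index.get(seq_a[i:i + wordsize], ()):
            matches.append((i, j))
    return matches


seq_a = "ACGTACGTTTGACCA"
seq_b = "TTGACCAACGTACGT"

short_word = word_matches(seq_a, seq_b, 3)
long_word = word_matches(seq_a, seq_b, 7)
print(len(short_word), "matches with word size 3")
print(len(long_word), "matches with word size 7")
```

With the longer word, only positions inside longer shared stretches remain, so fewer dots have to be computed and drawn. The trade-off is that short or diverged similarities are no longer visible.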

python code/flexidot_v1.04.py -i test-data/test-seqs.fas

...reading input arguments...
fasta file #1: test-data/test-seqs.fas


INPUT/OUTPUT OPTIONS...

Input fasta file: test-data/test-seqs.fas
Automatic fasta collection from current directory: False
Collage output: True
Number of columns per page: 4
Number of rows per page: 5
File format: png
Residue type is nucleotide: True

CALCULATION PARAMETERS...

Wordsize: 7
Plotting mode: 0 self
Ambiguity handling: False
Reverse complement scanning: True
Alphabetic sorting: False
Prefix for output files: None

GRAPHIC FORMATTING...

Plot size: 10
Line width: 1
Line color: black
Reverse line color: #009243
X label position: True
Label size: 10
Spacing: 0.04
Title length (limit number of characters): first20characters
Length scaling: False

==================================================

Running plotting modes 0

Selfdotplot Collage: Few sequences - correcting number of rows: ncols=4, nrows=2

==================================================

Creating 6 selfdotplot images

=> Traceback (most recent call last):
  File "code/flexidot_v1.04.py", line 3323, in <module>
    main(input_fasta, wordsize, modes=plotting_modes, prefix=output_file_prefix, plot_size=plot_size, label_size=label_size, filetype=filetype, type_nuc=type_nuc, convert_wobbles=wobble_conversion, substitution_count=substitution_count, rc_option=rc_option, gff=input_gff_files, multi=collage_output, ncols=m_col, nrows=n_row, alphabetic_sorting=alphabetic_sorting, lcs_shading=lcs_shading, lcs_shading_num=lcs_shading_num, lcs_shading_ref=lcs_shading_ref, lcs_shading_interval_len=lcs_shading_interval_len, lcs_shading_ori=lcs_shading_ori, gff_color_config_file=gff_color_config_file, input_user_matrix_file=input_user_matrix_file, user_matrix_print=user_matrix_print, length_scaling=length_scaling, title_length=title_length, title_clip_pos=title_clip_pos, spacing=spacing, max_N_percentage=max_N_percentage, mirror_y_axis=mirror_y_axis, verbose=verbose)
  File "code/flexidot_v1.04.py", line 3214, in main
    list_of_png_names = selfdotplot(seq_list, wordsize, prefix=prefix, label_size=label_size, title_length=title_length, title_clip_pos=title_clip_pos, plot_size=plot_size, filetype=filetype, type_nuc=type_nuc, convert_wobbles=convert_wobbles, substitution_count=substitution_count, alphabetic_sorting=alphabetic_sorting, multi=multi, ncols=ncols, nrows=nrows, gff_files=gff, gff_color_dict=gff_feat_colors, mirror_y_axis=mirror_y_axis, max_N_percentage=max_N_percentage, verbose=verbose)
  File "code/flexidot_v1.04.py", line 1959, in selfdotplot
    P.cla() # clear any prior graph
  File "/opt/software/miniconda/4.4.10--GCC-4.9.4/envs/flexidot/lib/python2.7/site-packages/matplotlib/pyplot.py", line 3811, in cla
    ret = gca().cla()
  File "/opt/software/miniconda/4.4.10--GCC-4.9.4/envs/flexidot/lib/python2.7/site-packages/matplotlib/pyplot.py", line 969, in gca
    return gcf().gca(**kwargs)
  File "/opt/software/miniconda/4.4.10--GCC-4.9.4/envs/flexidot/lib/python2.7/site-packages/matplotlib/pyplot.py", line 586, in gcf
    return figure()
  File "/opt/software/miniconda/4.4.10--GCC-4.9.4/envs/flexidot/lib/python2.7/site-packages/matplotlib/pyplot.py", line 533, in figure
    **kwargs)
  File "/opt/software/miniconda/4.4.10--GCC-4.9.4/envs/flexidot/lib/python2.7/site-packages/matplotlib/backend_bases.py", line 161, in new_figure_manager
    return cls.new_figure_manager_given_figure(num, fig)
  File "/opt/software/miniconda/4.4.10--GCC-4.9.4/envs/flexidot/lib/python2.7/site-packages/matplotlib/backend_bases.py", line 167, in new_figure_manager_given_figure
    canvas = cls.FigureCanvas(figure)
  File "/opt/software/miniconda/4.4.10--GCC-4.9.4/envs/flexidot/lib/python2.7/site-packages/matplotlib/backends/backend_qt5agg.py", line 24, in __init__
    super(FigureCanvasQTAgg, self).__init__(figure=figure)
  File "/opt/software/miniconda/4.4.10--GCC-4.9.4/envs/flexidot/lib/python2.7/site-packages/matplotlib/backends/backend_qt5.py", line 234, in __init__
    _create_qApp()
  File "/opt/software/miniconda/4.4.10--GCC-4.9.4/envs/flexidot/lib/python2.7/site-packages/matplotlib/backends/backend_qt5.py", line 125, in _create_qApp
    raise RuntimeError('Invalid DISPLAY variable')
RuntimeError: Invalid DISPLAY variable

Thanks, Shujun

oushujun commented 6 years ago

FYI, I also have tkinter installed and updated.

Shujun

molbio-dresden commented 6 years ago

Hello @oushujun,

Thanks for the detailed error message. We couldn't reproduce the error on our system. However, we found an entry on Stack Overflow that may help to solve your problem:

stackoverflow - RuntimeError: Invalid DISPLAY variable

It includes two possible solutions:

(1) Declare import matplotlib and matplotlib.use('agg') before import pylab as P.

(2) Use P.switch_backend('agg') after import pylab as P.
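As a minimal sketch of solution (1), the following Python snippet selects the non-interactive Agg backend before the plotting interface is imported and writes a figure straight to a file, so no display is needed. It uses `matplotlib.pyplot` rather than `pylab`; the function name `save_demo_plot`, the output file name, and the plotted coordinates are made up for illustration and are not part of FlexiDot.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # must be called before pyplot (or pylab) is imported
import matplotlib.pyplot as plt


def save_demo_plot(path):
    """Draw two line segments and save them to a file without opening a window."""
    fig, ax = plt.subplots(figsize=(4, 4))
    ax.plot([0, 100], [0, 100], color="black", linewidth=1)    # forward match
    ax.plot([20, 60], [60, 20], color="#009243", linewidth=1)  # reverse match
    ax.set_xlabel("sequence A (bp)")
    ax.set_ylabel("sequence B (bp)")
    fig.savefig(path)  # the file format follows the extension (png, pdf, ...)
    plt.close(fig)
    return path


out_file = save_demo_plot(os.path.join(tempfile.gettempdir(), "agg_demo.png"))
print("wrote", out_file)
```

Solution (2) differs only in that the backend is switched after the import. Matplotlib also reads the `MPLBACKEND` environment variable, so setting it to `Agg` in the shell before starting the script may achieve the same effect without editing any code.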

We attach two FlexiDot versions to this comment, one for each suggested solution. Both versions worked normally on our system. Another forum post also mentions that you might need to install PySide. Let us know whether PySide is required in your case and which of the solutions works, if any.

Good luck! Tony & Kathrin

FlexiDot-Issue-Display.zip

oushujun commented 6 years ago

Hi Tony & Kathrin,

Thank you for providing a customized FlexiDot. I got a chance to try both solutions in the archive, and they both work!

There is a minor warning about the default font not being found:

/opt/software/miniconda/4.4.10--GCC-4.9.4/envs/flexidot/lib/python2.7/site-packages/matplotlib/font_manager.py:1331: UserWarning: findfont: Font family [u'sans-serif'] not found. Falling back to DejaVu Sans
  (prop.get_family(), self.defaultFamily[fontext]))

That's something I can fix on my side. Thanks again!

Best, Shujun

kashiff007 commented 2 years ago

Hi @oushujun, have you been able to use this new FlexiDot script for a large genome with blastn tabular output? If yes, could you please provide details on where to supply the BLAST output as input?