mjsull / chromatiblock

Colinear block visualisation tool
GNU General Public License v3.0
30 stars 4 forks source link

JOSS Review #12

Closed rpetit3 closed 3 years ago

rpetit3 commented 3 years ago

Hi @mjsull,

Chromatiblock is a great tool that I think many people fill find useful. For the most part I had no issues going from multiple FASTAs to Chromatiblock output.

Unfortunately in the current version (from Conda), there is an issue with the SVG output (also affects PDF and PNG outputs) that will need to be fixed to complete this review. I don't foresee this being a huge issue to fix since at first glance it appears to be syntax related.

Below are my comments related to https://github.com/openjournals/joss-reviews/issues/2451. While reading my comments, please keep in mind, many of them are minor and only suggestions for usability improvements.

Feel free to reach out for clarifications.

Thank you for such a great tool! Robert

Major Issue

Currently the SVG creation has an issue which also breaks PDF and PNG creation.

Here's output from trying to open the produced SVG in Firefox. It opened in Inkscape but was a truncated version the HTML output.

XML Parsing Error: not well-formed
Location: file:///F:/review/test.svg
Line Number 3, Column 10:
 <a href=test_files/0.html>  <rect stroke="#5b0a0a" stroke-opacity="1.000000" stroke-alignment="inner"
---------^

PDF and PNG also produce similar error: xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 3, column 9

SVG and HTML outputs for your review

Paper Comments

  1. Italics for C. difficile

  2. The genus (Clostridium) is given in Figure 1's description, but I think it should also be given at first mention in the main text.

An example of a global alignment of 28 complete C. difficile genomes is shown in Fig. 1A.

An example of a global alignment of 28 complete Clostridium difficile genomes is shown in Fig. 1A.

  1. A list of Assembly accessions for the 28 genomes should be included, otherwise it should be mentioned they were randomly selected.

Chromatiblock README Comments

Major Comments

  1. I think the README has everything needed to get up and going with Chromaticblock, but the layout could be improved. Currently the formatting makes it easy to overlook things (examples: optional tools) and some things just kind of blend in together (example: the demos). Again the text is there, just needs to be more readable.

Minor Comments

  1. For the installation section, I think changing the formatting a little would help separate the two ways of setting up Chromaticblock. Example:

Installation

Via Conda

From Source

Requirements

Optional

  1. Typo in Installation Via Conda. (environemtn)

  2. The link in Direct Download is broken. It looks like it also points to a previous release. An alternative could be to add a Github Release (latest by date) badge from shields.io could with this for future releases.

  3. requirements section a. Capitalize header (Requirements) b. A subheader named Optional would help separate whats needed. At the moment, only Python >= 3.6.0 is all I see because the bold. c. Typo in Sibelia description (autmoatic) d. Some file formats are all caps (PDF, PNG) some are not (svg, html) e. Include links to the optional tools

  4. In Figure Description a. Italics on Cdiff b. "ge-nomes" typo

Chromatiblock Usage

My Environment

# OS
cat /proc/version
Linux version 4.19.0-9-amd64 (debian-kernel@lists.debian.org) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP Debian 4.19.118-2+deb10u1 (2020-06-07)

# Conda
conda --version
conda 4.8.3

# Setup
conda create -y -n chromatiblock-review -c conda-forge -c bioconda chromatiblock

Major Comments

  1. The full usage should be printed out with exit code 0 (currently 1) if no arguments are given with chromatiblock. Otherwise, at minimum the required parameters should be described (-f or -d, -w, and -o).

    chromatiblock
    No input files found use -f or -d
  2. The usage should be improved with the use of positional arguments or subparsers. Currently there is no separation between the required arguments and truly optional arguments. I think based on the either or for -f and -d, subparsers would be easiest to implement.

  3. Working directory is a required parameter (https://github.com/mjsull/chromatiblock/blob/master/chromatiblock#L1489), but its not in the usage epilog (https://github.com/mjsull/chromatiblock/blob/master/chromatiblock#L1428-L1432)

  4. If no extension is given (e.g. -o test) for the output file, it checks to see if test.svg exists (after its already rerun everything) and if so doesn't overwrite it (https://github.com/mjsull/chromatiblock/blob/master/chromatiblock#L1356-L1360). But this check is made pointless if chromatiblock is run again with -o test.svg since the existing test.svg will be overwritten (https://github.com/mjsull/chromatiblock/blob/master/chromatiblock#L1344-L1345) because the check doesn't happen.

  5. SVG file that was generated did not work. Attached my outputs for your inspection. When opening the SVG with Chrome or Firefox I get:

    XML Parsing Error: not well-formed
    Location: file:///F:/review/test.svg
    Line Number 3, Column 10:
    <a href=test_files/0.html>  <rect stroke="#5b0a0a" stroke-opacity="1.000000" stroke-alignment="inner"
    ---------^
  6. PDF could not be created with conda install, which I think is related to the previous comment related to the SVG issue.

    
    Traceback (most recent call last):
    File "/home/rpetit/miniconda3/envs/chromatiblock-review/lib/python3.8/xml/etree/ElementTree.py", line 1693, in feed
    self.parser.Parse(data, 0)
    xml.parsers.expat.ExpatError: not well-formed (invalid token): line 3, column 9

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/home/rpetit/miniconda3/envs/chromatiblock-review/bin/chromatiblock", line 1555, in draw_blocks(core_array, noncore_pos, core_size, args.out, args.genome_height, args.gap, legend_size, args.working_directory, args.ppi, args.categorise != None, args.svg_pan_zoom_location, args.hue_start, args.hue_end) File "/home/rpetit/miniconda3/envs/chromatiblock-review/bin/chromatiblock", line 1351, in draw_blocks cairosvg.svg2pdf(url=os.path.join(working_dir, "temp.svg"), write_to=out_file, dpi=ppi) File "/home/rpetit/miniconda3/envs/chromatiblock-review/lib/python3.8/site-packages/cairosvg/init.py", line 76, in svg2pdf return surface.PDFSurface.convert( File "/home/rpetit/miniconda3/envs/chromatiblock-review/lib/python3.8/site-packages/cairosvg/surface.py", line 142, in convert tree = Tree( File "/home/rpetit/miniconda3/envs/chromatiblock-review/lib/python3.8/site-packages/cairosvg/parser.py", line 405, in init tree = ElementTree.fromstring( File "/home/rpetit/miniconda3/envs/chromatiblock-review/lib/python3.8/site-packages/defusedxml/common.py", line 131, in fromstring parser.feed(text) File "/home/rpetit/miniconda3/envs/chromatiblock-review/lib/python3.8/xml/etree/ElementTree.py", line 1695, in feed self._raiseerror(v) File "/home/rpetit/miniconda3/envs/chromatiblock-review/lib/python3.8/xml/etree/ElementTree.py", line 1602, in _raiseerror raise err xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 3, column 9


7. Same issue with PNGs

Traceback (most recent call last): File "/home/rpetit/miniconda3/envs/chromatiblock-review/lib/python3.8/xml/etree/ElementTree.py", line 1693, in feed self.parser.Parse(data, 0) xml.parsers.expat.ExpatError: not well-formed (invalid token): line 3, column 9

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/home/rpetit/miniconda3/envs/chromatiblock-review/bin/chromatiblock", line 1555, in draw_blocks(core_array, noncore_pos, core_size, args.out, args.genome_height, args.gap, legend_size, args.working_directory, args.ppi, args.categorise != None, args.svg_pan_zoom_location, args.hue_start, args.hue_end) File "/home/rpetit/miniconda3/envs/chromatiblock-review/bin/chromatiblock", line 1353, in draw_blocks cairosvg.svg2png(url=os.path.join(working_dir, "temp.svg"), write_to=out_file, dpi=ppi) File "/home/rpetit/miniconda3/envs/chromatiblock-review/lib/python3.8/site-packages/cairosvg/init.py", line 66, in svg2png return surface.PNGSurface.convert( File "/home/rpetit/miniconda3/envs/chromatiblock-review/lib/python3.8/site-packages/cairosvg/surface.py", line 142, in convert tree = Tree( File "/home/rpetit/miniconda3/envs/chromatiblock-review/lib/python3.8/site-packages/cairosvg/parser.py", line 405, in init tree = ElementTree.fromstring( File "/home/rpetit/miniconda3/envs/chromatiblock-review/lib/python3.8/site-packages/defusedxml/common.py", line 131, in fromstring parser.feed(text) File "/home/rpetit/miniconda3/envs/chromatiblock-review/lib/python3.8/xml/etree/ElementTree.py", line 1695, in feed self._raiseerror(v) File "/home/rpetit/miniconda3/envs/chromatiblock-review/lib/python3.8/xml/etree/ElementTree.py", line 1602, in _raiseerror raise err xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 3, column 9


### Minor Comments
1. Version needs a newline added to it (https://github.com/mjsull/chromatiblock/blob/master/chromatiblock#L1469)

(chromatiblock-review) rpetit@ogston:~/review$ chromatiblock --version Version 0.4.2(chromatiblock-review) rpetit@ogston:~/review$



2. `--input_directory` checks for common extensions (https://github.com/mjsull/chromatiblock/blob/master/chromatiblock#L1481). Heads up the Genbank files downloaded from NCBI Assembly, they will have 'gbff' extension. Might be useful to allow the user to specify the extension(s).

3. It would be useful to add the parameter in the missing `-w` and `-o` messages. E.g. "Please specify a working directory (-w)." 

4. `-o`/`--out` I think would be better split into something like  `--output_format` (default: svg (could use [choices](https://docs.python.org/3/library/argparse.html#choices)) and `--prefix` (default: chromatiblock). Because at the moment, a typo in in the extension will cause a SVG file to be created. Example: `-o test.htm` would give `test.htm.svg`. This could at least prevent the user from completing a chromatiblock run only to find out they goofed on the extension.

5. Currently existing outputs and working directories are overwritten. It might be useful to notify the user and exit unless they force existing outputs to be overwritten (e.g. `--force`).

6. Is it necessary to keep the working directory? Alternative is to delete, and have the user request it kept.

7. With the the PDF and PNG, being a SVG conversion, I think would be useful to output HTML or Images (SVG,PNG,PDF). Currently if I ask for SVG, and change my mind to PDF, chromatiblock reruns everything to just convert the SVG to PDF. So if I want all three images I have to run Chromatiblock three times.
mjsull commented 3 years ago

Hi Robert,

Thank-you for taking the time to review our paper, your comments were very helpful. Please find them addressed below.

Major Issue

Currently the SVG creation has an issue which also breaks PDF and PNG creation.

PDF and PNG also produce similar error: xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 3, column 9 SVG and HTML outputs for your review

The most recent version of chromatiblock broke svg creation (which unfortunately didn't break it in the viewer I was using). These issues have been fixed in the latest Chromatiblock version.

Paper Comments

Italics for C. difficile

This has been fixed in the revised manuscript.

The genus (Clostridium) is given in Figure 1's description, but I think it should also be given at first mention in the main text.

This has been fixed in the revised manuscript.

A list of Assembly accessions for the 28 genomes should be included, otherwise it should be mentioned they were randomly selected.

These genomes were randomly selected from an unpublished dataset and this is now mentioned in the manuscript. Relevant information about the genomes that were used is now included in README

Chromatiblock README Comments

Major Comments

I think the README has everything needed to get up and going with Chromaticblock, but the layout could be improved. Currently the formatting makes it easy to overlook things (examples: optional tools) and some things just kind of blend in together (example: the demos). Again the text is there, just needs to be more readable.

We have reformatted the README file to be more readable

Minor Comments

For the installation section, I think changing the formatting a little would help separate the two ways of setting up Chromaticblock. Example: Installation Via Conda From Source Requirements Optional

This section has now been reorganised

Typo in Installation Via Conda. (environemtn)

The typo has now been fixed.

The link in Direct Download is broken. It looks like it also points to a previous release. An alternative could be to add a Github Release (latest by date) badge from shields.io could with this for future releases.

The download link has been fixed and made more future-proof.

requirements section a. Capitalize header (Requirements) b. A subheader named Optional would help separate whats needed. At the moment, only Python >= 3.6.0 is all I see because the bold. c. Typo in Sibelia description (autmoatic) d. Some file formats are all caps (PDF, PNG) some are not (svg, html) e. Include links to the optional tools

The requirements section has been updated to address these comments.

In Figure Description a. Italics on Cdiff b. "ge-nomes" typo

This has been fixed in the revised manuscript

Chromatiblock Usage

Major Comments

The full usage should be printed out with exit code 0 (currently 1) if no arguments are given with chromatiblock. Otherwise, at minimum the required parameters should be described (-f or -d, -w, and -o).

Chromatiblock now prints a more comprehensive error message when given no arguments

The usage should be improved with the use of positional arguments or subparsers. Currently there is no separation between the required arguments and truly optional arguments. I think based on the either or for -f and -d, subparsers would be easiest to implement.

We do not believe that a subparser is the right tool here. Argparse now includes mutually exclusive argument groups which may be better for the -f and -d option.

Working directory is a required parameter (https://github.com/mjsull/chromatiblock/blob/master/chromatiblock#L1489), but its not in the usage epilog (https://github.com/mjsull/chromatiblock/blob/master/chromatiblock#L1428-L1432)

This has now been fixed.

If no extension is given (e.g. -o test) for the output file, it checks to see if test.svg exists (after its already rerun everything) and if so doesn't overwrite it (https://github.com/mjsull/chromatiblock/blob/master/chromatiblock#L1356-L1360). But this check is made pointless if chromatiblock is run again with -o test.svg since the existing test.svg will be overwritten (https://github.com/mjsull/chromatiblock/blob/master/chromatiblock#L1344-L1345) because the check doesn't happen.

The logic here was that we did not want to overwrite a file unless the user explicitly specified it as the output file. Chromatiblock will now not add a suffix unless "all" output is used so this is no longer an issue.

SVG file that was generated did not work. Attached my outputs for your inspection. When opening the SVG with Chrome or Firefox I get:

This has now been fixed in the latest version of chromatiblock.

PDF could not be created with conda install, which I think is related to the previous comment related to the SVG issue.

Same issue with PNGs

Both these issues were related to the svg output, which is converted into png and pdf formats. This has been fixed in the latest version of chromatiblock.

Minor Comments

Version needs a newline added to it (https://github.com/mjsull/chromatiblock/blob/master/chromatiblock#L1469) (chromatiblock-review) rpetit@ogston:~/review$ chromatiblock --version Version 0.4.2(chromatiblock-review) rpetit@ogston:~/review$

This has been fixed in the latest version.

--input_directory checks for common extensions (https://github.com/mjsull/chromatiblock/blob/master/chromatiblock#L1481). Heads up the Genbank files downloaded from NCBI Assembly, they will have 'gbff' extension. Might be useful to allow the user to specify the extension(s).

The option to specify an extension has been added.

It would be useful to add the parameter in the missing -w and -o messages. E.g. "Please specify a working directory (-w)."

This information has been added to the error message

-o/--out I think would be better split into something like --output_format (default: svg (could use choices) and --prefix (default: chromatiblock). Because at the moment, a typo in in the extension will cause a SVG file to be created. Example: -o test.htm would give test.htm.svg. This could at least prevent the user from completing a chromatiblock run only to find out they goofed on the extension.

Output format will now need to be specified (html by default)

Currently existing outputs and working directories are overwritten. It might be useful to notify the user and exit unless they force existing outputs to be overwritten (e.g. --force).

We have implemented this behavior in the latest version of chromatiblock.

Is it necessary to keep the working directory? Alternative is to delete, and have the user request it kept.

We have implemented this behavior in the latest version of chromatiblock and added a --keep flag.

With the the PDF and PNG, being a SVG conversion, I think would be useful to output HTML or Images (SVG,PNG,PDF). Currently if I ask for SVG, and change my mind to PDF, chromatiblock reruns everything to just convert the SVG to PDF. So if I want all three images I have to run Chromatiblock three times.

Included in the output options

rpetit3 commented 3 years ago

Hi @mjsull

This is great! I tested out 0.5.1 and everything finished as expected. The PDF, PNG and SVG all look good. (Really like the -of all option!).

Thank you very much for your hard work! Robert