openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
700 stars 36 forks source link

[REVIEW]: Lexicon-Mono-Seq, DOM Text Based Async MSA Viewer #1407

Closed whedon closed 5 years ago

whedon commented 5 years ago

Submitting author: @IbrahimTanyalcin (Ibrahim Tanyalcin) Repository: https://github.com/IbrahimTanyalcin/lexicon-mono-seq Version: v0.19.0 Editor: @csoneson Reviewer: @hrhotz, @jkanche, @imallona Archive: 10.5281/zenodo.3273310

Status

status

Status badge code:

HTML: <a href="http://joss.theoj.org/papers/9fbf126b501ad921825f755732450278"><img src="http://joss.theoj.org/papers/9fbf126b501ad921825f755732450278/status.svg"></a>
Markdown: [![status](http://joss.theoj.org/papers/9fbf126b501ad921825f755732450278/status.svg)](http://joss.theoj.org/papers/9fbf126b501ad921825f755732450278)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@hrhotz & @jkanche & @imallona, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @csoneson know.

Please try and complete your review in the next two weeks

Review checklist for @hrhotz

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Review checklist for @jkanche

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Review checklist for @imallona

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

whedon commented 5 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @hrhotz, it looks like you're currently assigned as the reviewer for this paper :tada:.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

  1. You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands
whedon commented 5 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 5 years ago

PDF failed to compile for issue #1407 with the following error:

/app/vendor/ruby-2.4.4/lib/ruby/2.4.0/find.rb:43:in block in find': No such file or directory - tmp/1407 (Errno::ENOENT) from /app/vendor/ruby-2.4.4/lib/ruby/2.4.0/find.rb:43:incollect!' from /app/vendor/ruby-2.4.4/lib/ruby/2.4.0/find.rb:43:in find' from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-a1723d160bb6/lib/whedon/processor.rb:57:infind_paper_paths' from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-a1723d160bb6/bin/whedon:50:in prepare' from /app/vendor/bundle/ruby/2.4.0/gems/thor-0.20.3/lib/thor/command.rb:27:inrun' from /app/vendor/bundle/ruby/2.4.0/gems/thor-0.20.3/lib/thor/invocation.rb:126:in invoke_command' from /app/vendor/bundle/ruby/2.4.0/gems/thor-0.20.3/lib/thor.rb:387:indispatch' from /app/vendor/bundle/ruby/2.4.0/gems/thor-0.20.3/lib/thor/base.rb:466:in start' from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-a1723d160bb6/bin/whedon:116:in<top (required)>' from /app/vendor/bundle/ruby/2.4.0/bin/whedon:23:in load' from /app/vendor/bundle/ruby/2.4.0/bin/whedon:23:in

'

csoneson commented 5 years ago

@whedon generate pdf

whedon commented 5 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

csoneson commented 5 years ago

@labarba (or @openjournals/joss-eics) - just to say that the manual change of the path to the submitted repository in the PRE-REVIEW issue (#1379) did not seem to carry over here. I manually changed it again above, which I think should be fine for the review, but I'm not sure whether anything else needs to be changed somewhere for a more permanent effect.

csoneson commented 5 years ago

@hrhotz, @jkanche, @imallona - please perform your review in this issue, by updating your checklist above and posting comments. If you have any questions, don't hesitate to ping me.

hrhotz commented 5 years ago

@IbrahimTanyalcin - I have made a pull request with some minor suggestions for the paper text. Also, please consider adding the names of a few examples for browsers using DOM Text + SVG. Same for MSA viewers relying on the canvas element. This might help putting the text in a wider context.

IbrahimTanyalcin commented 5 years ago

@hrhotz, ty for the suggestions:

jkanche commented 5 years ago

@IbrahimTanyalcin - I created a pull request with a few minor edits and fixes to the paper. Couple of things

IbrahimTanyalcin commented 5 years ago

@jkanche, ty for the suggestions

hrhotz commented 5 years ago

@IbrahimTanyalcin - I have been looking at the "clustalW.html" example and I have a few questions/comments:

    • what is the "aa" color scheme based on?
    • would it make sense to offer more than just one color set for amino acids?
    • how do you deal with sequences containing Selenocysteine? I have modified your example by replacing an "A" with and "U" and the complete sequence is no longer displayed,
    • why is the line describing the identity/similarity (the line containing "*",":", etc) not displayed
    • I don't understand your comment: "You can also read clustal files if you provide them to LexiconMonoSeq as a string". You already provide the data as a string in the example, don't-you?
    • displaying an alignment where you provide a file path would be useful.
imallona commented 5 years ago

Hi @IbrahimTanyalcin,

Thanks for the resource. I have some comments/questions regarding the rendered PDF version of the paper, and the resource and its documentation.

I noticed the paper update has fixed some typos already (dna -> DNA), but I still find some in citations. Namely,

Regarding the tool usability, it is stated that (...) the amount of sequences is limited to your browsers capacity, recommended limit is 8000 sequences with 100000 characters long each (...). Is it possible to tabulate/stratify such recommendation, i.e. providing a limit for commodity laptops as well as to recent/powerful machines/workstations?

Regarding the tool usability and/or documentation, I acknowledge the high flexibility given to the durations of animation effects, which (if I'm not wrong) can be quite short, i.e. 17ms. As intermittent visual stimuli are known to elicit epileptic seizures to some, I wonder which are your thoughts on including a warning at the resource documentation suggesting good practices. The W3C indicates that, in order to avoid triggering seizures to photosensitive epilepsy users, blinks should be slower than three a second https://www.w3.org/TR/2008/REC-WCAG20-20081211/#seizure . Not sure of the examples' animation speeds (the ones rendered at readme.md), but if they were faster than that I would suggest slowing them down to the W3C recommendations.

IbrahimTanyalcin commented 5 years ago

@hrhotz Thank you for the well raised comments, let me go over some of these:

instance.registerType("myType",{"A":"#ffbbaa","B":"00aacc"...})

You can also specify a text color and opacity which are optional, default text color is rgb(0,0,0,0.9) and default opacity for any character is 0.8

Normally you do not have to create a new object from scratch to pass to instance.registerType as an argument, a copy function exists instance.caveManCopy on the prototype chain. Example-7(chrX10Mb.html in the repo examples folder) makes use of this. It registers a new type with new characters.

The error you encountered (things not rendering) is possibly because the new char you specified was not registered in the type and it threw Uncaught Reference Error, I am not using try/catch blocks there because I presume it would slow down execution.

IbrahimTanyalcin commented 5 years ago

@imallona Thank you for pointing out these blind spots. I will go over the citations, did not realize that they were incomplete.

For the performance, I just tested the first example and the letters of PI on an old iphone 5S to see if it did render. The animations were not butter smooth, however it handled without problems (iphone 8 and X rendered without problems and with 60fps). Since we are working with DOM, we are mainly using CPU (apart from transform css style, which creates a layer for GPU) and this could be quite variable. That being said, since sequence length is not the bottle neck and to be on the safe side, we could say for most commodity laptops 1000 sequences with 100000 letters each could run reasonably smooth (since they run on iphone 5S). I have access to variety of different devices including android, iOS and Mac. If you have specific hardware let me know so I can test it on them.

About the epileptic seizures, thank you for raising this point! I must say I am completely ignorant in this field and I think it is a good idea to at least include some sort of disclaimer. For the duration, you are correct that the lowest you can go is 17ms (this is hardcoded in window.requestAnimationFrame API). The default duration is 1500milliseconds, this controls the sliding of the text. The colored boxes are filled with a default duration of 150ms with a limit of 80 characters per line. Another parameter is also the painter count (these are parallel async renderers), by default there is only 1 painter, an example where you can spawn more painters is here.

What above means is that, by default if a display has 15 sequences with a screen with of 120 characters, it would take Math.ceil(120/80) 15 150 + (15-1) * 17 ~= 4750ms to fill the screen and the screen would briefly "update" 30 times within this period, that is roughly 6 "updates" per second which is twice the recommended range in the link you kindly provided.

I suggest we add a disclaimer, and for some examples like this, we could increase the duration to 350ms. What do you think?

hrhotz commented 5 years ago

@IbrahimTanyalcin - Thanks for your feedback, please allow me to go a little bit deeper into some of my comments (which I now have numbered) wrt to the "clustalW.html" example:

    • your chosen "aa" color scheme does make sense. Though, you should probably acknowledge the person who set it up originally.
    • the option to add a custom color scheme is great. It might help if you document this (with a code example) in addition to the code example shown for the "asciiArt" and/or add the text you have written for "Example-7(chrX10Mb.html)" in this thread to the documentation. However, my actual comment was: whether it would make sense to have several pre-defined color schemes which you can chose from. As an example have a look at Jalview (http://www.jalview.org/)
    • an optional argument sound very good
    • sure, whatever is the easiest way to implement it, is fine
imallona commented 5 years ago

Hi @IbrahimTanyalcin,

Thank you for your feedback. I see, so setting thresholds for performance is a difficult task due to the huge variety of hardware available. My comment arose from the paper's sentence The amount of sequences is limited to your browsers capacity, recommended limit is 8000 sequences with 100000 characters long each, which I found not very useful as it does not specify what hardware that recommendation applies to (as you say, CPU, operating system/browser). I think that, if updated to add the recommendations for commodity laptops/phones you kindly posted here (1000 sequences with 100000 letters each), that sentence would be more informative, what do you think?

As for the update rate I appreciate your reflection and agree with your suggestions of adding a disclaimer and slowing down some examples.

Aside, as a minor comment unrelated to the former, I just noticed the paper.md refers to version 0.17.3 (...)the library is about 42kB (as of version 0.17.3)(...); you might want to update both library version and its size. Also noticed that dna and ascii should be capitalized.

csoneson commented 5 years ago

👋 @hrhotz, @imallona, @jkanche, @IbrahimTanyalcin - just wanted to check in to see how things are going here! I can see that there are some unticked boxes in all the reviews, as well as some comments for @IbrahimTanyalcin. It would be great to have a quick status update - and please don't hesitate to let me know if there is anything I can do to facilitate.

IbrahimTanyalcin commented 5 years ago

@csoneson @hrhotz @imallona , The reviews were great! They just covered a lot of blind spots that I never thought of. I had to pause for a week, because I am actually integrating this library into mutaframe.com:

example

Not in production yet, but almost there. I'm getting back to the comments at the end of this week. Hope that is ok.

csoneson commented 5 years ago

👋 @jkanche - just checking in on your review status. There are several unticked boxes in your checklist - could you provide some comments for @IbrahimTanyalcin explaining your current reservations in some more detail?

jkanche commented 5 years ago

@csoneson, I did send a pull request with suggestions and updates to the paper, and I was waiting to hear back from @IbrahimTanyalcin until he's ready for the review again since he said he's working on new updates and integration in his last comment

csoneson commented 5 years ago

Thanks for the update @jkanche. @IbrahimTanyalcin - do you have an estimate of when the reviewers could have an updated version to look at (you mentioned above that you would get back to the comments at the end of last week)?

IbrahimTanyalcin commented 5 years ago

@csoneson, getting back to it this weekend :) I merged that pull request I think.

csoneson commented 5 years ago

👋 @IbrahimTanyalcin - any updates on the revisions?

IbrahimTanyalcin commented 5 years ago

:wave: @csoneson - I finished almost adding it to MutaFrame. I was thinking of adding Web Component based constructor to it, however it might be far fetched. I'll only push what has been requested by tomorrow:

IbrahimTanyalcin commented 5 years ago

:wave: @csoneson @hrhotz @jkanche @imallona

I hope you all are doing great, I finally could bring some updates:

csoneson commented 5 years ago

Thanks @IbrahimTanyalcin. @jkanche, @hrhotz, @imallona - could you please have a look at the updated submission when you have a chance?

hrhotz commented 5 years ago

@IbrahimTanyalcin - I am still looking at the "clustalW.html" example referring to the same points as in my comments from April 29/30:

  1. solved
  2. it is great, you have added the other color schemes. But it is not clear to me, how I change from "aa" to "jalviewClustal " in this example
  3. solved
  4. after adding ",{conservation:true}" it displays the conservation. However, in the example, there are 'tab' instead of blanc spaces, and therefore the line is not displayed correctly.
  5. solved
  6. excuse my ignorance, but it is not clear to me how I can replace the hardcoded clustalW alignment with a link to an alignment file.
hrhotz commented 5 years ago

@csoneson WRT "General checks": I have added the "OK" for 'Repository' and 'Version'...though, I am still confused about the link for "repository url" and I am not sure about the version: is there any update mechanism in order to change it to 18.0

IbrahimTanyalcin commented 5 years ago

@hrhotz No worries, I will write a small section to explain how links can be used. The ClustalW normally specifies a single whitespace, the second example is more recent (https://github.com/IbrahimTanyalcin/lexicon-mono-seq/blob/master/examples/asyncFetch.html), I'll try if I can update the converter to except tabs. I'll also add a section of how you can change it from "aa" to other color scheme from a String input. @hrhotz, many thanks for pointing these blind spots!

hrhotz commented 5 years ago

@IbrahimTanyalcin WRT tabs/single space: no need to write a converter to except tabs...it could cause other issues. It is better to fix the example (If time permits, I will make a pull request later).

imallona commented 5 years ago

Hi @IbrahimTanyalcin, thanks for adding the disclaimer; let me check the compiled version of the updated paper.md. Minor comment on versioning: noticed the paper refers to v0.18 (which is the latest tag) but this issue refers to an old v0.15.12, which is not the latest release.

@whedon generate pdf

csoneson commented 5 years ago

@whedon generate pdf

whedon commented 5 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

csoneson commented 5 years ago

@hrhotz, @imallona - the version in this issue is set at submission time. When a submission is eventually accepted, the version at that time is archived and a DOI is assigned. So at this point, it is not a problem that the version given here differs from the one in the paper. @hrhotz - for the repository link, this is ok too. The original link was incorrect at submission; this was later fixed, but the link in the checklist is still the submitted one.

jkanche commented 5 years ago

Hello @csoneson, I'm currently on travel with limited internet access, I will get to this by the end of this week. Sorry for the delay!

imallona commented 5 years ago

Thanks @csoneson for the explanation on versioning, and for compiling the proof.

@IbrahimTanyalcin could you please fix the International, E citation to be rendered as Ecma International? I also noticed that the Github, 2018 reference supporting (...) utility functions that can be hooked to custom buttons as desired points to github.com instead of linking a repository, is this correct?

jkanche commented 5 years ago

I have a few comments on the paper.

Several MSA viewers rely on rendering characters as an image using the canvas element. Although reliable and convenient, this can result in 2 problems: - Higher memory usage, - Blurred text (increasing resolution increases memory usage)

The blurred text is probably because of a HiDPI display. you probably have to calculate the pixel density and then create a canvas with larger resolution, but yes, higher resolution uses higher memory.

The amount of sequences is limited to your browsers capacity, recommended limit is 1000 sequences with 100000 letters each for commodity laptops/mobile devices.

Browsers can show any amount of text. I think the limitation here is with the number of SVG elements you can add to the DOM. This is why canvas is preferred over d3 for large data sets (a single canvas element vs thousands of SVG nodes). One advantage you have here is there's no interactivity with your SVG elements. Also was this number (1000 sequences, each 100000 bp) based on some benchmark testing you did with the library ?

I don't understand the GitHub and NPM citations at the end of the article.

I was thinking, moving the clustal examples to the top might be better.

IbrahimTanyalcin commented 5 years ago

:wave: @csoneson @hrhotz @imallona @jkanche

I think I'll be able to do corrections and the section I promised to @hrhotz over the weekend. Just to quickly answer @jkanche,

Browsers can show any amount text

That I guess for certain limits, that seems correct. However let's put that to test:

https://stuff.mit.edu/afs/sipb/contrib/pi/pi-billion.txt

This loads about a billion letters of pi, as the page grows you'll notice scroll performance will deteriorate. This means that even a single DOM phasing element with a single textNode has a cost. For that reason I have taken this approach:

I think the limitation here is with the number of SVG elements you can add to the DOM

This is also not completely correct, for instance here:

https://distreau.com/lexicon-mono-seq/examples/PI10Million.html

So when above combined, if traditional canvas without GPU is used, this approach (DOMText + svg) will be mostly faster and have lighter memory footprint. I did several tests my self, you can also check the performance from chrome devtools.

Although current approach has to use CPU mainly, in real life comparison with state of the art MSA tools that do use GPU, I have not seen performance differences.

I did the 1000x10000 benchmark on iphone5, it was good enough so I said let's keep it there. For desktop it can run up to 8000, gets slower afterwards because I have animations.

One advantage you have here is there's no interactivity with your SVG elements

You can interact with them, there are functions to get their relative positions here, these functions return the entire sequence ledger with DOM nodes, so you can pretty much do anything with them without impacting performance

jkanche commented 5 years ago

That I guess for certain limits, that seems correct. However let's put that to test:

https://stuff.mit.edu/afs/sipb/contrib/pi/pi-billion.txt

The issue here is not with the text, its with the transfer time, since the browser is rendering the text and streaming chunks of the text file at the same time. This causes the lag, but once everything is in memory, the page loads the entire text pretty quickly. for me, It took 5.6 minutes (from devtools) to transfer the entire file from the server to the client.

The amount of sequences is limited to your browsers capacity, recommended limit is 1000 sequences with 100000 letters each for commodity laptops/mobile devices.

I think the limitation here is with the number of SVG elements you can add to the DOM

My comment was based on the statement in the paper. So if the page never renders the entire sequence(s), I think the paper doesn't have to mention this as a limitation ?

if traditional canvas without GPU is used, this approach (DOMText + svg) will be mostly faster and have lighter memory footprint. I did several tests my self, you can also check the performance from chrome devtools.

were these tests documented anywhere in the GitHub repository ?

I didn't know that you can add interactivity to the elements, thanks for clarifying that up.

IbrahimTanyalcin commented 5 years ago

@jkanche,

IbrahimTanyalcin commented 5 years ago

:wave: @csoneson @hrhotz @imallona @jkanche

Here are some updates:

jkanche commented 5 years ago

@whedon generate pdf

whedon commented 5 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

csoneson commented 5 years ago

@IbrahimTanyalcin Looking at the remaining points in the reviewers' checklists above, could you add a couple of statements to the README with guidelines for people who may want to contribute/seek help, and pointers to how to best test the functionality of the software (including how to verify that the output is correct)? There is one reference (Robinson et al) where the https://doi.org/ part of the DOI should be removed. @hrhotz, @imallona, @jkanche - please comment if you have additional comments, or let me know (and fill the checklist) if you are happy with the current state. Thanks!

imallona commented 5 years ago

Hi @csoneson and @IbrahimTanyalcin, I'm happy with the current version, thanks.

IbrahimTanyalcin commented 5 years ago

:wave: @csoneson