openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
715 stars 38 forks source link

[REVIEW]: MTSS: A CUDA software for the analysis of Multivariate Time Series #1049

Closed whedon closed 5 years ago

whedon commented 5 years ago

Submitting author: @DavideNardone (Davide Nardone) Repository: https://github.com/DavideNardone/MTSS-Multivariate-Time-Series-Sofwtare Version: v1.0.0 Editor: @jedbrown Reviewer: @krischer, @karlrupp Archive: Pending

Status

status

Status badge code:

HTML: <a href="http://joss.theoj.org/papers/0a9f9006cebb80198e0ad5448cc1fc10"><img src="http://joss.theoj.org/papers/0a9f9006cebb80198e0ad5448cc1fc10/status.svg"></a>
Markdown: [![status](http://joss.theoj.org/papers/0a9f9006cebb80198e0ad5448cc1fc10/status.svg)](http://joss.theoj.org/papers/0a9f9006cebb80198e0ad5448cc1fc10)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@krischer & @karlrupp, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.theoj.org/about#reviewer_guidelines. Any questions/concerns please let @jedbrown know.

Please try and complete your review in the next two weeks

Review checklist for @krischer

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Review checklist for @karlrupp

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

whedon commented 5 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @krischer, it looks like you're currently assigned as the reviewer for this paper :tada:.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

  1. You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands
whedon commented 5 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

jedbrown commented 5 years ago

@krischer @karlrupp :wave: Welcome! The comments from whedon above outline the review process. I'll be watching this thread if you have any questions. Thanks!

@DavideNardone It looks like there are some capitalization issues in your bibliography and perhaps you can cite a more recent CUDA since your package requires 5.0 or later.

DavideNardone commented 5 years ago

@whedon commands

whedon commented 5 years ago

Here are some things you can ask me to do:

# List Whedon's capabilities
@whedon commands

# List of editor GitHub usernames
@whedon list editors

# List of reviewers together with programming language preferences and domain expertise
@whedon list reviewers

# Compile the paper
@whedon generate pdf

🚧 🚧 🚧 Experimental Whedon features 🚧 🚧 🚧

# Compile the paper from a custom git branch
@whedon generate pdf from branch custom-branch-name
DavideNardone commented 5 years ago

@whedon generate pdf

whedon commented 5 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

DavideNardone commented 5 years ago

@jedbrown I updated the CUDA reference in the bibliography but regarding the capitalization issue in the bibliography I generated it by using Google Scholar Button so i dont know what to fix. Can you please be more specific ?

jedbrown commented 5 years ago

@DavideNardone It is necessary to add a brace around words that should be capitalized, such as {IEEE}. I'll see if we can change the style file because it's a common issue and I'm not sure the lack of capitalization here is intentional.

Note that you have misspelled "Software" as "Sofwtare" in your repository name.

DavideNardone commented 5 years ago

@whedon generate pdf

whedon commented 5 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

krischer commented 5 years ago

Review

This package performs dynamic time warping (DTW) in many dimensions and is implemented on the GPU. I can only recommend this package for publication after some points have been addressed and this requires a major revision. I encourage the author to work at these points.

For full disclaimer I note that I have never used DTW in my own research but I am familiar with similar concepts and this review here largely evaluates the software side of things which I feel qualified to do.

Major points in bullet point form (they are expanded upon later in this review and in issues in the package’s repository).

Comments on the checklist:

Version: Does the release version given match the GitHub release (v1.0.0)?

https://github.com/DavideNardone/MTSS-Multivariate-Time-Series-Software/issues/2

Installation: Does installation proceed as outlined in the documentation?

https://github.com/DavideNardone/MTSS-Multivariate-Time-Series-Software/issues/3

Functionality: Have the functional claims of the software been confirmed?

I don’t know to be honest. This is not my area of expertise so I don’t have a test data set to use this with. Additionally there are no unit tests with some small analytical examples that could be investigated. The readme contains some usage example and they run but I don’t know what the output should be. I recommend to add some small and easily visually verifiable examples so that this is easier to test. Also comparisons with other software packages would be good to have.

https://github.com/DavideNardone/MTSS-Multivariate-Time-Series-Software/issues/4 https://github.com/DavideNardone/MTSS-Multivariate-Time-Series-Software/issues/5 https://github.com/DavideNardone/MTSS-Multivariate-Time-Series-Software/issues/6

Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

The self-proclaimed main reason of existence of the software is to be faster. The specific claim is that the GPU version is three orders of magnitudes faster. This seems a bit excessive to me as common speed ups (except for some extreme cases) when moving a well optimised code to GPUs seem to be at most an order of magnitude. In general these comparison are really hard to make and would require well-optimised versions of both codes and benchmarking in very controlled environments. The first comment in this thread is IMHO well argued: https://devtalk.nvidia.com/default/topic/945987/what-is-maximum-speed-up-that-can-be-obtained-with-gpu-/

I’d recommend removing the performance claims or backing them by some substantial benchmarks, especially also comparing against existing and optimised CPU implementations.

https://github.com/DavideNardone/MTSS-Multivariate-Time-Series-Software/issues/7

Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.

No package manager but the only requirement is a working cuda compiler.

Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).

https://github.com/DavideNardone/MTSS-Multivariate-Time-Series-Software/issues/13

Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?

https://github.com/DavideNardone/MTSS-Multivariate-Time-Series-Software/issues/8

Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?

https://github.com/DavideNardone/MTSS-Multivariate-Time-Series-Software/issues/9

Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

https://github.com/DavideNardone/MTSS-Multivariate-Time-Series-Software/issues/10

References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

https://github.com/DavideNardone/MTSS-Multivariate-Time-Series-Software/issues/11

Comments on the readme:

https://github.com/DavideNardone/MTSS-Multivariate-Time-Series-Software/issues/12

Generic notes on the code:

https://github.com/DavideNardone/MTSS-Multivariate-Time-Series-Software/issues/14 https://github.com/DavideNardone/MTSS-Multivariate-Time-Series-Software/issues/15 https://github.com/DavideNardone/MTSS-Multivariate-Time-Series-Software/issues/16 https://github.com/DavideNardone/MTSS-Multivariate-Time-Series-Software/issues/17 https://github.com/DavideNardone/MTSS-Multivariate-Time-Series-Software/issues/18 https://github.com/DavideNardone/MTSS-Multivariate-Time-Series-Software/issues/19 https://github.com/DavideNardone/MTSS-Multivariate-Time-Series-Software/issues/20 https://github.com/DavideNardone/MTSS-Multivariate-Time-Series-Software/issues/21 https://github.com/DavideNardone/MTSS-Multivariate-Time-Series-Software/issues/22 https://github.com/DavideNardone/MTSS-Multivariate-Time-Series-Software/issues/23

Comments on the paper:

jedbrown commented 5 years ago

@krischer Thanks for your thorough review. To answer your question, JOSS papers include a link to the repository as part of the decoration (left margin on the first page) so it doesn't need to be repeated.

DavideNardone commented 5 years ago

@krischer Thank you for reviewing the software, I really appreciate the effort you put in. Along with my work commitments, I'll try to give reasonable answers to all your questions and to fix the bugs (if it's possible) as soon as possible.

karlrupp commented 5 years ago

FYI: My review will follow soon.

karlrupp commented 5 years ago

Review of MTSS

MTSS provides GPU-enabled implementations for the analysis of multivariate time series. As such, the software primarily aims to provide a fast implementation of an established family of algorithms, rather than providing implementations of fundamentally novel algorithmic approaches. Thus, in the following I focus on the performance aspect.

Comments on Checklist Items

Downloading, Compiling the Code, and Running the Examples

The repository is sufficiently light-weight in order to be cloned quickly. After cloning, the lack of a build system is apparent: https://github.com/DavideNardone/MTSS-Multivariate-Time-Series-Software/issues/16 Particularly now as the sources are split up into include and source folders (see https://github.com/DavideNardone/MTSS-Multivariate-Time-Series-Software/issues/18), extra parameters are required to nvcc.

The compilation instructions in the README as well as in the paper still need to be updated accordingly. In particular, the source file extensions need to be changed back to cu: https://github.com/DavideNardone/MTSS-Multivariate-Time-Series-Software/pull/25

Passing a worksize parameter at compile time exposes an implementation detail to the user: https://github.com/DavideNardone/MTSS-Multivariate-Time-Series-Software/issues/23 I recommend runtime dispatches inside MTSS. Moreover, the documentation should also mention that CUDA arch 2.0 or above is required: https://github.com/DavideNardone/MTSS-Multivariate-Time-Series-Software/issues/26

While the default compilation command succeeds, merely passing an optimization flag to the host compiler triggers a set of warnings: https://github.com/DavideNardone/MTSS-Multivariate-Time-Series-Software/issues/14

Once the compilation stage is completed, running the examples works as documented. On closer inspection, however, there are two isses:

The API documentation is appropriate, particularly when considering that MTSS is meant to be used as a standalone software, not as a library. I did notice, however, that many functions are referred to as callbacks, even though they are not used as callback routines in the established meaning.

I strongly recommend to add an extensive test suite: https://github.com/DavideNardone/MTSS-Multivariate-Time-Series-Software/issues/9 It might have caught e.g. memory access violations in an earlier and allows others to better verify the correctness of the results.

Comments on the paper

The paper is fairly well written and explains the needs MTSS addresses. If possible, the paper would benefit from a figure (e.g. the one referenced in the README.md).

The Examples section would benefit from a bit of extra prose explaining the compilation and providing a bit of context to the examples.

DavideNardone commented 5 years ago

@whedon generate pdf

whedon commented 5 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 5 years ago

PDF failed to compile for issue #1049 with the following error:

/app/vendor/ruby-2.4.4/lib/ruby/2.4.0/psych.rb:377:in parse': (tmp/1049/paper/paper.md): could not find expected ':' while scanning a simple key at line 3 column 1 (Psych::SyntaxError) from /app/vendor/ruby-2.4.4/lib/ruby/2.4.0/psych.rb:377:inparse_stream' from /app/vendor/ruby-2.4.4/lib/ruby/2.4.0/psych.rb:325:in parse' from /app/vendor/ruby-2.4.4/lib/ruby/2.4.0/psych.rb:252:inload' from /app/vendor/ruby-2.4.4/lib/ruby/2.4.0/psych.rb:473:in block in load_file' from /app/vendor/ruby-2.4.4/lib/ruby/2.4.0/psych.rb:472:inopen' from /app/vendor/ruby-2.4.4/lib/ruby/2.4.0/psych.rb:472:in load_file' from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-443822f01563/lib/whedon.rb:68:ininitialize' from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-443822f01563/lib/whedon/processor.rb:32:in new' from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-443822f01563/lib/whedon/processor.rb:32:inset_paper' from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-443822f01563/bin/whedon:52:in prepare' from /app/vendor/bundle/ruby/2.4.0/gems/thor-0.20.0/lib/thor/command.rb:27:inrun' from /app/vendor/bundle/ruby/2.4.0/gems/thor-0.20.0/lib/thor/invocation.rb:126:in invoke_command' from /app/vendor/bundle/ruby/2.4.0/gems/thor-0.20.0/lib/thor.rb:387:indispatch' from /app/vendor/bundle/ruby/2.4.0/gems/thor-0.20.0/lib/thor/base.rb:466:in start' from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-443822f01563/bin/whedon:113:in<top (required)>' from /app/vendor/bundle/ruby/2.4.0/bin/whedon:23:in load' from /app/vendor/bundle/ruby/2.4.0/bin/whedon:23:in

'

DavideNardone commented 5 years ago

@whedon generate pdf

whedon commented 5 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 5 years ago

PDF failed to compile for issue #1049 with the following error:

/app/vendor/ruby-2.4.4/lib/ruby/2.4.0/psych.rb:377:in parse': (tmp/1049/paper/paper.md): could not find expected ':' while scanning a simple key at line 3 column 1 (Psych::SyntaxError) from /app/vendor/ruby-2.4.4/lib/ruby/2.4.0/psych.rb:377:inparse_stream' from /app/vendor/ruby-2.4.4/lib/ruby/2.4.0/psych.rb:325:in parse' from /app/vendor/ruby-2.4.4/lib/ruby/2.4.0/psych.rb:252:inload' from /app/vendor/ruby-2.4.4/lib/ruby/2.4.0/psych.rb:473:in block in load_file' from /app/vendor/ruby-2.4.4/lib/ruby/2.4.0/psych.rb:472:inopen' from /app/vendor/ruby-2.4.4/lib/ruby/2.4.0/psych.rb:472:in load_file' from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-443822f01563/lib/whedon.rb:68:ininitialize' from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-443822f01563/lib/whedon/processor.rb:32:in new' from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-443822f01563/lib/whedon/processor.rb:32:inset_paper' from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-443822f01563/bin/whedon:52:in prepare' from /app/vendor/bundle/ruby/2.4.0/gems/thor-0.20.0/lib/thor/command.rb:27:inrun' from /app/vendor/bundle/ruby/2.4.0/gems/thor-0.20.0/lib/thor/invocation.rb:126:in invoke_command' from /app/vendor/bundle/ruby/2.4.0/gems/thor-0.20.0/lib/thor.rb:387:indispatch' from /app/vendor/bundle/ruby/2.4.0/gems/thor-0.20.0/lib/thor/base.rb:466:in start' from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-443822f01563/bin/whedon:113:in<top (required)>' from /app/vendor/bundle/ruby/2.4.0/bin/whedon:23:in load' from /app/vendor/bundle/ruby/2.4.0/bin/whedon:23:in

'

arfon commented 5 years ago

@DavideNardone - the paper won't compile as it's entirely missing the YAML header at the top of the paper.md. See this example paper for what's missing.

DavideNardone commented 5 years ago

@DavideNardone - the paper won't compile as it's entirely missing the YAML header at the top of the paper.md. See this example paper for what's missing.

Sorry my bad!

DavideNardone commented 5 years ago

@whedon generate pdf

whedon commented 5 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

DavideNardone commented 5 years ago

@whedon generate pdf

whedon commented 5 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

DavideNardone commented 5 years ago

Hello everybody, I finally made most of the changes you've requested. There may be still something issues to solve it but please let's clarify them together so that i can provide a better solution.

Thank you.

Cheers,

DN

jedbrown commented 5 years ago

What error? The proof above seems well-formed. Many bib items need DOI and conference names need case protection {ACM SIGKDD} ... Also, the source repository is in the margin so no need to link it manually in the text.

You should also compare to related work; what does MTSS provide to the research community that isn't available through existing packages? (It doesn't need to be unique capability, it could be better performance, a better license, improved extensibility with evidence of that, etc.)

DavideNardone commented 5 years ago

There was no error, i just copied and pasted the old message wrong. Regarding the comparing with other tools, i haven't found any Multidimensional Time Series source code available yet to compare against. Although i understand your concern about it, there's evidence that the GPU version perform faster than the CPU ones, therefore i won't be surprised whether other CPUs methods will be slower. Anyhow, what kind of comparing do you want me to perform? Can you please be more specific?

Furthermore, I don't understand what do you mean by: "what does MTSS provide to the research community that isn't available through existing packages?". It's been initially stated that the software was a command line tool and not a library. If this was a prior requirement, why was this not initially specified as a fundamental requirement for the software to be published?

jedbrown commented 5 years ago

GPU implementations being faster than CPU implementations is by no means automatic. If you assert that it is faster, then you'd have to provide evidence of that. If there is no other software that does what MTSS does, then you wouldn't make a claim of better performance except by a performance model (e.g., if you achieve a fraction of memory bandwidth or floating point peak). You can still mention significant software that is nearby and identify how MTSS is unique.

There is no requirement that JOSS submissions be libraries.

DavideNardone commented 5 years ago

Can you please explicitly tell me where I claimed my software achieve better performance than others? Do you refer to this sentence: "This problem has been addressed in [@sanguansat2012multiple] which allows the alignment of multidimensional time signals by simply extending the concept behind the DTW but showing low-speed performance". If so, I will try to find Multivariate Time Series software which I can compare with.

jedbrown commented 5 years ago

I was responding to your comment above speculating that a GPU implementation would outperform a hypothetical CPU implementation. As for the paper, it should put its contribution in context. If there is no other software that does what MTSS does, you can state that and reviewers can evaluate whether that is true. If there is other software, you should explain the context in a way that would help a researcher decide what software is most appropriate for their needs.

DavideNardone commented 5 years ago

Well, the fact a GPU implementation could be not faster than a CPU one can be true depending on the kind of task. I honestly think this kind of task is a good fit for the GPU when the dataset is made of many sample and variables. In the latter case, MTSS can drastically improve the performance of the comparisons since our I-MDTW version can split each DTW variable computation over different threads. As far as i know MTSS is the only GPU software able to compute a Multivariate Dynamic Time Warping.

However, I've been looking for some other software to compare against to but since now I've found a M-DTW CPU based version (also cited in the work) that I could use to make some comparisons. What do you think about it?

jedbrown commented 5 years ago

Yes, if performance is an explicit claim, then comparison to an existing state of the art is warranted. Reviewers may chime now or later in if they are aware of specific packages that would be good to compare with.

DavideNardone commented 5 years ago

Well I've almost finished the tests, so that i can show you up some speed-up results as indicated by @jedbrown. However, how long the second review should take? I've submitted this software the last October. Thank you

jedbrown commented 5 years ago

@DavideNardone Your submission required major revisions that were either recently completed or are not quite complete yet. We need to allow reviewers a reasonable amount of time to review since the latest revision, regardless of the initial submission date. I hope our reviewers get a chance to review the latest updates in the next couple weeks.

Did you add performance results? (You might need to @whedon generate pdf if you revised the paper after the discussion above.)

labarba commented 5 years ago

👋 @krischer, @karlrupp — It looks like we're waiting for you to take a second look at this submission now, after the author's revisions. What's your status? Thanks!

karlrupp commented 5 years ago

Thanks, @labarba , for asking. I'm not sure from the discussion whether the second round of review has started yet (the last post by the author says "I've almost finished the tests,", suggesting that there's still more to come; no reply to the last question by @jedbrown )

labarba commented 5 years ago

Thanks, @karlrupp.

@DavideNardone — What's your status with revisions? The handling editor asked about performance results. Let us know!

DavideNardone commented 5 years ago

Hello everybody,

I've got some of the results but i haven't had time to update the paper yet. I will asap.

Here a preview for the first task.

screen shot 2019-02-18 at 8 41 23 pm screen shot 2019-02-18 at 8 39 12 pm
DavideNardone commented 5 years ago

Hello everybody, since i couldn't find more than one or two existing state of the art Multivariate Dynamic Time Warping software to which compare my software I haven't completed the tests yet. I've been thinking whether to compare my software against CPU-Univariate Dynamic Time Warping. Whether just one/two methods will not be fine for you I think that I will need more time to complete the tests. Said that, let me know whether just using two existing state of the art methods is sufficient.

Kind Regards, DN

jedbrown commented 5 years ago

It isn't so much about the number of packages that you compare to, but about whether the comparisons are representative and whether the packages you compare to are the most relevant to a researcher trying to choose appropriate software. If your software is also competitive for univariate DTW, then that might also be a draw for attracting users.

jedbrown commented 5 years ago

@whedon generate pdf