magicDGS / ReadTools

A Universal Toolkit for Handling Sequence Data from Different Sequencing Platforms
https://magicdgs.github.io/ReadTools/
MIT License
6 stars 3 forks source link

Add DownloadDistmapResult check for _SUCCESS on startup #471

Closed magicDGS closed 6 years ago

magicDGS commented 6 years ago

This adds a check to the _SUCCESS file on startup to avoid downloading truncated data due to non-termination of the MapReduce job. This check can be disable with an advance argument to allow the user to download data for debugging, but it is not recommended.

In addition, this commit reformats the code a bit without logic impact and with improved error messages for the download of only some parts (debug argument).

Closes #470

codecov[bot] commented 6 years ago

Codecov Report

Merging #471 into master will increase coverage by 0.013%. The diff coverage is 100%.

Impacted file tree graph

@@               Coverage Diff               @@
##              master      #471       +/-   ##
===============================================
+ Coverage     93.374%   93.386%   +0.013%     
- Complexity       990       993        +3     
===============================================
  Files             95        95               
  Lines           2641      2646        +5     
  Branches         284       284               
===============================================
+ Hits            2466      2471        +5     
  Misses           134       134               
  Partials          41        41
Impacted Files Coverage Δ Complexity Δ
...readtools/tools/distmap/DownloadDistmapResult.java 97.143% <100%> (+0.476%) 14 <4> (+3) :arrow_up:
robmaz commented 6 years ago

I think it is fine.

Cheers Rupert

2018-06-06 17:35 GMT+02:00 Daniel Gómez-Sánchez notifications@github.com:

@magicDGS commented on this pull request.

Waiting for @robmaz https://github.com/robmaz comments.

In src/main/java/org/magicdgs/readtools/tools/distmap/ DownloadDistmapResult.java https://github.com/magicDGS/ReadTools/pull/471#discussion_r193455773:

 public Set<String> partNames = new HashSet<>();

+

  • private static final String DISABLE_SUCCESS_CHECK_ARG_NAME = "disable-success-check";
  • @Advanced
  • @Argument(fullName = DISABLE_SUCCESS_CHECK_ARG_NAME, doc = "Disable the check for the _SUCCESS file on the input folder.", optional = true)

@robmaz https://github.com/robmaz, can you comment on this argument description? It will appear as an advance argument in the help, but maybe we can be more explicit in that this is not recommended unless you know what they are doing. Any suggestion for the phrasing?

In src/main/java/org/magicdgs/readtools/tools/distmap/ DownloadDistmapResult.java https://github.com/magicDGS/ReadTools/pull/471#discussion_r193456408:

@@ -113,16 +118,35 @@ protected void onStartup() {

 }
  • / Find all the part files /
  • private void findPartFiles(final Path inputPath) {
  • // first check the _SUCCESS file
  • if (!disableSuccessCheck && !Files.exists(inputPath.resolve(SUCCESS_FILE_NAME))) {

@robmaz https://github.com/robmaz - I opt for only check the existence if requrested, but do not log a warning if the argument is specify. The idea is that if the program fails in the first place, and the user specifies the argument, it should be aware of the error message. If someone use always the argument, it is their own responsibility.

Are you ok with this?

In src/main/java/org/magicdgs/readtools/tools/distmap/ DownloadDistmapResult.java https://github.com/magicDGS/ReadTools/pull/471#discussion_r193456604:

@@ -113,16 +118,35 @@ protected void onStartup() {

 }
  • / Find all the part files /
  • private void findPartFiles(final Path inputPath) {
  • // first check the _SUCCESS file
  • if (!disableSuccessCheck && !Files.exists(inputPath.resolve(SUCCESS_FILE_NAME))) {
  • throw new UserException(String.format(
  • "Unable to find %s file. Input folder might not contain a successful run (--%s to disable this check, which might produce a truncated output).",
  • SUCCESS_FILE_NAME, DISABLE_SUCCESS_CHECK_ARG_NAME));

Are you ok with this message or you wanna something more specific?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/magicDGS/ReadTools/pull/471#pullrequestreview-126431710, or mute the thread https://github.com/notifications/unsubscribe-auth/Ad_FfO04NDTukNyUmYeBqWlvhL_O1Hgkks5t5_bDgaJpZM4Uc4Ix .

magicDGS commented 6 years ago

Thanks for looking at it. @robmaz!