openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
721 stars 38 forks source link

[REVIEW]: SneakerNet: A modular quality assurance and quality check workflow for primary genomic and metagenomic read data #2334

Closed whedon closed 3 years ago

whedon commented 4 years ago

Submitting author: @lskatz (Lee Katz) Repository: https://github.com/lskatz/SneakerNet Version: v0.19.7 Editor: @lpantano Reviewers: @lfaller, @erinyoung, @druvus Archive: 10.5281/zenodo.4657026

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/44528f308e36e17f00434e9e5d44d4c7"><img src="https://joss.theoj.org/papers/44528f308e36e17f00434e9e5d44d4c7/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/44528f308e36e17f00434e9e5d44d4c7/status.svg)](https://joss.theoj.org/papers/44528f308e36e17f00434e9e5d44d4c7)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@lfaller & @ @erinyoung, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @lpantano know.

Please try and complete your review in the next six weeks

Review checklist for @lfaller

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Review checklist for @erinyoung

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Review checklist for @druvus

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

whedon commented 4 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @lfaller, @ @erinyoung it looks like you're currently assigned to review this paper :tada:.

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

  1. You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf
whedon commented 4 years ago
Reference check summary:

OK DOIs

- 10.1101/654319 is OK

MISSING DOIs

- None

INVALID DOIs

- None
whedon commented 4 years ago

PDF failed to compile for issue #2334 with the following error:

pandoc-citeproc: reference erinyoung not found Error producing PDF. ! TeX capacity exceeded, sorry [input stack size=5000]. \reserved@a ->\def \reserved@a *{\let \@xs@assign \@xs@expand@and@detokenize... l.345 }

Looks like we failed to compile the PDF

lpantano commented 4 years ago

@lskatz, can you try to fix the issue for the paper compilation?

lpantano commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

lpantano commented 4 years ago

@lfaller saw the issue was a reviewer typo in the description of this issue. Now there is a PDF for the paper. Thanks!

lskatz commented 4 years ago

Thank you for catching that, @lpantano! I'm not sure I would have been able to find that so fast!

erinyoung commented 4 years ago

@lpantano https://github.com/lpantano et al.,

I went to the invitation URL ( https://github.com/openjournals/joss-reviews/invitations), and didn't see an invitation associated with my github account, or any related notifications.

Could this be due to my handle being listed as "@ @erinyoung" instead of "@erinyoung"? (I noticed that my handle has an extra "@" in this thread)

On Tue, Jun 16, 2020 at 12:39 PM Lee Katz notifications@github.com wrote:

Thank you for catching that, @lpantano https://github.com/lpantano! I'm not sure I would have been able to find that so fast!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/openjournals/joss-reviews/issues/2334#issuecomment-644943884, or unsubscribe https://github.com/notifications/unsubscribe-auth/AG3VWMP43ZLGKH4HYW6QZHDRW637XANCNFSM4N4WPORQ .

-- Always,

Erin erin.olde@gmail.com

arfon commented 4 years ago

@whedon re-invite @erinyoung as reviewer

whedon commented 4 years ago

OK, the reviewer has been re-invited.

@erinyoung please accept the invite by clicking this link: https://github.com/openjournals/joss-reviews/invitations

arfon commented 4 years ago

@erinyoung ☝️ can you see if you can accept the invite now?

lpantano commented 4 years ago

hey @erinyoung and @lfaller, any update in the review process?

lfaller commented 4 years ago

Thanks for the nudge, @lpantano -- I was stuck on "installation"; will reach out to the author 🙂

lpantano commented 4 years ago

Hey @lskatz, any updates on this, are you currently addressing the issues raised by the reviewers? thanks1

lskatz commented 4 years ago

Hi @lpantano and also @erinyoung and @lfaller -- Sorry for my delay. I had a sort of surprise temporary reassignment for COVID-19 at CDC and was not able to address your excellent feedback. We (co-authors and I) should be able to address all outstanding comments in a week or two. Thank you for your patience.

erinyoung commented 4 years ago

@lskatz ,

I'm actually still having issues downloading sneakernet.

I second the opinion to move everything to docker (https://github.com/lskatz/SneakerNet/issues/41)

There are too many hidden dependencies for the source install. I tried make test, and first encountered that I was missing Config/Simple.pm, then Bio/FeatureIO/gff.pm, then it couldn't find the path to mlst and a few other tools, then I had the wrong version of perl installed (v5.26.0 needed, v5.16.3 installed).

I gave up on the source installation, and moved to containers, but your containers are very large. I've been having to get creative to install it with both singularity and docker.

For singularity, I don't have a large /scratch directory in my setup for compressed temp downloads, and squashfs complained when I attempted to download the container with singularity. I re-attempted the download with the "--sandbox" option (singularity build --sandbox sneakernet.simg docker://lskatz/sneakernet:latest). I haven't had the time to test this installation, yet.

I'm sure docker has a similar command for large containers, and I am looking on implementing that.

lskatz commented 4 years ago

Hi @erinyoung , we are in process of a larger reply to the comments above from both you and @lfaller . However for right now one solution might be to unset $SINGULARITY_TMPDIR so that the system default takes hold.

@erinyoung and @lfaller For the latest container (again, we are making a much larger reply to address the comments in full), we removed the Kraken database and provided instructions on how to download the Kraken database separately. The container size changed significantly from 3.9G to 1.1G. Hopefully these two changes can help you get over those immediate obstacles while we address the other comments.

@kapsakcj is the main expert on the container front and so I am tagging him for any further conversation along these lines.

lskatz commented 4 years ago

Hi @erinyoung and @lfaller, I thought I would check in and bring the conversation back to here (although, it was very useful to have the conversation on the SneakerNet site for posterity and for future users!). Recently, we updated the container and its instructions to address some of your concerns here in this thread, and in a couple of threads over here: https://github.com/lskatz/SneakerNet/issues/41, https://github.com/lskatz/SneakerNet/issues/42.

In SneakerNet issue 42, there are more details on how the Kraken database was removed from the container and how instructions were put in place for building the new version of container. Additionally, we added benchmarking information.

In SneakerNet issue 41, we described how we added better documentation.

If you would like to continue testing with the container, please confirm that it is the latest version v0.11.2. I believe that Dockerhub is not updating the :latest tag and so I will look into that.

lfaller commented 4 years ago

@lskatz I just pulled the latest docker image and it looks like it's getting the right version now:

$> docker pull lskatz/sneakernet:latest
...
$> docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
lskatz/sneakernet   latest              5f42f14e0419        11 days ago         3.02GB
$> docker image inspect 5f42f14e0419 | grep "version"
                "container.version": "1",
                "software.version": "0.11.2",
                "container.version": "1",
                "software.version": "0.11.2",
lpantano commented 4 years ago

Hey @lskatz, are you still addressing the reviewer's comments? Thanks

lskatz commented 4 years ago

Hi @lpantano I believe we have addressed all comments up to this point. Please let me know if anything else is outstanding though @erinyoung and @lfaller !

erinyoung commented 4 years ago

Even without the kraken database, it is still a rather large container, so I had to get a little creative on how to install it and get it working.

erinyoung commented 4 years ago

Is SneakerNet.ro.pl in the documentation supposed to be SneakerNet.roRun.pl?

lskatz commented 4 years ago

Hi @erinyoung Good catch! You are right. I have made the documentation changes in https://github.com/lskatz/SneakerNet/commit/09def0a67a6493c4302f80edf60dfc426df3f127 and https://github.com/lskatz/SneakerNet/commit/d94b96611fdf0bfcf5eb07c562e8238a13bf8b1b.

Thank you for pushing a little harder on the container @erinyoung :) I am still a novice on creating containers but fortunately we have our expert @kapsakcj. If we can figure out how to optimize it further, we will. However, I believe we have an unavoidable problem of many dependencies.

lfaller commented 4 years ago

@lskatz minor suggestion: it would be nice to document what pipeline of plugins Sneakernet runs for the example. I figured it out after running the example by looking at the logs, but it would be useful to provide that information in the readme. You could even provide a figure showing that pipeline.

lskatz commented 4 years ago

@lfaller Great point! I had a script SneakerNet.checkdeps.pl that has this function, but it was not documented well enough. (very little documentation here where the script is mentioned once and here where it says cryptically "Workflows are defined in plugins.conf"

In response, I have changed documentation here: https://github.com/lskatz/SneakerNet/edit/master/docs/PLUGINS.md

lskatz commented 4 years ago

@lfaller also in response in the same PLUGINS.md page, I created a basic figure on the default workflow and plugins.

erinyoung commented 4 years ago

I think I still don't understand how to run the example. (Using the docker container pulled with singularity and using singularity shell)

> SneakerNet.roRun.pl --createsamplesheet -o M1234-18-001-test /SneakerNet-0.11.2/example/inbox/test-15-001 
SneakerNet.roRun.pl: demultiplex sample sheets were not found. Creating one directly from fastq filenames into /tmp/SneakerNet.roRun.pl.DaYIcw/runData/samples.tsv
Use of uninitialized value $R2 in substitution (s///) at /SneakerNet-0.11.2/scripts/SneakerNet.roRun.pl line 292.
Use of uninitialized value $R1 in hash element at /SneakerNet-0.11.2/scripts/SneakerNet.roRun.pl line 295.
Use of uninitialized value $R2 in hash element at /SneakerNet-0.11.2/scripts/SneakerNet.roRun.pl line 295.
Use of uninitialized value $R1 in concatenation (.) or string at /SneakerNet-0.11.2/scripts/SneakerNet.roRun.pl line 296.
Use of uninitialized value $R2 in concatenation (.) or string at /SneakerNet-0.11.2/scripts/SneakerNet.roRun.pl line 296.
ERROR: R1 and R2 are not both represented as  and  at /SneakerNet-0.11.2/scripts/SneakerNet.roRun.pl line 296.
lskatz commented 4 years ago

Hi @erinyoung , @kapsakcj and I are trying to replicate your error. Could you also post your singularity shell command? Thank you! One concern we might have is that singularity might be enforcing a read only hard drive, but it is only a guess right now.

lpantano commented 4 years ago

Hi @erinyoung, any updates on this?

lskatz commented 4 years ago

We have one more thing that I hope will be an update. I have been trying to make a multistage build for the container to make things easier. It has been taking me a couple of weeks to learn this process and I'm not quite there yet... but I hope it will make the review process easier if it works 🤞. If it works then hopefully it makes @erinyoung and @lfaller 's lives easier :)

lskatz commented 4 years ago

Hi, I apologize for how long this took but we have a multistage Docker build. Hopefully this makes the review much easier from here on out. This container was tested only for the default workflow. Please let us know if we need to do something more thorough. Thanks to @kapsakcj once again for helping the containers efforts so much.

These changes are reflected in version 0.14.2

Tested with the default workflow on:

@lpantano if the review process and journal allows it, I would like to add @kapsakcj as a co-first author.

lpantano commented 4 years ago

@lskatz yes, that is totally fine. @erinyoung and @lfaller, if you could test this it would be great. Thanks!

lfaller commented 4 years ago

Alright! I was able download the new version and run it and it did seem faster to me (though I didn't collect any hard numbers on that). FYI, I set the number of CPUs to 6, and that let me run the example in a reasonable amount of time.

Thanks for making these changes, I think it really improves usability!

My last nit is this:

I still find the documentation to be very confusing to read. If I was interested in finding a tool that could help me assemble/QC/analyze MiSeq data, this would not convince me that SneakerNet would be the way to go because it's not easy to get started quickly.

The main README.md currently looks like this:

readme_screenshot

  1. The example code mentioned did not get me far -- I had to go dive deeply into the container documentation in order to get started and get things to run.

  2. The synopsis text reads:

    A pipeline for processing reads from a sequencing run.

    But from the main README, I don't know what tools are run in this pipeline. From reading more documentation, I understand that the pipeline can be customized, but a description of the example pipeline would be nice because that is probably the first starting point for a new user.

    The figure looks nice but it seems somewhat abstract. Is it supposed to describe the pipeline? If yes, I would expect arrows connecting the pieces. I assume the paper airplane means it's sending emails out? Or generates a report and sends it somewhere? I assume this because I've read the whole README multiple times but if I was new to this repo, it would seem unrelated.

  3. The documentation is scattered. I agree that it's nice to have a README that's not 20 miles long, but none of the shorter sections in the README are self-explanatory, which means I have to click through to the "extended" documentation. I end up with a single tab open for every item in the docs/ folder. Markdown is notoriously bad for organizing larger pieces of documentation, so a "read the docs" setup might be a nice addition.

Anyways, these are just some thoughts that popped into my head as I was trying to navigate the repo. Feel free to ignore 🙂

lskatz commented 4 years ago

Thank you @lfaller ! We think that you had an excellent point that the documentation needs to be upfront on what it does specifically and so we added a Main steps section.

A large overhaul of the documentation might be difficult at this stage but hopefully that addresses at least, in a little part, your points 2 and 3.

Your point 1 is well taken that going through an example is really difficult to find in the documentation. To that end, we have added a quick start and documentation on the main script itself. While not ideal, it glues together important pieces of documentation so that it is not scattered, as you correctly pointed out.

lskatz commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

lskatz commented 4 years ago

This new PDF just reflects the co-first author labels. https://github.com/lskatz/SneakerNet/commit/0b0b84e182820dc20652a289a533e84c3a934009

lpantano commented 4 years ago

Thank you @lfaller, and @lskatz for the quick reply. @erinyoung, can you take a look and update your review? thanks!

lpantano commented 4 years ago

hey @erinyoung, could you give your update,please? Thanks!

erinyoung commented 3 years ago

I still cannot get the test to work. I was hoping that our MiSeqs would be hooked up so I could test it as directly as suggested by the documentation.

The last thing that I tried (today)

I pulled a new docker image and ran the following:

$ docker run --rm -v $PWD:/data -v $KRAKEN_DEFAULT_DB:/kraken-database -u $(id -u):$(id -g) lskatz/sneakernet:latest 

And the following was printed to my screen:

SneakerNetPlugins.pl --numcpus 8 t/M00123-18-001-test
sn_parseSampleSheet.pl: Found ./samples.tsv.  Not reparsing without --force.
addReadMetrics.pl: Tempdir is /tmp/addReadMetrics.pl._bgSBL
assembleAll.pl: Temporary directory is at /tmp/assembleAll.pl.HVx6A5
assembleAll.pl: ASSEMBLE SAMPLE LT2
assembleAll.pl: PREDICT GENES FOR SAMPLE LT2
assembleAll.pl: ASSEMBLE SAMPLE Philadelphia_CDC
assembleAll.pl: PREDICT GENES FOR SAMPLE Philadelphia_CDC
assembleAll.pl: ASSEMBLE SAMPLE 2010EL-1786
assembleAll.pl: PREDICT GENES FOR SAMPLE 2010EL-1786
assembleAll.pl: ASSEMBLE SAMPLE FA1090
assembleAll.pl: PREDICT GENES FOR SAMPLE FA1090
assembleAll.pl: ASSEMBLE SAMPLE contaminated
assembleAll.pl: PREDICT GENES FOR SAMPLE contaminated
assembleAll.pl: Running metrics on the genbank files at ./SneakerNet/forEmail/assemblyMetrics.tsv
assembleAll.pl: gbk metrics for ./SneakerNet/assemblies/2010EL-1786/2010EL-1786.shovill.skesa.gbk
assembleAll.pl: gbk metrics for ./SneakerNet/assemblies/contaminated/contaminated.shovill.skesa.gbk
assembleAll.pl: gbk metrics for ./SneakerNet/assemblies/FA1090/FA1090.shovill.skesa.gbk
assembleAll.pl: gbk metrics for ./SneakerNet/assemblies/LT2/LT2.shovill.skesa.gbk
assembleAll.pl: gbk metrics for ./SneakerNet/assemblies/Philadelphia_CDC/Philadelphia_CDC.shovill.skesa.gbk
assembleAll.pl: asm metrics for ./SneakerNet/assemblies/FA1090/FA1090.shovill.skesa.fasta
assembleAll.pl: asm metrics for ./SneakerNet/assemblies/2010EL-1786/2010EL-1786.shovill.skesa.fasta
assembleAll.pl: asm metrics for ./SneakerNet/assemblies/Philadelphia_CDC/Philadelphia_CDC.shovill.skesa.fasta
assembleAll.pl: asm metrics for ./SneakerNet/assemblies/contaminated/contaminated.shovill.skesa.fasta
assembleAll.pl: asm metrics for ./SneakerNet/assemblies/LT2/LT2.shovill.skesa.fasta
assembleAll.pl: Effective coverage for ./SneakerNet/assemblies/2010EL-1786/2010EL-1786.shovill.skesa.bam
assembleAll.pl: Effective coverage for ./SneakerNet/assemblies/FA1090/FA1090.shovill.skesa.bam
assembleAll.pl: Effective coverage for ./SneakerNet/assemblies/Philadelphia_CDC/Philadelphia_CDC.shovill.skesa.bam
assembleAll.pl: Effective coverage for ./SneakerNet/assemblies/LT2/LT2.shovill.skesa.bam
assembleAll.pl: Effective coverage for ./SneakerNet/assemblies/contaminated/contaminated.shovill.skesa.bam
assembleAll.pl: Metrics can be found in ./SneakerNet/forEmail/assemblyMetrics.tsv
sn_mlst.pl: Temporary directory is at /tmp/sn_mlst.pl.APv9me
sn_mlst.pl: Reading sample tsv at ./samples.tsv
sn_mlst.pl: MLST for contaminated
sn_mlst.pl: MLST for FA1090
sn_mlst.pl: MLST for 2010EL-1786
sn_mlst.pl: MLST for Philadelphia_CDC
sn_mlst.pl: MLST for LT2
sn_kraken.pl: Running Kraken on 2010EL-1786
sn_kraken.pl:   Database: /kraken-database
sn_kraken.pl: Running Kraken on contaminated
sn_kraken.pl:   Database: /kraken-database
sn_kraken.pl: Running Kraken on FA1090
sn_kraken.pl:   Database: /kraken-database
sn_kraken.pl: Running Kraken on LT2
sn_kraken.pl:   Database: /kraken-database
sn_kraken.pl: Running Kraken on Philadelphia_CDC
sn_kraken.pl:   Database: /kraken-database
sn_detectContamination-kraken.pl: Running Kraken on contaminated
sn_detectContamination-kraken.pl:   Database: /kraken-database
sn_detectContamination-kraken.pl: ./SneakerNet/kraken/contaminated
sn_detectContamination-kraken.pl: Including for email: ./SneakerNet/kraken/contaminated/report.html
sn_detectContamination-kraken.pl: Running Kraken on LT2
sn_detectContamination-kraken.pl:   Database: /kraken-database
sn_detectContamination-kraken.pl: ./SneakerNet/kraken/LT2
sn_detectContamination-kraken.pl: Including for email: ./SneakerNet/kraken/LT2/report.html
sn_detectContamination-kraken.pl: Running Kraken on FA1090
sn_detectContamination-kraken.pl:   Database: /kraken-database
sn_detectContamination-kraken.pl: ./SneakerNet/kraken/FA1090
sn_detectContamination-kraken.pl: Including for email: ./SneakerNet/kraken/FA1090/report.html
sn_detectContamination-kraken.pl: Running Kraken on 2010EL-1786
sn_detectContamination-kraken.pl:   Database: /kraken-database
sn_detectContamination-kraken.pl: ./SneakerNet/kraken/2010EL-1786
sn_detectContamination-kraken.pl: Including for email: ./SneakerNet/kraken/2010EL-1786/report.html
sn_detectContamination-kraken.pl: Running Kraken on Philadelphia_CDC
sn_detectContamination-kraken.pl:   Database: /kraken-database
sn_detectContamination-kraken.pl: ./SneakerNet/kraken/Philadelphia_CDC
sn_detectContamination-kraken.pl: Including for email: ./SneakerNet/kraken/Philadelphia_CDC/report.html
sn_detectContamination-mlst.pl: --mlstfasta was not given, but I set it anyway to /mlst-2.19.0/bin/../db/blast/mlst.fa
sn_detectContamination-mlst.pl: Filtering /mlst-2.19.0/bin/../db/blast/mlst.fa
sn_detectContamination-mlst.pl: Running colorid workflow
ERROR: while creating ./SneakerNet/colorid/PE.tsv: this file does not exist: /home/eriny/src/SneakerNet/t/M00123-18-001-test/FA1090_1.fastq.gz at /SneakerNet-0.14.0/scripts/../SneakerNet.plugins/sn_detectContamination-mlst.pl line 139.
SneakerNetPlugins.pl: ERROR with plugin sn_detectContamination-mlst.pl: ERROR running command
  /SneakerNet-0.14.0/scripts/../SneakerNet.plugins/sn_detectContamination-mlst.pl . --numcpus 8 at /SneakerNet-0.14.0/scripts/SneakerNetPlugins.pl line 99.

Use of uninitialized value $_[0] in substitution (s///) at /usr/share/perl/5.26/File/Basename.pm line 180.
fileparse(): need a valid pathname at /SneakerNet-0.14.0/scripts/SneakerNetPlugins.pl line 109.
lskatz commented 3 years ago

Hi @erinyoung, we were able to replicate the error. Could you try the same thing but with these options?

--no email --no transfer --no save 

We were able to avoid the error with our command like so inside of Docker

SneakerNetPlugins.pl --numcpus 16 --no email --no transfer --no save M00123-18-001-test/

It turns out that some of the functionality is difficult to get into the container and so we acknowledge that some of the plugins do not work as intended on the container interface.

lpantano commented 3 years ago

@erinyoung can you try that to solve your error? Thanks

erinyoung commented 3 years ago

I was able to run the test case, and it looks like it ran fine.

erinyoung commented 3 years ago

Now I want to be a pain and run it on my data. I haven't been able to connect directly to my MiSeq. It seems like sneakernet can be run on fastq files and organize them appropriately. Here's what I tried:

docker run --rm -v $PWD:/data -v $KRAKENDB:/kraken-database -u $(id -u):$(id -g) lskatz/sneakernet:latest SneakerNet.roRun.pl --createsamplesheet -o M70330-20-001-test --numcpus 8 fastq --tempdir .

And got the following output

SneakerNet.roRun.pl: ERROR: no interop files were found in fastq
SneakerNet.roRun.pl: demultiplex sample sheets were not found. Creating one directly from fastq filenames into ./runData/samples.tsv
SneakerNet.roRun.pl: snok.txt not found. I will not read from it.
SneakerNet.roRun.pl: cp fastq/2020CK-00360-UT-M70330-200910_S8_L001_R1_001.fastq.gz to ./runData/2020CK-00360-UT-M70330-200910_S8_L001_R1_001.fastq.gz
ERROR: could not write to ./runData/2020CK-00360-UT-M70330-200910_S8_L001_R1_001.fastq.gz: Permission denied at /SneakerNet-0.14.0/scripts/SneakerNet.roRun.pl line 207.

And the following files :

$ ls runData/
2020CK-00360-UT-M70330-200910_S8_L001_R1_001.fastq.gz  SampleSheet.csv  samples.tsv  snok.txt

It seems like it's trying to copy over my fastq R1 twice. SampleSheet.csv, samples.tsv, and snok.txt are empty.

Well...

$ cat runData/SampleSheet.csv
[Data]
lskatz commented 3 years ago

This is cool @erinyoung ! We are happy it worked on the test data! And it is really cool that you are trying it on your data. @kapsakcj (our excellent containers expert on the team!) tried but could not replicate this error. Although this issue is real, it unfortunately makes it so that we have to speculate on the solution.

I believe but do not know for sure that ./runData might be pointing to a relative path in the container instead of from the root, ie, /runData. Also with the weirdness of directory paths and container paths, I think but do not know for sure, avoiding --tempdir might help clear up another variable here. One more thought though along those lines is that maybe Docker needs to have the tmpdir explicitly given? For example: docker run -v /tmp:/tmp ... might be helpful.

And again, this is speculation and not likely, but just to help us remove any other variables in this issue, do you have read/write access to the input directory?

It seems like it's trying to copy over my fastq R1 twice. SampleSheet.csv, samples.tsv, and snok.txt are empty.

I think but don't know for sure that what you're seeing in the log might be where it copies to a temp space and then back to the output folder. This is as intended, because if a file is in the output directory then the pipeline knows that it is done with that file. In other words, a file's presence in the output folder is a marker for "done."

erinyoung commented 3 years ago

I do have read/write access, but I cannot copy over a file just to overwrite it. There needs to be a bit of time between the two.

I'll change the temp directory to see if that helps any.

erinyoung commented 3 years ago
docker run --rm -v $PWD:/data -v $KRAKENDB:/kraken-database -v /tmp:/tmp -u $(id -u):$(id -g) lskatz/sneakernet:latest SneakerNet.roRun.pl --createsamplesheet -o M70330-20-001-test --numcpus 8 fastq

Still gives me the same error.

ERROR: could not write to /tmp/SneakerNet.roRun.pl.KvbPif/runData/2020CK-00360-UT-M70330-200910_S8_L001_R1_001.fastq.gz: Permission denied at /SneakerNet-0.14.0/scripts/SneakerNet.roRun.pl line 207.

This time there is no file. I thought that maybe the directory /tmp/SneakerNet.roRun.pl.KvbPif/runData might need to be created first, so I tried that and still got the same error.

BTW, this is a different than before. In the prior instance, the file would get copied over. I would still get the same error message. Now there is no file getting copied over.

lskatz commented 3 years ago

@lpantano, @lfaller, @erinyoung

Thanks all for participating in the review of our manuscript. The reviewer comments are insightful and have helped strengthen our software significantly. Over time, we addressed many documentation issues which made the software much better; we have also addressed the multiple dependency issues by creating a container; and after the container became unwieldy, we spent significant time creating a multistage build container (something we had never done before).

We were wondering if we could propose a debugging session with Dr. Young as we are unable to replicate her current error, and we believe some kind of interactive session could help clear things up more quickly / facilitate the review process. We’re excited about what we’ve built with your terrific feedback, and because of the length of time to uncover and resolve these software challenges we thought it might help us move forward with getting to a point where this work could be published. Let us know what you think of this proposal, or if there is another way you would like to approach this.