vsoch / scif

scientific filesystem: a filesystem organization for scientific software and metadata
https://sci-f.github.io/
Mozilla Public License 2.0
30 stars 13 forks source link

scif exec parameter parsing fails? #29

Closed fbartusch closed 6 years ago

fbartusch commented 6 years ago

Hi,

I'm trying to combine snakemake with scif and took the snakemake tutorial workflow as a starting point. Now I have a Singularity container with scif and two scif-apps installed (bwa and samtools).

I shell into the container:

singularity shell --bind data:/scif/data snakemake_tutorial.simg

Now I run samtools as scif-app without any parameters -> it prints the samtools help (yeah!)

scif exec samtools samtools

Now I run samtools as scif app, with view as command

scif exec samtools samtools view
[samtools] executing /scif/apps/samtools/bin/samtools view

This is expected, because samtools view prints nothing.

Now I want to convert a sam-file to a bam-file:

scif exec samtools samtools view -Sb mapped_reads/r1_subset.sam > mapped_reads/r1_subset.bam

usage: scif [-h] [--debug] [--quiet] [--writable]
            {version,pyshell,shell,preview,help,install,inspect,run,apps,dump,exec}
            ...
scif: error: unrecognized arguments: -Sb mapped_reads/r1_subset.sam

I don't know if the error is in front of the screen or does scif have problems building the command? It works if I shell into the app and execute the command 'by hand'.

Thanks, Felix

vsoch commented 6 years ago

hey @fbartusch ! I discovered this same bug (and am working on it along with others as we speak!) The issue is that the scif entrypoint uses argparse, and I think the fix is to use the parser.parse_known_args or REMAINDER (see examples here) https://stackoverflow.com/questions/25864774/how-to-read-the-remaining-of-a-command-line-with-argparse. I have quite a bit on my plate and may not get to this today, so if you want to take a first shot it would be greatly appreciated! It should be a relatively quick / easy fix, just needs proper testing.

fbartusch commented 6 years ago

Hi @vsoch, I can try to fix it.

vsoch commented 6 years ago

okay! And I'm going to start testing now too - the other bugs I was working on were easier to squish than I anticipated! Thank you! :dancer:

vsoch commented 6 years ago

okay here is some early testing. The first thing I tried is (a workaround not a solution) and that would be putting the entire argument into a big quote:

docker run vanessa/scif exec hello-world-env "echo [e]OMG && blaa --argy two"

We see the argument gets (correctly) parsed as the command and goes through

Namespace(cmd=['hello-world-env', 'echo [e]OMG && blaa --argy two'], command='exec', debug=False, quiet=False, writable=False)

If we run this with the other method (using parse_unknown_args) it does the same, so this is good as some users might just do this.

Namespace(cmd=['hello-world-env', 'echo [e]OMG && blaa --argy two'], command='exec', debug=False, quiet=False, writable=False)
[]   # this is a print of "unknown"

But now let's try to mimic the bug you get - adding extra arguments (without the parsing of unknown).

docker run vanessa/scif exec hello-world-env "echo [e]OMG && blaa" --argy two
usage: scif [-h] [--debug] [--quiet] [--writable]
            {version,pyshell,shell,preview,help,install,inspect,run,apps,dump,exec}
            ...
scif: error: unrecognized arguments: --argy two

But now when we try to do this with unknown, it passes them through to the second variable

docker run vanessa/scif exec hello-world-env "echo [e]OMG" --argy two

# here are the "known" args
Namespace(cmd=['hello-world-env', 'echo [e]OMG'], command='exec', debug=False, quiet=False, writable=False)
# here are the "unknown"
['--argy', 'two']
[hello-world-env] executing /bin/echo $OMG
TACOS

So we probably just need to decide which commands would have unknown arguments (likely run and exec?) and pass them through to the functions.

vsoch commented 6 years ago

okay I think I have a fix! Lol, let me give you something to play with!

vsoch commented 6 years ago

okay @fbartusch if you want to test this one https://github.com/vsoch/scif/pull/30 that would be great! I'm really glad you have this use case because mine don't go beyond "hello world" and it's a bit limited. Let me know any issues and I can fix right away.

fbartusch commented 6 years ago

Ok, I'll try it. I have a good example at hand.

fbartusch commented 6 years ago

@vsoch

The argument parsing seems to work now. But I have a problem with the second command, although I don't know if the problem is on the scif side.

~/miniconda2/envs/scif_fix/bin/scif exec bwa bwa mem -o /scif/data/mapped_reads/r1_subset.sam /scif/data/index/ref.fa /scif/data/raw_reads/r1_subset.fq

[bwa] executing /scif/apps/bwa/bin/bwa mem -o /scif/data/mapped_reads/r1_subset.sam /scif/data/index/ref.fa /scif/data/raw_reads/r1_subset.fq
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 10000 sequences (2064809 bp)...
[M::mem_process_seqs] Processed 10000 reads in 0.794 CPU sec, 0.799 real sec
[main] Version: 0.7.17-r1188
[main] CMD: /scif/apps/bwa/bin/bwa mem -o /scif/data/mapped_reads/r1_subset.sam /scif/data/index/ref.fa /scif/data/raw_reads/r1_subset.fq
[main] Real time: 0.835 sec; CPU: 0.813 sec
~/miniconda2/envs/scif_fix/bin/scif exec samtools samtools view -Sb /scif/data/mapped_reads/r1_subset.sam > /scif/data/mapped_reads/r1_subset.bam

Traceback (most recent call last):
  File "/home/fbartusch/miniconda2/envs/scif_fix/bin/scif", line 11, in <module>
    load_entry_point('scif==0.0.65', 'console_scripts', 'scif')()
  File "/home/fbartusch/miniconda2/envs/scif_fix/lib/python3.6/site-packages/scif-0.0.65-py3.6.egg/scif/client/__init__.py", line 245, in main
    subparser=subparsers[args.command])
  File "/home/fbartusch/miniconda2/envs/scif_fix/lib/python3.6/site-packages/scif-0.0.65-py3.6.egg/scif/client/execute.py", line 41, in main
    client.execute(app, cmd)
  File "/home/fbartusch/miniconda2/envs/scif_fix/lib/python3.6/site-packages/scif-0.0.65-py3.6.egg/scif/main/commands.py", line 93, in execute
    return self._exec(app)
  File "/home/fbartusch/miniconda2/envs/scif_fix/lib/python3.6/site-packages/scif-0.0.65-py3.6.egg/scif/main/commands.py", line 68, in _exec
    line = process.readline().strip('\n')
  File "/home/fbartusch/miniconda2/envs/scif_fix/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
vsoch commented 6 years ago

could you show me the output of

import locale
locale.getdefaultlocale()
('en_US', 'UTF-8')
vsoch commented 6 years ago

And could you give me a container (or something) so I could reproduce the error? The issue is with the parsing of the command (an issue with codecs) and it would be greatly useful to be able to reproduce it.

fbartusch commented 6 years ago

This is the python inside the container:

Singularity snakemake_tutorial.simg:~/github/snakemake_tutorial> python
Python 3.6.3 |Anaconda, Inc.| (default, Oct 13 2017, 12:02:49) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getdefaultlocale()
('en_US', 'UTF-8')
>>> ('en_US', 'UTF-8')
('en_US', 'UTF-8')

I can upload my Singularity file and the test data on GitHub.

vsoch commented 6 years ago

okay great! Here is something to try, given a cmd (a string) that doesn't work in the python environment triggering the bug:

    import locale
    import os

    cmd = "Your command here"
    loc = locale.getdefaultlocale()[1]

    for line in os.popen(cmd): 
        print(line.rstrip().encode(loc))
vsoch commented 6 years ago

okay, just reproduced the error, working on this finally!

vsoch commented 6 years ago

ha, I put it in quotes and it worked - did you try that? I'm going to also see if I can determine what specific character is making the command ornery.

Singularity snakemake:~/Documents/Dropbox/Code/scif/examples/snakemake_tutorial> scif run samtools "view -Sb /scif/data/mapped_reads/r1_subset.sam > /scif/data/mapped_reads/r1_subset.bam"
[samtools] executing /bin/bash /scif/apps/samtools/scif/runscript view -Sb /scif/data/mapped_reads/r1_subset.sam > /scif/data/mapped_reads/r1_subset.bam
Singularity snakemake:~/Documents/Dropbox/Code/scif/examples/snakemake_tutorial> 
vsoch commented 6 years ago

okay, yeah, with some testing I think the issue is that when you do a > (or some kind of pipe of output) it is getting parsed by the host before the internal process. So either the user can put it in quotes, or we (like [e]) can try to put it in quotes for them, or (a cool idea) provide another command line thing "eg [>] to not get eaten up. I'm going to mess with these ideas, will report back!

fbartusch commented 6 years ago

With the quotes you see in at a glance which part is execute in the scif-app and which part not. But then you will also have problems if you want to work with 'real strings' at the same time -> nested quotes.

vsoch commented 6 years ago

okay, it's worth debugging then. I'm going to build this into a Docker container so I can more easily tweak and then re-install scif.

vsoch commented 6 years ago

and just to be specific - the intended / ideal use case is that the output pipe (>) directs to a file inside, or outside the container? Or could it be both? You are giving a path relative to the scif filesystem, so it's implied to be inside. However running this command (without the quotes) would generally parse the pipe, and direct the output of running the container to somewhere external.

So the two cases are:

[scif command]  > [outside]
[scif command]  > [inside]

I think if we wanted the second to work, regardless of the unicode error (to be able to give a path inside) we would need the pipe to not be parsed by the host, perhaps like:

[scif command]  [>] [inside]

a quote would also work, but it sounds like you don't think that's a good solution because of double quotes. I think I agree about that - it would be nice to avoid quote-troubles!