superphy / prairiedog

next-gen pangenome graphs for predictive genomics

Test running in PyPy #76

Closed: kevinkle closed this 5 years ago

kevinkle commented 5 years ago

Prompted by LemonGraph's reported performance improvements under PyPy.

kevinkle commented 5 years ago

We ran into a problem here: PyPy's lookahead implementation for file parsing seems slow (we're not sure it's implemented at all), and we rely on it to parse kmers from a file.
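For reference, kmer extraction from FASTA doesn't strictly need lookahead: a forward-only loop can accumulate each record's sequence until the next ">" header closes it. This is a minimal sketch assuming plain, uncompressed FASTA; iter_fasta_records and iter_kmers are hypothetical names, not prairiedog's actual parser, and whether this sidesteps the slowdown under PyPy is untested.

```python
import io
from typing import Iterable, Iterator, List, Optional, Tuple


def iter_fasta_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs using a single forward pass (no peek/seek)."""
    header = None  # type: Optional[str]
    chunks = []  # type: List[str]
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, so no lookahead is needed.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def iter_kmers(sequence: str, k: int) -> Iterator[str]:
    """Yield every overlapping substring of length k (nothing if the sequence is shorter)."""
    for start in range(len(sequence) - k + 1):
        yield sequence[start:start + k]


if __name__ == "__main__":
    fasta = io.StringIO(">contig1\nACGTAC\nGT\n>contig2\nTTGCA\n")
    for name, seq in iter_fasta_records(fasta):
        print(name, list(iter_kmers(seq, 3)))
```

The same loop works with the K size of 11 seen in the Snakemake log further down; ambiguous bases such as N are left in as-is by this sketch.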

kevinkle commented 5 years ago

Since we parse and pickle into Kmer objects, we could have Snakemake do the parsing in CPython and the graphing in PyPy (if we can find a common conversion format), since graphing takes the vast majority of the time.
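One candidate for that common format: keep the pickle, but dump only builtin types (a dict of record name to a list of kmer strings) instead of Kmer objects, so the PyPy side can unpickle without needing any project class. A minimal sketch under that assumption; dump_kmers and load_kmers are hypothetical helpers, and pickle protocol 4 is chosen because it is available from Python 3.4 on, so both interpreters used in this thread can read it.

```python
import os
import pickle
import tempfile
from typing import Dict, List

# Protocol 4 exists in Python 3.4+, so CPython and PyPy3 can both read it.
PICKLE_PROTOCOL = 4

KmerTable = Dict[str, List[str]]


def dump_kmers(path: str, table: KmerTable) -> None:
    """Write kmers using only builtin types, so no custom class is needed to load them."""
    with open(path, "wb") as fh:
        pickle.dump(table, fh, protocol=PICKLE_PROTOCOL)


def load_kmers(path: str) -> KmerTable:
    """Read a table written by dump_kmers and sanity-check its top-level shape."""
    with open(path, "rb") as fh:
        table = pickle.load(fh)
    if not isinstance(table, dict):
        raise TypeError("expected dict of record -> kmer list, got {!r}".format(type(table)))
    return table


if __name__ == "__main__":
    # In the pipeline, the dump would run under CPython and the load under PyPy.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.pkl")
        dump_kmers(path, {"contig1": ["ACGTACGTACG", "CGTACGTACGT"]})
        print(load_kmers(path))
```

The trade-off is that the graphing step has to rebuild any richer per-kmer state itself, but the file no longer ties the two interpreters to a shared class definition.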

kevinkle commented 5 years ago

The SubgraphRef test has been failing on CircleCI when running via PyPy. When testing locally, we get some weird behaviour when trying to import SubgraphRef that only happens in PyPy.

kevin@panther ~/prairiedog> python
Python 3.6.1 (784b254d669919c872a505b807db8462b6140973, Apr 16 2019, 18:18:28)
[PyPy 7.1.1-beta0 with GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``as usual in pypy, the solution
appears completely disproportionate to the problem and instead we'll go for a
completely different simpler approach to the original problem''
>>>> from prairiedog.subgraph_ref import SubgraphRef
usage:  [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or:  --help [cmd1 cmd2 ...]
   or:  --help-commands
   or:  cmd --help

error: no commands supplied
kevin@panther ~/prairiedog>

This problem propagates into Snakemake as well:

kevin@panther ~/prairiedog> snakemake
SystemExit in line 16 of /home/kevin/prairiedog/Snakefile:
usage: snakemake [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or: snakemake --help [cmd1 cmd2 ...]
   or: snakemake --help-commands
   or: snakemake cmd --help

error: no commands supplied
  File "/home/kevin/prairiedog/Snakefile", line 16, in <module>
  File "/home/kevin/prairiedog/prairiedog/subgraph_ref.py", line 11, in <module>
  File "/home/kevin/prairiedog/prairiedog/lemon_graph.py", line 6, in <module>
  File "/home/kevin/.pyenv/versions/pypy3.6-7.1.1/site-packages/LemonGraph-0.10.0-py3.6.egg/LemonGraph/__init__.py", line 5, in <module>
  File "/home/kevin/prairiedog/setup.py", line 51, in <module>
  File "/home/kevin/.pyenv/versions/pypy3.6-7.1.1/site-packages/setuptools/__init__.py", line 145, in setup
  File "/home/kevin/.pyenv/versions/pypy3.6-7.1.1/lib-python/3/distutils/core.py", line 136, in setup
2019-07-03 10:14:33 panther snakemake.logging[12093] ERROR SystemExit in line 16 of /home/kevin/prairiedog/Snakefile:
usage: snakemake [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or: snakemake --help [cmd1 cmd2 ...]
   or: snakemake --help-commands
   or: snakemake cmd --help

error: no commands supplied
  File "/home/kevin/prairiedog/Snakefile", line 16, in <module>
  File "/home/kevin/prairiedog/prairiedog/subgraph_ref.py", line 11, in <module>
  File "/home/kevin/prairiedog/prairiedog/lemon_graph.py", line 6, in <module>
  File "/home/kevin/.pyenv/versions/pypy3.6-7.1.1/site-packages/LemonGraph-0.10.0-py3.6.egg/LemonGraph/__init__.py", line 5, in <module>
  File "/home/kevin/prairiedog/setup.py", line 51, in <module>
  File "/home/kevin/.pyenv/versions/pypy3.6-7.1.1/site-packages/setuptools/__init__.py", line 145, in setup
  File "/home/kevin/.pyenv/versions/pypy3.6-7.1.1/lib-python/3/distutils/core.py", line 136, in setup
kevinkle commented 5 years ago

I think it's missing the ffi lib:

try:
    from ._lemongraph_cffi import ffi, lib
except ImportError:
    from setup import fetch_external
    fetch_external()

This is from LemonGraph/__init__.py, where line 5 is from setup import fetch_external.
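The traceback above points at the mechanism: when _lemongraph_cffi fails to import, LemonGraph falls back to from setup import fetch_external, and from ~/prairiedog the first setup module on sys.path is prairiedog's own setup.py (the frame at /home/kevin/prairiedog/setup.py, line 51). That file calls setuptools.setup() with no command-line arguments, which prints the usage text and exits with "error: no commands supplied". A minimal sketch for checking which file a module name would resolve to without importing it; locate_module and demo_shadowing are hypothetical helpers built on importlib.util.find_spec.

```python
import importlib
import importlib.util
import os
import sys
import tempfile
from typing import Optional


def locate_module(name: str) -> Optional[str]:
    """Return the file that 'import name' would load, without importing it (None if absent)."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def demo_shadowing() -> bool:
    """Show that a setup.py placed first on sys.path wins the 'setup' lookup."""
    with tempfile.TemporaryDirectory() as tmp:
        fake = os.path.join(tmp, "setup.py")
        with open(fake, "w") as fh:
            fh.write("# stand-in for a project's own setup.py\n")
        # Mimics the REPL's current directory (or a script's folder) being sys.path[0].
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            found = locate_module("setup")
            return found is not None and os.path.samefile(found, fake)
        finally:
            sys.path.remove(tmp)


if __name__ == "__main__":
    print("sys.path[0]:", repr(sys.path[0]))
    print("setup would load from:", locate_module("setup"))
    print("shadowing reproduced:", demo_shadowing())
```

Running the __main__ block from ~/prairiedog and from ~/lemongraph should show the two directories resolving setup to different files.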

kevinkle commented 5 years ago

Importing LemonGraph in PyPy works when we're in the lemongraph/ submodule, but not in any other folder:

"LemonGraph/__init__.py" 1095L, 37088C written
kevin@panther ~/p/lemongraph> python
Python 3.6.1 (784b254d669919c872a505b807db8462b6140973, Apr 16 2019, 18:18:28)
[PyPy 7.1.1-beta0 with GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``"there should be one and only one
obvious way to do it". PyPy variant: "there can be N half-buggy ways to do
it"''
>>>> import LemonGraph
In file included from /usr/include/assert.h:35,
                 from lib/lemongraph.c:10:
/usr/include/features.h:184:3: warning: #warning "_BSD_SOURCE and _SVID_SOURCE are deprecated, use _DEFAULT_SOURCE" [-Wcpp]
 # warning "_BSD_SOURCE and _SVID_SOURCE are deprecated, use _DEFAULT_SOURCE"
   ^~~~~~~
In file included from /usr/include/errno.h:25,
                 from lib/db.c:5:
/usr/include/features.h:184:3: warning: #warning "_BSD_SOURCE and _SVID_SOURCE are deprecated, use _DEFAULT_SOURCE" [-Wcpp]
 # warning "_BSD_SOURCE and _SVID_SOURCE are deprecated, use _DEFAULT_SOURCE"
   ^~~~~~~
>>>> exit()
kevin@panther ~/p/lemongraph> python
Python 3.6.1 (784b254d669919c872a505b807db8462b6140973, Apr 16 2019, 18:18:28)
[PyPy 7.1.1-beta0 with GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``pypy HIT generator''
>>>> import LemonGraph
>>>>
kevin@panther ~/p/lemongraph> python
Python 3.6.1 (784b254d669919c872a505b807db8462b6140973, Apr 16 2019, 18:18:28)
[PyPy 7.1.1-beta0 with GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``"it's likely temporary until
forever" arigo''
>>>> import LemonGraph
>>>> exit()
kevin@panther ~/prairiedog> python
Python 3.6.1 (784b254d669919c872a505b807db8462b6140973, Apr 16 2019, 18:18:28)
[PyPy 7.1.1-beta0 with GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``The problem is that for almost
any non-trivial program, it's not clear what 'correct' means.''
>>>> import LemonGraph
usage:  [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or:  --help [cmd1 cmd2 ...]
   or:  --help-commands
   or:  cmd --help

error: no commands supplied
kevinkle commented 5 years ago

A fresh clone and install of lemongraph lets us import it from other folders, but not from prairiedog:

kevin@panther ~/lemongraph> python
Python 3.5.3 (928a4f70d3de7d17449456946154c5da6e600162, Feb 09 2019, 11:50:43)
[PyPy 7.0.0 with GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>> import LemonGraph
In file included from /usr/include/assert.h:35,
                 from lib/lemongraph.c:10:
/usr/include/features.h:184:3: warning: #warning "_BSD_SOURCE and _SVID_SOURCE are deprecated, use _DEFAULT_SOURCE" [-Wcpp]
 # warning "_BSD_SOURCE and _SVID_SOURCE are deprecated, use _DEFAULT_SOURCE"
   ^~~~~~~
In file included from /usr/include/errno.h:25,
                 from lib/db.c:5:
/usr/include/features.h:184:3: warning: #warning "_BSD_SOURCE and _SVID_SOURCE are deprecated, use _DEFAULT_SOURCE" [-Wcpp]
 # warning "_BSD_SOURCE and _SVID_SOURCE are deprecated, use _DEFAULT_SOURCE"
   ^~~~~~~
>>>> exit()
kevin@panther ~/lemongraph> cd ..
kevin@panther ~> python
Python 3.5.3 (928a4f70d3de7d17449456946154c5da6e600162, Feb 09 2019, 11:50:43)
[PyPy 7.0.0 with GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>> import LemonGraph
>>>> exit()
kevinkle commented 5 years ago
kevin@panther ~/prairiedog> python -m pip show LemonGraph
Name: LemonGraph
Version: 0.10.0
Summary: LemonGraph Database
Home-page: https://github.com/NationalSecurityAgency/lemongraph
Author: None
Author-email: None
License: UNKNOWN
Location: /home/kevin/.pyenv/versions/pypy3.6-7.1.1/site-packages/LemonGraph-0.10.0-py3.6.egg
Requires: cffi, lazy, msgpack, pysigset, python-dateutil, six
Required-by:
kevin@panther ~/prairiedog> cd ..
kevin@panther ~> python -m pip show LemonGraph
cpyext: missing slot wrapper tp_as_buffer.c_bf_getreadbuffer
RPython traceback:
  File "pypy_interpreter.c", line 23470, in BuiltinCode_funcrun_obj
  File "pypy_module_cpyext_6.c", line 17261, in wrap_del_call
Fatal RPython error: NotImplementedError
fish: “python -m pip show LemonGraph” terminated by signal SIGABRT (Abort)
kevin@panther ~> pyenv versions
  system
* pypy3 (set by /home/kevin/.python-version)
  pypy3.5-7.0.0
  pypy3.5-7.0.0/envs/pypy3
  pypy3.6-7.1.1
kevin@panther ~> cd lemongraph/
kevin@panther ~/lemongraph> pyenv versions
  system
* pypy3 (set by /home/kevin/.python-version)
  pypy3.5-7.0.0
  pypy3.5-7.0.0/envs/pypy3
  pypy3.6-7.1.1
kevin@panther ~/lemongraph>
kevinkle commented 5 years ago

Ah....

kevin@panther ~/prairiedog> pyenv versions
  system
  pypy3
  pypy3.5-7.0.0
  pypy3.5-7.0.0/envs/pypy3
* pypy3.6-7.1.1 (set by /home/kevin/prairiedog/.python-version)
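That explains the mismatch: absent a PYENV_VERSION override, pyenv picks the interpreter from the nearest .python-version file (searching up from the current directory), so ~ and ~/lemongraph ran the pypy3 env built on pypy3.5-7.0.0 (matching the PyPy 7.0.0 banner earlier), while ~/prairiedog ran pypy3.6-7.1.1. The pip show and import checks above were therefore hitting different interpreters and site-packages. A small sketch, with a hypothetical function name, that reports which interpreter a directory actually resolves to:

```python
import platform
import sys
from typing import Dict


def interpreter_report() -> Dict[str, str]:
    """Summarize which Python the current shell/directory actually resolved to."""
    report = {
        "executable": sys.executable,
        "implementation": platform.python_implementation(),  # e.g. 'PyPy' or 'CPython'
        "python_version": platform.python_version(),
    }
    pypy = getattr(sys, "pypy_version_info", None)  # only defined on PyPy
    if pypy is not None:
        report["pypy_version"] = "{}.{}.{}".format(pypy.major, pypy.minor, pypy.micro)
    return report


if __name__ == "__main__":
    for key, value in sorted(interpreter_report().items()):
        print("{}: {}".format(key, value))
```

Running it with python from each directory would surface the 3.5/7.0.0 versus 3.6/7.1.1 split directly, before any import is attempted.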
kevinkle commented 5 years ago

Well, a fresh install of lemongraph fixed the import, but now we're getting another error. We'll also need to remember to clean up lemongraph's folders before each Python test set.

kevin@panther ~/prairiedog> snakemake
Building DAG of jobs...
2019-07-03 11:02:15 panther snakemake.logging[20426] WARNING Building DAG of jobs...
Using shell: /bin/bash
2019-07-03 11:02:15 panther snakemake.logging[20426] WARNING Using shell: /bin/bash
Provided cores: 1
2019-07-03 11:02:15 panther snakemake.logging[20426] WARNING Provided cores: 1
Rules claiming more threads will be scaled down.
2019-07-03 11:02:15 panther snakemake.logging[20426] WARNING Rules claiming more threads will be scaled down.
Job counts:
        count   jobs
        1       all
        2       kmers
        1       pangenome
        4
2019-07-03 11:02:15 panther snakemake.logging[20426] WARNING Job counts:
        count   jobs
        1       all
        2       kmers
        1       pangenome
        4

2019-07-03 11:02:15 panther snakemake.logging[20426] INFO
[Wed Jul  3 11:02:15 2019]
2019-07-03 11:02:15 panther snakemake.logging[20426] INFO [Wed Jul  3 11:02:15 2019]
rule kmers:
    input: samples/SRR3295722.fasta
    output: outputs/kmers/SRR3295722.pkl
    jobid: 3
    wildcards: sample=SRR3295722
2019-07-03 11:02:15 panther snakemake.logging[20426] INFO rule kmers:
    input: samples/SRR3295722.fasta
    output: outputs/kmers/SRR3295722.pkl
    jobid: 3
    wildcards: sample=SRR3295722

2019-07-03 11:02:15 panther snakemake.logging[20426] INFO
Job counts:
        count   jobs
        1       kmers
        1
2019-07-03 11:02:17 panther snakemake.logging[20473] WARNING Job counts:
        count   jobs
        1       kmers
        1
2019-07-03 11:02:17 panther prairiedog[20473] DEBUG Parsing Kmers for file samples/SRR3295722.fasta with K size 11 in pid 20473
2019-07-03 11:02:17 panther prairiedog[20473] DEBUG Seeing current working directory as: /home/kevin/prairiedog
cpyext: missing slot wrapper tp_as_buffer.c_bf_getreadbuffer
RPython traceback:
  File "pypy_interpreter.c", line 23470, in BuiltinCode_funcrun_obj
  File "pypy_module_cpyext_6.c", line 17261, in wrap_del_call
Fatal RPython error: NotImplementedError
Aborted
Shutting down, this might take some time.
2019-07-03 11:02:18 panther snakemake.logging[20426] WARNING Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
2019-07-03 11:02:18 panther snakemake.logging[20426] ERROR Exiting because a job execution failed. Look above for error message
Complete log: /home/kevin/prairiedog/.snakemake/log/2019-07-03T110213.663901.snakemake.log
2019-07-03 11:02:18 panther snakemake.logging[20426] WARNING Complete log: /home/kevin/prairiedog/.snakemake/log/2019-07-03T110213.663901.snakemake.log
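For the folder cleanup noted above, a minimal sketch that deletes build leftovers from a lemongraph checkout before switching interpreters. The glob patterns are assumptions (typical setuptools outputs plus a compiled _lemongraph_cffi shared object, whose exact filename depends on the interpreter), so check them against the actual checkout first; clean_build_artifacts is a hypothetical helper.

```python
import shutil
import sys
import tempfile
from pathlib import Path
from typing import List, Sequence

# Assumed leftovers from a previous interpreter's build; adjust to the real checkout.
STALE_PATTERNS = (
    "build",
    "*.egg-info",
    "**/__pycache__",
    "**/_lemongraph_cffi*.so",
)


def clean_build_artifacts(root: str, patterns: Sequence[str] = STALE_PATTERNS) -> List[Path]:
    """Remove files and directories under root that match any pattern; return them."""
    removed = []  # type: List[Path]
    base = Path(root)
    for pattern in patterns:
        # sorted() materializes the glob before anything is deleted.
        for path in sorted(base.glob(pattern)):
            if not path.exists():
                continue  # may already be gone if a parent directory matched first
            if path.is_dir():
                shutil.rmtree(str(path))
            else:
                path.unlink()
            removed.append(path)
    return removed


def _demo() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "build").mkdir()
        (root / "LemonGraph" / "__pycache__").mkdir(parents=True)
        (root / "LemonGraph" / "_lemongraph_cffi.so").write_text("")
        (root / "setup.py").write_text("")
        for path in clean_build_artifacts(tmp):
            print("removed", path.relative_to(root))


if __name__ == "__main__":
    if len(sys.argv) > 1:
        clean_build_artifacts(sys.argv[1])
    else:
        _demo()
```

Because it deletes whole directories such as build/, it should only ever be pointed at the lemongraph checkout itself.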
kevinkle commented 5 years ago

The error above is tracked upstream as https://bitbucket.org/pypy/pypy/issues/3004/fatal-rpython-error-notimplementederror

kevinkle commented 5 years ago

Did a fresh install of lemongraph, as described, into pypy3.6-7.1.1, and this fixed the "Fatal RPython error: NotImplementedError". It looks like there's some problem with pypy3.5-7.0.0 at the moment.
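Since the crash seems tied to pypy3.5-7.0.0, one option is a guard that refuses to start on that combination instead of aborting mid-job. A minimal sketch assuming sys.pypy_version_info (which only exists on PyPy) is enough for detection; the KNOWN_BAD list and the function names are hypothetical.

```python
import sys
from typing import Optional, Set, Tuple

# (Python major.minor, PyPy major.minor.micro) pairs that aborted with
# "Fatal RPython error: NotImplementedError" in this thread. Hypothetical policy list.
InterpreterKey = Tuple[Tuple[int, int], Tuple[int, int, int]]
KNOWN_BAD = {((3, 5), (7, 0, 0))}  # type: Set[InterpreterKey]


def current_key() -> Optional[InterpreterKey]:
    """Return (python_version, pypy_version) when running on PyPy, else None."""
    info = getattr(sys, "pypy_version_info", None)
    if info is None:
        return None
    return (tuple(sys.version_info[:2]), (info.major, info.minor, info.micro))


def is_known_bad(key: Optional[InterpreterKey]) -> bool:
    return key in KNOWN_BAD


def check_current_interpreter() -> None:
    """Fail fast with a readable message instead of a SIGABRT mid-job."""
    key = current_key()
    if is_known_bad(key):
        raise RuntimeError(
            "PyPy {} on Python {} is known to crash here; try pypy3.6-7.1.1".format(
                ".".join(map(str, key[1])), ".".join(map(str, key[0]))))


if __name__ == "__main__":
    check_current_interpreter()
    print("interpreter OK:", current_key() or "not PyPy")
```

Calling check_current_interpreter() early, for example at the top of the Snakefile or in a pytest conftest.py, would turn the abort into an explicit error message.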

kevinkle commented 5 years ago

Everything passed as of https://github.com/superphy/prairiedog/pull/117. Merged into master too since, even if we go with Dgraph, it might be faster to do some of the tasks in PyPy.