Closed jpmckinney closed 2 years ago
I'm able to reproduce locally. The message starts with "Job 1, 'pytest' terminated by signal " followed by:
It occurs inconsistently. When testing only pytest -v tests/commands/test_compile.py
, I have gotten it to fail on:
It likely fails on any *_package*
test. This case is special, in that it sets package['records']
to a generator, which has special handling in util.SerializableGenerator
to exhaust the generator.
If I disable that behavior by changing streaming=False
, I now get:
json/encoder.py", line 260 in __encode_dict
from json_dumps, or _pydecimal.py", line 522 in __new__
from cli/commands/base.py", line 104 in itemsargparse.py", line 204 in __init__
--disable-vcr
to avoid any possibility of this)It might have to do with garbage collection and yield
: https://www.pypy.org/compat.html and https://doc.pypy.org/en/latest/cpython_differences.html#differences-related-to-garbage-collection-strategies
The particular points at which it fails might be red herrings.
This patch replaces ijson with json, and avoids the package['records']
generator. Some tests fail (because json can't read concatenated JSON), but there are no segfaults, etc.
So, the issue is probably some interaction with ijson. Either ijson has an issue, or our use of ijson is not safe (maybe more likely than the former).
diff --git a/ocdskit/cli/commands/base.py b/ocdskit/cli/commands/base.py
index 3c58ba1..7f81d88 100644
--- a/ocdskit/cli/commands/base.py
+++ b/ocdskit/cli/commands/base.py
@@ -2,6 +2,7 @@ import os
import sys
from abc import ABC, abstractmethod
+import json
import ijson
from ocdskit.util import iterencode, json_dumps
@@ -62,8 +63,10 @@ class BaseCommand(ABC):
"""
Yields the items in the input.
"""
- file = StandardInputReader(self.args.encoding)
- yield from ijson.items(file, self.prefix(), multiple_values=True, **kwargs)
+ for line in sys.stdin:
+ yield json.loads(line, **kwargs)
+ # file = StandardInputReader(self.args.encoding)
+ # yield from ijson.items(file, self.prefix(), multiple_values=True, **kwargs)
def print(self, data, streaming=False):
"""
diff --git a/ocdskit/cli/commands/compile.py b/ocdskit/cli/commands/compile.py
index d08eedf..b6d73dc 100644
--- a/ocdskit/cli/commands/compile.py
+++ b/ocdskit/cli/commands/compile.py
@@ -1,4 +1,5 @@
import logging
+import platform
import sys
import ocdskit.packager
@@ -36,9 +37,16 @@ class Command(OCDSCommand):
logger.warning('sqlite3 is unavailable, so the command will run in memory. If input files are too large, '
'the command might exceed available memory.')
+ if platform.python_implementation() == 'PyPy':
+ merge_streaming = False
+ print_streaming = False
+ else:
+ merge_streaming = True
+ print_streaming = self.args.package
+
try:
- for output in merge(self.items(), streaming=True, **kwargs):
- self.print(output, streaming=self.args.package)
+ for output in merge(self.items(), streaming=merge_streaming, **kwargs):
+ self.print(output, streaming=print_streaming)
except MissingOcidKeyError as e:
raise CommandError('The `ocid` field of at least one release is missing.') from e
except InconsistentVersionError as e:
I think I need a much better understanding of generators and PyPy to resolve this. Given that OCDS Kit uses a lot of generators, I'm also not sure how much speed-up PyPy can offer (it didn't in older versions at least).
Since most OCDS tools depend on OCDS Kit, it would be nice to support PyPy, so that the other tools can run on PyPy. However, for now, I'll abandon the project.
I took the strategy of disabling different parts of the code: print(), json_dump(), iterencode(), etc. The errors persisted. It then occurred to me to use a different ijson backend. Avoiding the yajl2_c backend seems to be working in #179.
I encountered various forms of core dumps on GitHub Actions: "Segmentation fault", "Abort trap: 6", etc.
Some things I addressed:
However, just running
pytest
also causes a "Segmentation fault". The traceback in https://github.com/open-contracting/ocdskit/runs/3082882014?check_suite_focus=true starts with jsonref, whose tests are currently failing on all versions, and whose CI is configured to allow failures on PyPy: https://github.com/gazpachoking/jsonref (jsonref seems unmaintained, which is concerning.)Anyway, I don't think the tests will pass on PyPy, but if we restore it, see 21548ea