**Open** · mnm678 opened this issue 5 years ago
Hey, sorry for the late response!
I attempted to replicate this through `rrtest`, and initially thought it was a problem with `subprocess.Popen` hanging because the scrapy process never returns a status code to terminate its parent. However, running this with just `rr` yielded:
```
$ rr record -n scrapy runspider test.py
rr: Saving execution to trace directory `/home/crashsim/.local/share/rr/scrapy-1'.
2018-12-03 20:39:26 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: scrapybot)
2018-12-03 20:39:26 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.12 (default, Nov 12 2018, 14:36:49) - [GCC 5.4.0 20160609], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-131-generic-i686-with-Ubuntu-16.04-xenial
Traceback (most recent call last):
  File "/home/crashsim/.local/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/home/crashsim/.local/lib/python2.7/site-packages/scrapy/cmdline.py", line 150, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/home/crashsim/.local/lib/python2.7/site-packages/scrapy/cmdline.py", line 90, in _run_print_help
    func(*a, **kw)
  File "/home/crashsim/.local/lib/python2.7/site-packages/scrapy/cmdline.py", line 157, in _run_command
    cmd.run(args, opts)
  File "/home/crashsim/.local/lib/python2.7/site-packages/scrapy/commands/runspider.py", line 80, in run
    module = _import_file(filename)
  File "/home/crashsim/.local/lib/python2.7/site-packages/scrapy/commands/runspider.py", line 21, in _import_file
    module = import_module(fname)
  File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
  File "/home/crashsim/test.py", line 14
    return 0
SyntaxError: 'return' with argument inside generator
```
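The SyntaxError itself is worth noting: under Python 2 (which this environment runs, per the version line above), a generator function may not return a value; only a bare `return` is allowed. A minimal sketch of the rule, using a hypothetical generator since the actual contents of `test.py` are not shown here:

```python
# In Python 2, `return 0` inside a generator function is a SyntaxError
# ("'return' with argument inside generator"); Python 3.3+ allows it.
# A bare `return` to stop early is legal on either version.
def parse_items(responses):
    """Hypothetical generator standing in for a spider's parse callback."""
    for r in responses:
        if r is None:
            return  # bare return: fine in a generator everywhere
        yield r.upper()

print(list(parse_items(["a", "b", None, "c"])))  # ['A', 'B']
```

So even before any process-recording concerns, `test.py` would need its `return 0` replaced with a bare `return` (or restructured) to import cleanly under Python 2.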
Judging from the traceback, it seems that scrapy itself was spawning off another process, and rr was unfortunately unable to record the execution of that child process. I believe this is the corresponding limitation from their project website:

> cannot record processes that share memory with processes outside the recording tree. This is an inherent feature of the design. rr automatically disables features such as X shared memory for recorded processes to avoid this problem.
(@pkmoore @alyptik thoughts?)
With that said, if it is possible to instantiate scrapy as a single process (i.e., invoke the spider directly from Python without the need for the `scrapy runspider` command), that might be the next step to try.
I wasn't aware of this limitation, but it makes sense. I think this is something we need to document, but I don't think it's on us to fix it in rr.
When making a test for Python's scrapy library, the rrtest create command runs forever. It looks like it's getting stuck in Python's subprocess library.
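The hang is consistent with how `subprocess` behaves: `Popen.wait()` only returns once the child exits, so a child that never terminates blocks its parent indefinitely. A minimal sketch with a short-lived child standing in for the scrapy process:

```python
import subprocess
import sys

# Popen.wait() blocks until the child process exits and then returns
# its status code. If the child never exits (as a scrapy crawl that
# never finishes would not), the parent hangs at wait() forever.
child = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(0.2)"]
)
returncode = child.wait()
print(returncode)  # 0
```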
Here is the command:

```
rrtest create --name scrapy --command "scrapy runspider test.py"
```
And the contents of `test.py`: