scrapy / scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.
https://scrapy.org
BSD 3-Clause "New" or "Revised" License
51.16k stars 10.35k forks source link

[MRG +1] bpython support (followup on #270) #1100

Closed nyov closed 8 years ago

nyov commented 9 years ago

This is a rebase and revamp of #270 I added support for setting the console/interpreter from scrapy.cfg, hopefully this might make it mergeable now.

Couldn't find a good place to document this in the docs, so I did it in code :P :P :P

nramirezuy commented 9 years ago

is it possible to add tests? and can we use a environment variable?

nyov commented 9 years ago

Good question. If anyone has an idea for an env var for this, and how that should be checked from the code, please share.

I'm a bit stupid here. What should a test for this look like?

nramirezuy commented 9 years ago

@nyov we need to be sure it is respecting the default order and if you implement the env var it is actually overriding that. I guess you can use this, run some dummy or version command and validate the output.

nyov commented 9 years ago

Yeah but the question is - what output to validate. No?

The only thing that changes is the interactive shell. Even if we had an interactive shell in the testsuite, how would the output differ if it went through IPython instead of bpython or plain python? That's what I'm not figuring here.

nramirezuy commented 9 years ago

/cc @kmike do you know how we can achieve this?

kmike commented 9 years ago

Regarding tests: one way to test it is to refactor the code. Create functions _embed_ipython_shell, _embed_bpython_shell, _embed_standard_shell and a function get_shell_embed_func which chooses which shell to embed, based on configuration and import errors (it may return one of these start functions). Then test only get_shell_embed_func.

I'm fine with merging this feature without tests though.

To clarify: I haven't checked this PR.

nyov commented 9 years ago

This seems sensible, I have tried to make it that way. I hope it's okay to modify the function signature of start_python_console. The current noipython argument seems unused by any scrapy code, so I replaced it with a list of shells to try.

nramirezuy commented 9 years ago

It is taking better form. Can we decorate the _embed_* functions instead of using that test thingy :tongue:

On _embed_standard_shell if readline isn't present we are assuming that rlcompleter is present.

Don't forget to add documentation.

nyov commented 9 years ago

Right, the test thingy is ugly. I also don't know if NotConfigured is a good choice of exception to raise here, a different name such as StopTest might make more sense. But a decorator could only wrap the function, I don't see how it might inject an early return. How would the decorator here look like in your mind?

On _embed_standard_shell if readline isn't present we are assuming that rlcompleter is present.

try...else says: "if we don't have an exception on importing readline, go on and import rlcompleter". So we only import rlcompleter if readline support was present in the first place. ... Ah I get it now. You meant to say I have an indent failure in the code there. Why don't you just say so.

Don't forget to add documentation.

I was trying to! Actually I probably won't, unless someone provides a rough draft (what to put where in the docs). While I don't mind writing docs, I'm reluctant to invest that much unpaid time in someone else's feature request, upfront. (Don't particularly care for it myself, just trying to take down some longstanding open PRs.) Thank you for understanding.

nramirezuy commented 9 years ago

Doc 4th paragraph I don't think explaining why bpython is important. Just with adding a link, mentioning we detect it and the order this is done; would be enough.

Decorator:

>>> def _embed_regex(regex, text):
...     import re
...     @wraps(_embed_regex)
...     def wrapper(regex=regex, text=text):
...         return re.findall(regex, text)
...     return wrapper
... 
>>> func = _embed_regex(r'a', 'abc')
>>> func.__name__
'_embed_regex'
>>> func()
['a']
>>> func(r'b')
['b']
# ---
>>> def _embed_fail():
...     import nonexistent
...     @wraps(_embed_fail)
...     def wrapper():
...         return
...     return wrapper
... 
>>> func = _embed_fail()
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "<console>", line 2, in _embed_fail
ImportError: No module named nonexistent
nyov commented 9 years ago

Awwright, added some docs! Though it seems I fail with the decorators, can still not figure out how to apply them in this context. I'm awfully sorry. :(

nramirezuy commented 9 years ago
def _embed_ipython_shell(namespace={}, banner='', **kw):
    """Start an IPython Shell"""
    try:
        from IPython.terminal.embed import InteractiveShellEmbed
        from IPython.terminal.ipapp import load_default_config
    except ImportError:
        from IPython.frontend.terminal.embed import InteractiveShellEmbed
        from IPython.frontend.terminal.ipapp import load_default_config

    @wraps(_embed_ipython_shell)
    def wrapper(namespace=namespace, banner='', **kw):
        config = load_default_config()
        shell = InteractiveShellEmbed(banner1=banner, user_ns=namespace, config=config)
        return shell()
    return wrapper

Then get_shell_embed_func will continue if ImportError and should return _f on try: else:

nyov commented 9 years ago

Ah! Thank you so much. In my mind I thought I needed control-flow back after the wrapper, big fail. I see it now, obviously. So neat.

This one is on me :beers:

Anything else I failed?

kmike commented 9 years ago

@nramirezuy likes clever tricks :)

nyov commented 9 years ago

Let me know if passing known_shells like this sounds good, whether to rather pass it along in start_python_console, or not at all? Thanks.

And how does the environment variable sound? SCRAPY_PYTHON_SHELL or rather just SCRAPY_SHELL maybe? SCRAPY_CONSOLE?

nyov commented 9 years ago

Okay, squashing this now.

nramirezuy commented 8 years ago

Naming the kwarg and the constant the same feels so dangerous :tongue:

Maybe something like this?

KNOWN_SHELLS = {...}

def f(shells, known_shells=None):
    if known_shells is None:
        known_shells = KNOWN_SHELLS.copy()

I don't know about environment variables; thoughts @kmike ?

nyov commented 8 years ago

Oh, this is a mistake from moving known_shells = {} outside get_shell_embed_func. How about just uppercasing KNOWN_SHELLS = {} and def get_shell_embed_func(shells, known_shells=KNOWN_SHELLS): - why the .copy()?

barraponto commented 8 years ago

@nyov we usually avoid mutable default arguments due to this: http://docs.python-guide.org/en/latest/writing/gotchas/#mutable-default-arguments

nyov commented 8 years ago

Okay, OrderedDict seemed simpler than using tuples :) Here's the update, added two tests, which depend on bpython and IPython though (let me know if these are overkill and to rather not install those in the test env?)

cyberplant commented 8 years ago

Looks good, and works good in Mac OS X with python 2.7.6 and pypy 2.6.0.

However.. the screen is being cleared at the start and I'm not being able to see the scrapy banner. Can we prevent that happening?

Tomorrow will be the sprints at EuroPython 2015 and bpython and scrapy will be there, so maybe I can ask them about that :smile:

nyov commented 8 years ago

Nice, hope to see some video coverage of the event later.

To be honest, I didn't give much thought about bpython's screen clearing, but it does make things a bit inconvenient. I couldn't find anything from a quick look about how to maybe pull in custom text into bpython's window at start. I wonder if the shell could be started before loading the crawler and attaching it to Shell() later, but can't test that now. Maybe it'd be okay to leave that for a different PR, though? This has been hanging around since early 2013 and, well, I'd hope to see it merged some time or having it rejected on a clear note.

codecov-io commented 8 years ago

Current coverage is 82.19%

Merging #1100 into master will increase coverage by +0.02% as of d6223c7

@@            master   #1100   diff @@
======================================
  Files          165     165       
  Stmts         8153    8193    +40
  Branches      1134    1140     +6
  Methods          0       0       
======================================
+ Hit           6699    6734    +35
- Partial        263     264     +1
- Missed        1191    1195     +4

Review entire Coverage Diff as of d6223c7

Powered by Codecov. Updated on successful CI builds.

nramirezuy commented 8 years ago

I don't know if you have to add more test; so the bot smiles at you. :+1:

cyberplant commented 8 years ago

I didn't know the correct way to add a commit to a foreign PR, so maybe I did it wrong, sorry!

I did another PR (#1444) that takes the help that was being manually printed on console and builds a 'banner' for passing it to the different shells. Tested on standard python, ipython and bpython and works as expected.

nyov commented 8 years ago

@nramirezuy , yeah I knew that would come once I saw the bot :( And another blocker. Wellll, I'll put it on the todo list. Soon(TM)

@cyberplant, thanks that looks nice! Want me to pull the commit in here, or do you want to keep going with the whole thing in your PR?

cyberplant commented 8 years ago

As you wish.. for me, it's the same.

nyov commented 8 years ago

If you're willing to see it through, I'll be glad to let you have it. (And you have the shorter wire to the merge decision people, looks like, so that's an advantage ;)

Too bad I can't hand over the PR or point it at your branch, so yours will be the followup on the followup. :tada:

cyberplant commented 8 years ago

Understood. :+1:

So.. good people, come with me, the party continues here :point_right: #1444