kiwicom / pytest-recording

A pytest plugin that allows recording network interactions via VCR.py
MIT License
425 stars, 34 forks

Doesn't seem to work with requests inside a Twisted Reactor process #50

Closed: nomasprime closed this issue 4 years ago

nomasprime commented 4 years ago

Hi, I'm new to Python, so apologies in advance if this is obvious or if I don't explain things very well.

For a learning project I'm writing a web scraper with tests. In the following example, recording works as expected with the commented-out line, but requests made inside the CrawlerProcess/Twisted reactor aren't being picked up.

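    import pytest
    from scrapy.crawler import CrawlerProcess

    # BaseSpider is the project's own spider class (its import isn't shown).

    class TestBaseSpider:  # hypothetical name for the enclosing test class
        @pytest.mark.vcr
        def test_parse(self):
            # Recorded as expected: a plain `requests` call.
            # assert requests.get("http://httpbin.org/ip").text == '{"ip": true}'
            BaseSpider.start_urls = ['thoughtbot']

            BaseSpider.custom_settings = {
                'ROBOTSTXT_OBEY': False,  # don't fetch or obey robots.txt
            }

            # Not recorded: requests made by Scrapy inside the Twisted reactor.
            process = CrawlerProcess()
            process.crawl(BaseSpider)
            process.start()  # runs the reactor; blocks until the crawl finishes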

I'm not sure whether this is a bug or whether I'm just doing something wrong.

Stranger6667 commented 4 years ago

Hi!

There could be multiple reasons for this behavior. To untangle what is going on, let's start with the environment. What versions of Python, vcrpy, pytest-recording, Twisted and (I assume that's the case) Scrapy are you using? You can get the package info from the pip freeze output. Also, what OS are you using? If, e.g., it is Windows and CrawlerProcess actually spawns a new process, then the new process will not have the patches applied by VCR.py, and we can look into that. I'll try to reproduce it locally once I have more info about the environment.
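The point about a spawned process can be illustrated with the standard library alone. VCR.py works by patching HTTP client code inside the running interpreter, and a freshly started Python process imports those modules again and sees only the originals. The sketch below stands in for VCR.py with a hand-written patch of urllib.request.urlopen (an assumption made for brevity; VCR.py itself patches client classes such as those in http.client) and asks a child interpreter what it sees.

```python
import subprocess
import sys
import urllib.request


# Stand-in for VCR.py's patching: replace an HTTP entry point in *this*
# interpreter only. (Hypothetical; VCR.py patches other classes.)
def fake_urlopen(*args, **kwargs):
    raise RuntimeError("intercepted")


original = urllib.request.urlopen
urllib.request.urlopen = fake_urlopen
try:
    parent_patched = urllib.request.urlopen is fake_urlopen
    # A freshly started interpreter imports urllib.request from scratch,
    # so it sees the original, unpatched function.
    child = subprocess.run(
        [sys.executable, "-c",
         "import urllib.request; print(urllib.request.urlopen.__name__)"],
        capture_output=True, text=True, check=True,
    )
    child_name = child.stdout.strip()
finally:
    urllib.request.urlopen = original  # always undo the patch

print(parent_patched, child_name)  # True urlopen
```

A child created with fork() starts from a copy of the parent's memory and would inherit the patch. Spawn-style process creation (the multiprocessing default on Windows, and on macOS since Python 3.8) does not inherit it.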

nomasprime commented 4 years ago

Thanks @Stranger6667, really appreciate your help.

I'm on macOS 10.15.5 with Python 3.8.3.

Relevant pip freeze output:

vcrpy==4.0.2
pytest==5.4.3
pytest-recording==0.8.1
Twisted==20.3.0
Scrapy==2.2.0

I saw the VCR.py compatibility docs and wondered whether Scrapy might simply not be compatible.

nomasprime commented 4 years ago

Looks like Scrapy uses twisted.web.

Stranger6667 commented 4 years ago

Unfortunately, VCR doesn't support twisted.web. I see a couple of things we can do about it:

In any case, once twisted.web support is implemented on the vcrpy side, pytest-recording will pick it up automatically. I think that is the cleanest way to record/replay HTTP for twisted.web.
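The reason twisted.web slips past VCR.py can also be sketched offline. VCR.py intercepts traffic by hooking specific client libraries. As far as we understand, twisted.web implements HTTP itself on top of Twisted transports and never calls into those libraries. In the sketch below, a hand-rolled hook on http.client.HTTPConnection.request (a simplified stand-in for VCR.py's patching, not its actual code) sees a request made through http.client. A request written directly to a socket, standing in for a client with its own HTTP implementation, goes unnoticed even though the local server answers both.

```python
import http.client
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from unittest import mock


class OkHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass


# Local server on a random free port, so no external network is needed.
server = HTTPServer(("127.0.0.1", 0), OkHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

seen = []  # requests observed by the library-level hook
original_request = http.client.HTTPConnection.request


def recording_request(self, method, url, *args, **kwargs):
    seen.append((method, url))
    return original_request(self, method, url, *args, **kwargs)


with mock.patch.object(http.client.HTTPConnection, "request", recording_request):
    # 1) A client built on http.client goes through the hook.
    conn = http.client.HTTPConnection(host, port)
    conn.request("GET", "/via-http-client")
    via_lib = conn.getresponse().read()
    conn.close()

    # 2) A client with its own HTTP implementation writes bytes straight
    #    to a socket, so the hook never sees it.
    with socket.create_connection((host, port)) as sock:
        sock.sendall(b"GET /raw-socket HTTP/1.0\r\nHost: x\r\n\r\n")
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # server closes the connection after an HTTP/1.0 reply
                break
            chunks.append(chunk)
        raw = b"".join(chunks)

server.shutdown()
server.server_close()

print(seen)                          # [('GET', '/via-http-client')]
print(via_lib, raw.endswith(b"ok"))  # b'ok' True
```

HTTPretty, mentioned later in this thread, works at a lower level by patching Python's socket module. Whether that cooperates with Twisted's reactor is not something this sketch checks.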

nomasprime commented 4 years ago

I've raised the issue with VCR.py, and I'll look into HTTPretty (I assume that's what you meant).

I also found this answer on Stack Overflow, which suggests simply using Scrapy's built-in cache.
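Scrapy's built-in cache is its HttpCacheMiddleware, configured through HTTPCACHE_* settings. The following is a minimal sketch that assumes the default filesystem storage and DummyPolicy, which caches every response regardless of HTTP cache headers. A first run with RECORD_SETTINGS stores responses on disk, and later runs with REPLAY_SETTINGS serve them from the cache. HTTPCACHE_IGNORE_MISSING makes Scrapy drop uncached requests instead of downloading them, which is similar in spirit to VCR.py's "none" record mode. The RECORD_SETTINGS and REPLAY_SETTINGS names are hypothetical.

```python
# Hypothetical settings for recording/replaying a crawl with Scrapy's
# HTTP cache instead of VCR.py cassettes. All keys are Scrapy settings.
RECORD_SETTINGS = {
    "ROBOTSTXT_OBEY": False,
    "HTTPCACHE_ENABLED": True,       # turn on HttpCacheMiddleware
    "HTTPCACHE_EXPIRATION_SECS": 0,  # 0 = cached responses never expire
    "HTTPCACHE_DIR": "httpcache",    # relative paths resolve inside the project data dir
    "HTTPCACHE_STORAGE": "scrapy.extensions.httpcache.FilesystemCacheStorage",
    "HTTPCACHE_POLICY": "scrapy.extensions.httpcache.DummyPolicy",  # cache everything
}

# Replay-only: requests missing from the cache are ignored, not downloaded.
REPLAY_SETTINGS = {**RECORD_SETTINGS, "HTTPCACHE_IGNORE_MISSING": True}
```

Either dict could be assigned to BaseSpider.custom_settings or passed to the process as CrawlerProcess(settings=REPLAY_SETTINGS).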

I'll have a play around with both.

Thanks @Stranger6667.

Stranger6667 commented 4 years ago

@nomasprime yep, I meant HTTPretty


You are very welcome! :)