scrapy / scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.
https://scrapy.org
BSD 3-Clause "New" or "Revised" License
52.82k stars 10.53k forks source link

Why do I have to use Cookies in Request? #5363

Closed lime-n closed 2 years ago

lime-n commented 2 years ago

I had a problem earlier when trying to scrape a site. I thought there was an issue with requests elsewhere but It turned out that I had to include the cookies from the headers as an argument in scrapy.FormRequest(). What is the reasoning behind this? Because, when using request.post() I can get a response 200 by just using the payload and headers. Why does scrapy differ to this for this situation? I had thought it follows the same structure as requests in the backend, but it seems like there's more to it.

For example, here's what I had which gives a 404 error:

import scrapy

headers = {
    'authority': 'www.etsy.com',
    'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="96", "Google Chrome";v="96"',
    'x-csrf-token': '3:1641383062:Exn8HMFDcc0UtitU6NOM3o3x8BGB:864dc90d926383d90686f37be56f69685b939f0f306b10a99bcd9016209f15d4',
    'sec-ch-ua-mobile': '?0',
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36',
    'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'accept': '*/*',
    'x-requested-with': 'XMLHttpRequest',
    'x-page-guid': 'eeda48b359a.aa23cce28f31baac6f24.00',
    'x-detected-locale': 'GBP|en-GB|GB',
    'sec-ch-ua-platform': '"Linux"',
    'origin': 'https://www.etsy.com',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
    'referer': 'https://www.etsy.com/search/clothing/womens-clothing?q=20s&explicit=1&ship_to=GB&page=2&ref=pagination',
    'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
    'cookie': 'uaid=G-_aWcvXqYHevnNO3ane9nOUmwNjZACCxCuVe2B0tVJpYmaKkpVSaVpUSoBZaGZVQL6Lj4mRv7ObrmmRR3F-aLyHp1ItAwA.; user_prefs=bNwL2wOEkWxqOSu2A1-CWlR6cr9jZACCxCuVe2B0tJK7U4CSTl5pTo6OUmqerruTko4SiACLGEEoXEQsAwA.; fve=1641314748.0; utm_lps=google__cpc; ua=531227642bc86f3b5fd7103a0c0b4fd6; p=eyJnZHByX3RwIjoxLCJnZHByX3AiOjF9; _gcl_au=1.1.1757627174.1641314793; _gid=GA1.2.1898390797.1641314793; __adal_cw=1641314793715; _pin_unauth=dWlkPVltVmtZemxoTldNdFpURXdPQzAwWkRWbUxXRTJOV1l0TTJGaE9URXdZVEEwTlRBeQ; last_browse_page=https%3A%2F%2Fwww.etsy.com%2Fuk%2F; __adal_ses=*; __adal_ca=so%3DGoogle%26me%3Dorganic%26ca%3D%28not%2520set%29%26co%3D%28not%2520set%29%26ke%3D%28not%2520set%29; search_options={"prev_search_term":"20s","item_language":null,"language_carousel":null}; _ga=GA1.2.559839679.1641314793; tsd=%7B%7D; __adal_id=952d43d7-5b80-4907-99d7-6f6baa9f4fe1.1641314794.3.1641383063.1641383059.2fe7a338-93bd-441f-b295-80549adbef7b; _tq_id.TV-27270909-1.a4d5=e2f6af8c27dee5e4.1641314794.0.1641383063..; _uetsid=dff577e06d7d11ec9617cbf4cc51b5b2; _uetvid=dff5f2706d7d11ec932fd3c5b816ab20; granify.uuid=bfd14e46-e8fa-4e7b-bce7-6f05dcb4b215; pla_spr=1; _ga_KR3J610VYM=GS1.1.1641383058.3.1.1641383118.60; exp_hangover=qk2fpkLi1lphuLsCKeq4gAe9BvxjZACCxCuVe8D01Zbb1UrlqUnxiUUlmWmZyZmJOfE5iSWpecmV8YUm8UYGhpZKVkqZeak5memZSTmpSrUMAA..; granify.session.QrsCf=-1',
}

class EtsySpider(scrapy.Spider):
    name = 'etit'
    start_urls = ['https://www.etsy.com/api/v3/ajax/bespoke/member/neu/specs/async_search_results']

    custom_settings = {
        'USER_AGENT':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'
    }
    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.FormRequest(
                url,
                method = "POST",
                formdata = {
                    'log_performance_metrics': 'true',
                    'specs[async_search_results][]': 'Search2_ApiSpecs_WebSearch',
                    'specs[async_search_results][1][search_request_params][detected_locale][language]': 'en-GB',
                    'specs[async_search_results][1][search_request_params][detected_locale][currency_code]': 'GBP',
                    'specs[async_search_results][1][search_request_params][detected_locale][region]': 'GB',
                    'specs[async_search_results][1][search_request_params][locale][language]': 'en-GB',
                    'specs[async_search_results][1][search_request_params][locale][currency_code]': 'GBP',
                    'specs[async_search_results][1][search_request_params][locale][region]': 'GB',
                    'specs[async_search_results][1][search_request_params][name_map][query]': 'q',
                    'specs[async_search_results][1][search_request_params][name_map][query_type]': 'qt',
                    'specs[async_search_results][1][search_request_params][name_map][results_per_page]': 'result_count',
                    'specs[async_search_results][1][search_request_params][name_map][min_price]': 'min',
                    'specs[async_search_results][1][search_request_params][name_map][max_price]': 'max',
                    'specs[async_search_results][1][search_request_params][parameters][q]': '30s',
                    'specs[async_search_results][1][search_request_params][parameters][explicit]': '1',
                    'specs[async_search_results][1][search_request_params][parameters][locationQuery]': '2635167',
                    'specs[async_search_results][1][search_request_params][parameters][ship_to]': 'GB',
                    'specs[async_search_results][1][search_request_params][parameters][page]': '4',
                    'specs[async_search_results][1][search_request_params][parameters][ref]': 'pagination',
                    'specs[async_search_results][1][search_request_params][parameters][facet]': 'clothing/womens-clothing',
                    'specs[async_search_results][1][search_request_params][parameters][referrer]': 'https://www.etsy.com/search/clothing/womens-clothing?q=30s&explicit=1locationQuery=2635167&ship_to=GB&page=3&ref=pagination',
                    'specs[async_search_results][1][search_request_params][user_id]': '',
                    'specs[async_search_results][1][request_type]': 'pagination_preact',
                    'specs[async_search_results][1][is_eligible_for_spa_reformulations]': 'false',
                    'view_data_event_name': 'search_async_pagination_specview_rendered'
                },
                headers=headers,
                callback = self.parse
            )

    def parse(self, response):
        stuff = response.json().get('cssFiles')
        yield {
            'stuff':stuff
        }

And when I include the following I can crawl the information:

 cookies = {
            "user_prefs": "2sjEL59UUglDjNIW6TKc04MvLTVjZACCxJMbvsPoaKXQYBclnbzSnBwdpdQ83dBgJR2lUEeoiBGEwkXEMgAA",
            "fve": "1640607991.0",
            "ua": "531227642bc86f3b5fd7103a0c0b4fd6",
            "_gcl_au": "1.1.717562651.1640607992",
            "uaid": "E7bYwrWVwTy7YGe_b_ipYT3Avd9jZACCxJMbvoPpqwvzqpVKEzNTlKyUnLJ9Io3DTQt1k53MwiojXTLzvZPCS31yCoPC_JRqGQA.",
            "pla_spr": "0",
            "_gid": "GA1.2.1425785976.1641390447",
            "_dc_gtm_UA-2409779-1": "1",
            "_pin_unauth": "dWlkPU0yVTRaamxoTWpjdFlqTTVZUzAwT0RJeExXRmpNamt0WlROalpXTTVNREE0WkRVNQ",
            "_ga": "GA1.1.1730759327.1640607993",
            "_uetsid": "052ece906e2e11ecb56a0390ed629376",
            "_uetvid": "39de7550671011ec80d2dbfaa05c901b",
            "exp_hangover": "pB4zSokzfzMIT9Jzi7zIwmXybCJjZACCxJMbvoPpqwt7qpXKU5PiE4tKMtMykzMTc-JzEktS85Ir4wtN4o0MDC2VrJQy81JzMtMzk3JSlWoZAA..",
            "_ga_KR3J610VYM": "GS1.1.1641390446.2.1.1641390474.32"

        }

        for url in self.start_urls:
            yield scrapy.FormRequest(
                'https://www.etsy.com/api/v3/ajax/bespoke/member/neu/specs/async_search_results',
                headers=headers,
                cookies=cookies,
                method="POST",
                formdata=data,
                callback = self.parse_res
            )
SiddharthaAnand commented 2 years ago

Websites are designed to be user-friendly and not crawler-friendly. There can be a whole new lecture on why cookies are required but in short: All of the internet works on something called HTTP Protocol. This protocol is designed to be stateless which means the server will treat every incoming request as a new request. By design, it does not remember who you are. In order to make complex applications we have to make sure that the server DOES remember who you are. There are quite a number of ways in which the server is capable of remembering you. One of those ways is to use Cookies in the Request. That is the reason in some cases, you will need to simulate your Requests having Cookies in your Request header as well.

Gallaecio commented 2 years ago

I had to include the cookies from the headers as an argument in scrapy.FormRequest(). […] when using request.post() I can get a response 200 by just using the payload and headers.

This sounds like something to look at, but you would have to provide a minimal reproducible example, written both with Scrapy and requests (but the same example in terms of input and expected output) to allow us to have a proper look.

However, mind that I find it hard to believe Scrapy is at fault here. While working on providing those examples, I suspect you will find the actual root cause of the failed response to be something other than “Scrapy needs cookies, requests does not”.

lime-n commented 2 years ago

I had to include the cookies from the headers as an argument in scrapy.FormRequest(). […] when using request.post() I can get a response 200 by just using the payload and headers.

This sounds like something to look at, but you would have to provide a minimal reproducible example, written both with Scrapy and requests (but the same example in terms of input and expected output) to allow us to have a proper look.

However, mind that I find it hard to believe Scrapy is at fault here. While working on providing those examples, I suspect you will find the actual root cause of the failed response to be something other than “Scrapy needs cookies, requests does not”.

I hope the first script I provided proves as 'minimal response', the bulk of the code is really just the headers and payload. Besides that, the rest is simple as I'm only sending a request and checking if I get a response. So it could really just be copied and runned. It should work as is - the issue I get is a response 400 which I wouldn't be able to figure out. Though it looks like a scrapy issue - can you check?

Here's the code for requests:

import requests

headers = {
    'authority': 'www.etsy.com',
    'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="96", "Google Chrome";v="96"',
    'x-csrf-token': '3:1641383062:Exn8HMFDcc0UtitU6NOM3o3x8BGB:864dc90d926383d90686f37be56f69685b939f0f306b10a99bcd9016209f15d4',
    'sec-ch-ua-mobile': '?0',
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36',
    'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'accept': '*/*',
    'x-requested-with': 'XMLHttpRequest',
    'x-page-guid': 'eeda48b359a.aa23cce28f31baac6f24.00',
    'x-detected-locale': 'GBP|en-GB|GB',
    'sec-ch-ua-platform': '"Linux"',
    'origin': 'https://www.etsy.com',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
    'referer': 'https://www.etsy.com/search/clothing/womens-clothing?q=20s&explicit=1&ship_to=GB&page=2&ref=pagination',
    'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
    'cookie': 'uaid=G-_aWcvXqYHevnNO3ane9nOUmwNjZACCxCuVe2B0tVJpYmaKkpVSaVpUSoBZaGZVQL6Lj4mRv7ObrmmRR3F-aLyHp1ItAwA.; user_prefs=bNwL2wOEkWxqOSu2A1-CWlR6cr9jZACCxCuVe2B0tJK7U4CSTl5pTo6OUmqerruTko4SiACLGEEoXEQsAwA.; fve=1641314748.0; utm_lps=google__cpc; ua=531227642bc86f3b5fd7103a0c0b4fd6; p=eyJnZHByX3RwIjoxLCJnZHByX3AiOjF9; _gcl_au=1.1.1757627174.1641314793; _gid=GA1.2.1898390797.1641314793; __adal_cw=1641314793715; _pin_unauth=dWlkPVltVmtZemxoTldNdFpURXdPQzAwWkRWbUxXRTJOV1l0TTJGaE9URXdZVEEwTlRBeQ; last_browse_page=https%3A%2F%2Fwww.etsy.com%2Fuk%2F; __adal_ses=*; __adal_ca=so%3DGoogle%26me%3Dorganic%26ca%3D%28not%2520set%29%26co%3D%28not%2520set%29%26ke%3D%28not%2520set%29; search_options={"prev_search_term":"20s","item_language":null,"language_carousel":null}; _ga=GA1.2.559839679.1641314793; tsd=%7B%7D; __adal_id=952d43d7-5b80-4907-99d7-6f6baa9f4fe1.1641314794.3.1641383063.1641383059.2fe7a338-93bd-441f-b295-80549adbef7b; _tq_id.TV-27270909-1.a4d5=e2f6af8c27dee5e4.1641314794.0.1641383063..; _uetsid=dff577e06d7d11ec9617cbf4cc51b5b2; _uetvid=dff5f2706d7d11ec932fd3c5b816ab20; granify.uuid=bfd14e46-e8fa-4e7b-bce7-6f05dcb4b215; pla_spr=1; _ga_KR3J610VYM=GS1.1.1641383058.3.1.1641383118.60; exp_hangover=qk2fpkLi1lphuLsCKeq4gAe9BvxjZACCxCuVe8D01Zbb1UrlqUnxiUUlmWmZyZmJOfE5iSWpecmV8YUm8UYGhpZKVkqZeak5memZSTmpSrUMAA..; granify.session.QrsCf=-1',
}

data = {
  'log_performance_metrics': 'true',
  'specs[async_search_results][]': 'Search2_ApiSpecs_WebSearch',
  'specs[async_search_results][1][search_request_params][detected_locale][language]': 'en-GB',
  'specs[async_search_results][1][search_request_params][detected_locale][currency_code]': 'GBP',
  'specs[async_search_results][1][search_request_params][detected_locale][region]': 'GB',
  'specs[async_search_results][1][search_request_params][locale][language]': 'en-GB',
  'specs[async_search_results][1][search_request_params][locale][currency_code]': 'GBP',
  'specs[async_search_results][1][search_request_params][locale][region]': 'GB',
  'specs[async_search_results][1][search_request_params][name_map][query]': 'q',
  'specs[async_search_results][1][search_request_params][name_map][query_type]': 'qt',
  'specs[async_search_results][1][search_request_params][name_map][results_per_page]': 'result_count',
  'specs[async_search_results][1][search_request_params][name_map][min_price]': 'min',
  'specs[async_search_results][1][search_request_params][name_map][max_price]': 'max',
  'specs[async_search_results][1][search_request_params][parameters][q]': '20s',
  'specs[async_search_results][1][search_request_params][parameters][explicit]': '1',
  'specs[async_search_results][1][search_request_params][parameters][ship_to]': 'GB',
  'specs[async_search_results][1][search_request_params][parameters][page]': '2',
  'specs[async_search_results][1][search_request_params][parameters][ref]': 'pagination',
  'specs[async_search_results][1][search_request_params][parameters][facet]': 'clothing/womens-clothing',
  'specs[async_search_results][1][search_request_params][parameters][referrer]': 'https://www.etsy.com/search/clothing/womens-clothing?q=20s&explicit=1&ship_to=GB',
  'specs[async_search_results][1][search_request_params][user_id]': '',
  'specs[async_search_results][1][request_type]': 'pagination_preact',
  'specs[async_search_results][1][is_eligible_for_spa_reformulations]': 'false',
  'view_data_event_name': 'search_async_pagination_specview_rendered'
}

requests.post('https://www.etsy.com/api/v3/ajax/bespoke/member/neu/specs/async_search_results', headers=headers, data=data)
Gallaecio commented 2 years ago

Your example cannot count as minimal. For one, it uses Splash.

Please, provide an example with only the minimum code needed to reproduce the issue. For the sake of the project, it is better for reporters to spend time on that, so that maintainers can focus their time on solving issues, not on trying to reproduce them.

lime-n commented 2 years ago

Here's the bare minimum:

import scrapy

headers = {
    'authority': 'www.etsy.com',
    'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="96", "Google Chrome";v="96"',
    'x-csrf-token': '3:1641383062:Exn8HMFDcc0UtitU6NOM3o3x8BGB:864dc90d926383d90686f37be56f69685b939f0f306b10a99bcd9016209f15d4',
    'sec-ch-ua-mobile': '?0',
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36',
    'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'accept': '*/*',
    'x-requested-with': 'XMLHttpRequest',
    'x-page-guid': 'eeda48b359a.aa23cce28f31baac6f24.00',
    'x-detected-locale': 'GBP|en-GB|GB',
    'sec-ch-ua-platform': '"Linux"',
    'origin': 'https://www.etsy.com',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
    'referer': 'https://www.etsy.com/search/clothing/womens-clothing?q=20s&explicit=1&ship_to=GB&page=2&ref=pagination',
    'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
    'cookie': 'uaid=G-_aWcvXqYHevnNO3ane9nOUmwNjZACCxCuVe2B0tVJpYmaKkpVSaVpUSoBZaGZVQL6Lj4mRv7ObrmmRR3F-aLyHp1ItAwA.; user_prefs=bNwL2wOEkWxqOSu2A1-CWlR6cr9jZACCxCuVe2B0tJK7U4CSTl5pTo6OUmqerruTko4SiACLGEEoXEQsAwA.; fve=1641314748.0; utm_lps=google__cpc; ua=531227642bc86f3b5fd7103a0c0b4fd6; p=eyJnZHByX3RwIjoxLCJnZHByX3AiOjF9; _gcl_au=1.1.1757627174.1641314793; _gid=GA1.2.1898390797.1641314793; __adal_cw=1641314793715; _pin_unauth=dWlkPVltVmtZemxoTldNdFpURXdPQzAwWkRWbUxXRTJOV1l0TTJGaE9URXdZVEEwTlRBeQ; last_browse_page=https%3A%2F%2Fwww.etsy.com%2Fuk%2F; __adal_ses=*; __adal_ca=so%3DGoogle%26me%3Dorganic%26ca%3D%28not%2520set%29%26co%3D%28not%2520set%29%26ke%3D%28not%2520set%29; search_options={"prev_search_term":"20s","item_language":null,"language_carousel":null}; _ga=GA1.2.559839679.1641314793; tsd=%7B%7D; __adal_id=952d43d7-5b80-4907-99d7-6f6baa9f4fe1.1641314794.3.1641383063.1641383059.2fe7a338-93bd-441f-b295-80549adbef7b; _tq_id.TV-27270909-1.a4d5=e2f6af8c27dee5e4.1641314794.0.1641383063..; _uetsid=dff577e06d7d11ec9617cbf4cc51b5b2; _uetvid=dff5f2706d7d11ec932fd3c5b816ab20; granify.uuid=bfd14e46-e8fa-4e7b-bce7-6f05dcb4b215; pla_spr=1; _ga_KR3J610VYM=GS1.1.1641383058.3.1.1641383118.60; exp_hangover=qk2fpkLi1lphuLsCKeq4gAe9BvxjZACCxCuVe8D01Zbb1UrlqUnxiUUlmWmZyZmJOfE5iSWpecmV8YUm8UYGhpZKVkqZeak5memZSTmpSrUMAA..; granify.session.QrsCf=-1',
}

payload = {
                    'log_performance_metrics': 'true',
                    'specs[async_search_results][]': 'Search2_ApiSpecs_WebSearch',
                    'specs[async_search_results][1][search_request_params][detected_locale][language]': 'en-GB',
                    'specs[async_search_results][1][search_request_params][detected_locale][currency_code]': 'GBP',
                    'specs[async_search_results][1][search_request_params][detected_locale][region]': 'GB',
                    'specs[async_search_results][1][search_request_params][locale][language]': 'en-GB',
                    'specs[async_search_results][1][search_request_params][locale][currency_code]': 'GBP',
                    'specs[async_search_results][1][search_request_params][locale][region]': 'GB',
                    'specs[async_search_results][1][search_request_params][name_map][query]': 'q',
                    'specs[async_search_results][1][search_request_params][name_map][query_type]': 'qt',
                    'specs[async_search_results][1][search_request_params][name_map][results_per_page]': 'result_count',
                    'specs[async_search_results][1][search_request_params][name_map][min_price]': 'min',
                    'specs[async_search_results][1][search_request_params][name_map][max_price]': 'max',
                    'specs[async_search_results][1][search_request_params][parameters][q]': '30s',
                    'specs[async_search_results][1][search_request_params][parameters][explicit]': '1',
                    'specs[async_search_results][1][search_request_params][parameters][locationQuery]': '2635167',
                    'specs[async_search_results][1][search_request_params][parameters][ship_to]': 'GB',
                    'specs[async_search_results][1][search_request_params][parameters][page]': '4',
                    'specs[async_search_results][1][search_request_params][parameters][ref]': 'pagination',
                    'specs[async_search_results][1][search_request_params][parameters][facet]': 'clothing/womens-clothing',
                    'specs[async_search_results][1][search_request_params][parameters][referrer]': 'https://www.etsy.com/search/clothing/womens-clothing?q=30s&explicit=1locationQuery=2635167&ship_to=GB&page=3&ref=pagination',
                    'specs[async_search_results][1][search_request_params][user_id]': '',
                    'specs[async_search_results][1][request_type]': 'pagination_preact',
                    'specs[async_search_results][1][is_eligible_for_spa_reformulations]': 'false',
                    'view_data_event_name': 'search_async_pagination_specview_rendered'
                }

class EtsySpider(scrapy.Spider):
    name = 'fails'

    def start_requests(self):
        yield scrapy.FormRequest(
            url = 'https://www.etsy.com/api/v3/ajax/bespoke/member/neu/specs/async_search_results',
            method = "POST",
            formdata = payload,
            headers=headers,
            callback = self.parse
            )

Output:

Crawled (400) <POST https://www.etsy.com/api/v3/ajax/bespoke/member/neu/specs/async_search_results> (referer: https://www.etsy.com/search/clothing/womens-clothing?q=20s&explicit=1&ship_to=GB&page=2&ref=pagination)

Here's the cookies to check that it works after:

cookies = {
            "user_prefs": "2sjEL59UUglDjNIW6TKc04MvLTVjZACCxJMbvsPoaKXQYBclnbzSnBwdpdQ83dBgJR2lUEeoiBGEwkXEMgAA",
            "fve": "1640607991.0",
            "ua": "531227642bc86f3b5fd7103a0c0b4fd6",
            "_gcl_au": "1.1.717562651.1640607992",
            "uaid": "E7bYwrWVwTy7YGe_b_ipYT3Avd9jZACCxJMbvoPpqwvzqpVKEzNTlKyUnLJ9Io3DTQt1k53MwiojXTLzvZPCS31yCoPC_JRqGQA.",
            "pla_spr": "0",
            "_gid": "GA1.2.1425785976.1641390447",
            "_dc_gtm_UA-2409779-1": "1",
            "_pin_unauth": "dWlkPU0yVTRaamxoTWpjdFlqTTVZUzAwT0RJeExXRmpNamt0WlROalpXTTVNREE0WkRVNQ",
            "_ga": "GA1.1.1730759327.1640607993",
            "_uetsid": "052ece906e2e11ecb56a0390ed629376",
            "_uetvid": "39de7550671011ec80d2dbfaa05c901b",
            "exp_hangover": "pB4zSokzfzMIT9Jzi7zIwmXybCJjZACCxJMbvoPpqwt7qpXKU5PiE4tKMtMykzMTc-JzEktS85Ir4wtN4o0MDC2VrJQy81JzMtMzk3JSlWoZAA..",
            "_ga_KR3J610VYM": "GS1.1.1641390446.2.1.1641390474.32"

        }

The first script is pretty much the same as the requests so this should work as 'bare-minimum', as it's only the import of scrapy, the headers, payload and start_requests. I hope this serves you well!

Gallaecio commented 2 years ago

I have just realized that you are sending the cookies in the headers in requests.

So the problem is that in Scrapy the cookies do not work if you set them through the headers parameter, but do work if set through the cookies parameter?

lime-n commented 2 years ago

Yes, that's the issue I was facing. I couldn't understand why request accept the cookies in the headers but scrapy does not.

elacuesta commented 2 years ago

Closing as duplicate of #1992. Long story short, we fixed it on #2400 but that introduced another bug (#4717), so we reverted it. And by "we" I mean myself, I'm the one to blame :sweat_smile: We do mention it in the docs, though: Screenshot at 2022-01-14 16-35-33