Reproducible example:
import scrapy
from scrapy_poet import DummyResponse
from zyte_common_items import Product, ProductNavigation


class ParentSpider(scrapy.Spider):
    name = "parent"

    def start_requests(self):
        yield scrapy.Request(
            url="https://books.toscrape.com",
            callback=self.parse_nav,
        )

    def parse_nav(self, response: DummyResponse, navigation: ProductNavigation):
        # Follow every product link found on the navigation page.
        for request in navigation.items:
            yield request.to_scrapy(
                callback=self.parse_item,
            )

    def parse_item(self, response: DummyResponse, product: Product):
        yield product
Currently, the requests coming from scrapy_zyte_api.providers.ZyteApiProvider don't get the Parent Request # field in Scrapy Cloud. In the example above, Request 1 should have a Parent Request # field, but it is missing.
Note that reverting the changes from the PR https://github.com/scrapinghub/scrapinghub-entrypoint-scrapy/pull/73/ brings the Parent Request # field back. The field then comes from the other request, which is filtered in the new scrapinghub-entrypoint-scrapy version.
It would seem that after filtering out one of the duplicate requests, the request.meta.setdefault(HS_PARENT_ID_KEY) value should somehow be copied into the other request (code ref).
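A rough sketch of what that copy could look like, working on plain meta dicts so it runs without Scrapy. The helper name inherit_parent_id is hypothetical, and the "_hsparent" value for HS_PARENT_ID_KEY is an assumption that should be checked against scrapinghub-entrypoint-scrapy. It does not show where in the middleware the call would have to happen.

```python
# Assumption: stands in for HS_PARENT_ID_KEY from scrapinghub-entrypoint-scrapy;
# the real value should be imported from the library rather than hard-coded.
HS_PARENT_ID_KEY = "_hsparent"


def inherit_parent_id(kept_meta: dict, dropped_meta: dict) -> dict:
    """Hypothetical helper: copy the parent request ID from the filtered
    duplicate's meta into the meta of the request that is kept."""
    parent_id = dropped_meta.get(HS_PARENT_ID_KEY)
    if parent_id is not None:
        # setdefault never overwrites a parent ID the kept request already has.
        kept_meta.setdefault(HS_PARENT_ID_KEY, parent_id)
    return kept_meta
```

Using setdefault here mirrors the existing call: a request that already carries a parent ID keeps it, and only a request without one inherits it from the filtered duplicate.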