roach-php / core

The complete web scraping toolkit for PHP.
https://roach-php.dev

How do I access items once all item pipelines are finished? #35

Closed (adzay closed this issue 2 years ago)

adzay commented 2 years ago

I am running unit tests and I want to collect all scraped items into an array. I plan to show the results in a Vue component, so I need to return them from a Laravel controller.

The PHPUnit logs show that the terms have been crawled successfully. However, the code below produces a null result.

$customs = Roach::startSpider(CustomSpider::class);

//roach.INFO: Run starting [] []
//roach.INFO: Item scraped {"name":"xxx"} []
//roach.INFO: Item scraped {"name":"xxx2"} []

foreach ($customs as $cus) {
    dd($cus);
}

//foreach() argument must be of type array|object, null given

Please help. I have read your docs, but they only talk about handling data within the generator (itemsPipeline) and say nothing about exporting results.

Thanks

ksassnowski commented 2 years ago

This will be part of the upcoming 1.0 release. A new method, Roach::collectSpider(...), will be added. It behaves exactly like Roach::startSpider(...), except that it returns all scraped items after the run.

// $scrapedItems is an array<int, ItemInterface>
$scrapedItems = Roach::collectSpider(MySpider::class);

ksassnowski commented 2 years ago

This is possible in the 1.0 release. Please check out this section of the docs for more information.
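
For the original use case (returning the scraped items from a Laravel controller), the result of Roach::collectSpider(...) can be turned into plain arrays before it is sent to the frontend. The following is only a minimal sketch under some assumptions: roach-php 1.0 or later is installed, each item exposes an all() method as Roach's ItemInterface does, and itemsToArrays is a hypothetical helper name that is not part of the library. CustomSpider is the spider from the question.

```php
<?php

declare(strict_types=1);

/**
 * Convert scraped items into plain arrays that can be JSON-encoded.
 *
 * Assumption: every item has an all(): array method, as Roach's
 * ItemInterface does. The helper name is hypothetical.
 *
 * @param  array<int, object> $items
 * @return array<int, array<string, mixed>>
 */
function itemsToArrays(array $items): array
{
    return array_map(
        static fn (object $item): array => $item->all(),
        $items
    );
}

// Possible usage inside a Laravel controller action (assumes roach-php >= 1.0):
//
//     $items = \RoachPHP\Roach::collectSpider(CustomSpider::class);
//
//     return response()->json(itemsToArrays($items));
```

Note that collectSpider runs the whole crawl before it returns, so for a long-running spider it may be preferable to run the crawl in a queued job and store the results rather than block the HTTP request.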