tomasnorre / crawler

Libraries and scripts for crawling the TYPO3 page tree. Used for re-caching, re-indexing, publishing applications etc.
GNU General Public License v3.0
54 stars, 84 forks

crawler:buildQueue fails with Error at offset 0 of 92 bytes #1087

Open: hacksch opened this issue 1 month ago

hacksch commented 1 month ago

Bug Report

Current Behavior

I run the following command to build a queue and crawl the found pages: "typo3cms crawler:buildQueue --depth 3 --mode exec 1 default". After the list of found URLs is printed, I get:

Processing

0/299 [>---------------------------] 0%
Error at offset 0 of 92 bytes

Expected behavior/output

Execution should not fail.

Steps to reproduce

Prepare a crawler configuration and execute "typo3cms crawler:buildQueue --depth 3 --mode exec 1 default".

Environment

Possible Solution

The error appears in JsonCompatibilityConverter, line 39.

In my case, the dataString passed to unserialize is {"url":"https:\/\/www.domain.de\/en\/home.html","procInstructions":[""],"procInstrParams":[]}

This is not valid input for unserialize(). When I comment out lines 39-49, the execution works.
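A minimal standalone sketch of the behavior described above, assuming PHP 8: the quoted dataString decodes cleanly with json_decode(), while passing the same string straight to unserialize() returns false and raises an "Error at offset 0" diagnostic (a notice before PHP 8.3, a warning since). The temporary error handler is only there to capture that message for inspection; it is not part of the crawler.

```php
<?php
declare(strict_types=1);

// The dataString quoted in the report (domain anonymized as in the original).
$dataString = '{"url":"https:\/\/www.domain.de\/en\/home.html","procInstructions":[""],"procInstrParams":[]}';

// JSON path: decodes to an associative array without any error.
$decoded = json_decode($dataString, true);
var_dump(is_array($decoded)); // bool(true)

// Serialize path: capture the diagnostic instead of letting PHP print it.
$captured = null;
set_error_handler(static function (int $errno, string $errstr) use (&$captured): bool {
    $captured = $errstr;
    return true; // mark as handled so PHP's default output is skipped
});
$result = unserialize($dataString, ['allowed_classes' => false]);
restore_error_handler();

var_dump($result); // bool(false)
echo $captured, PHP_EOL; // the "unserialize(): Error at offset 0 of ... bytes" message
```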

ulrichmathes commented 1 month ago

Your dataString looks fine: it can be decoded with json_decode(), and the resulting array is returned before unserialize() is ever reached.

I did observe a convert() call with an empty $dataString. That does not yield an array, so it falls through to unserialize(), where the warning is raised.

I'm not familiar with this whole process, but maybe we should:

  1. fail fast when $dataString is empty
  2. not wrap unserialize in try/catch, as it does not throw anything but emits a warning (as of PHP 8.3)
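One way the two suggestions above could be combined, as a hedged sketch rather than the crawler's actual JsonCompatibilityConverter: the hypothetical function convertDataString() returns early on an empty string, tries JSON first (where try/catch is meaningful because JSON_THROW_ON_ERROR really throws), and calls unserialize() without a try/catch. Because unserialize() signals bad input through a notice/warning plus a false return value, the sketch swaps in a temporary error handler to record the failure and then checks the result. Returning false on failure is an assumption made for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical converter (not the crawler's real class): decode a stored
 * dataString that may be JSON (new format) or PHP-serialized (legacy format).
 * Returns the decoded array, or false if the string holds neither.
 */
function convertDataString(string $dataString): array|false
{
    // Suggestion 1: fail fast on empty input.
    if ($dataString === '') {
        return false;
    }

    // JSON first; with JSON_THROW_ON_ERROR json_decode() does throw.
    try {
        $decoded = json_decode($dataString, true, 512, JSON_THROW_ON_ERROR);
        if (is_array($decoded)) {
            return $decoded;
        }
    } catch (JsonException) {
        // Not JSON; fall through to the legacy serialized format.
    }

    // Suggestion 2: no try/catch around unserialize(). On malformed input it
    // returns false and emits a notice (a warning since PHP 8.3), so record that
    // via a temporary handler, which also keeps any previously installed handler
    // (e.g. one configured to throw) from seeing it.
    $failed = false;
    set_error_handler(static function () use (&$failed): bool {
        $failed = true;
        return true;
    });
    $data = unserialize($dataString, ['allowed_classes' => false]);
    restore_error_handler();

    if ($failed || !is_array($data)) {
        return false;
    }
    return $data;
}
```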

Steps to reproduce: Crawler log -> Log -> Reload List

Should I prepare one or two pull requests?

tomasnorre commented 1 month ago

@ulrichmathes If you have a suggested fix in mind, I would be happy to review a PR.

hacksch commented 1 month ago

Hello, I will check your fix on Monday. I'm not sure it will help in the case I described and debugged: there, the string was not empty, and it was an error that stopped the process, not a warning.

hacksch commented 1 month ago

What I also remember now is that I called unserialize() directly on the string, without any further code, and could reproduce the error. Maybe a check whether the string contains serialized data could solve the described problem; if it does not, lines 39-49 would be skipped. What do you think?
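The idea of checking for serialized data first could look roughly like this. The helper names looksLikeSerializedArray() and readDataString() and the regular expression are hypothetical; the check assumes the legacy payload is always a serialized array (outer shape a:<count>:{...}), so it is a cheap shape test, not a validator, and a string that passes it can still fail inside unserialize(). Strings that fail the test never reach unserialize(), which has the same effect as skipping lines 39-49 for them.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical pre-check: does $value look like a PHP-serialized array?
 * Only the outer shape "a:<count>:{...}" is checked, so this is a heuristic.
 */
function looksLikeSerializedArray(string $value): bool
{
    return preg_match('/^a:\d+:\{.*\}$/s', $value) === 1;
}

/**
 * Hypothetical reader: JSON first, then unserialize() only when the
 * pre-check passes; everything else is rejected without calling unserialize().
 */
function readDataString(string $dataString): ?array
{
    $decoded = json_decode($dataString, true);
    if (is_array($decoded)) {
        return $decoded;
    }
    if (!looksLikeSerializedArray($dataString)) {
        return null; // skip the unserialize() branch entirely
    }
    $data = unserialize($dataString, ['allowed_classes' => false]);
    return is_array($data) ? $data : null;
}
```

The design trade-off is that a pre-check avoids the notice/warning for obviously foreign input (such as JSON that failed to decode), but it does not replace handling the failure case of unserialize() itself.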