tomasnorre / crawler

Libraries and scripts for crawling the TYPO3 page tree. Used for re-caching, re-indexing, publishing applications etc.
GNU General Public License v3.0

Placeholder for follow up improvements for issue #754 #816

Closed tomasnorre closed 2 weeks ago

tomasnorre commented 3 years ago

From @brotkrueml comments in review: https://github.com/tomasnorre/crawler/pull/754#issuecomment-864586872

I have extracted the comments into separate issues, to ease the fixes and keep the PRs smaller.


Now a URL with MP is used:

Processing
https://website.ddev.site:8443/en/?MP= () => 
OK: 

This shouldn't be an issue, as the correct canonical is used (without MP), just a little bit unaesthetic.

Cannot reproduce this anymore. @tomasnorre


Running the command without depth:

ddev t3cmd crawler:buildQueue 3 deployment --mode exec

Executing 4 requests right away:
[20.06.21 17:19] https://website.ddev.site:8443/en/?MP= (URL already existed)<br>[20.06.21 17:19] https://website.ddev.site:8443/de/?MP= (URL already existed)<br>[20.06.21 17:19] https://website.ddev.site:8443/pl/?MP= (URL already existed)<br>[20.06.21 17:19] https://website.ddev.site:8443/tr/?MP= (URL already existed)

omits the detailed information from above for other pages. The <br> tag should be converted to a new line on the console.

Edit: I cannot reproduce this as of crawler 12.4.0 @tomasnorre
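
For illustration only, the requested conversion of <br> tags into console line breaks could look roughly like the sketch below. It is written in Python rather than in the crawler's own code, and the function name `br_to_newlines` is a hypothetical helper, not part of the extension.

```python
import re

# Matches <br>, <br/>, <br /> in any letter case.
_BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)


def br_to_newlines(message: str) -> str:
    """Replace HTML line-break tags with real newlines for console output."""
    return _BR_TAG.sub("\n", message)


# Sample taken from the output quoted in this issue (shortened to two entries).
sample = (
    "[20.06.21 17:19] https://website.ddev.site:8443/en/?MP= (URL already existed)<br>"
    "[20.06.21 17:19] https://website.ddev.site:8443/de/?MP= (URL already existed)"
)
print(br_to_newlines(sample))
```

With this, each queued URL would be printed on its own line instead of being joined by literal <br> tags.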


I am getting many empty lines when calling a buildQueue command with depth. Perhaps these empty lines come from "successful" pages without any output. I think they should be avoided.

Edit: Fixed as part of #1097 @tomasnorre
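
As a rough sketch of the idea (not the actual change made in #1097, which is not shown here), blank result lines could be filtered out before they are written to the console. The helper name `non_empty_lines` is an assumption made for this example.

```python
from typing import Iterable, Iterator


def non_empty_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only lines that contain something other than whitespace."""
    for line in lines:
        if line.strip():
            yield line


# Pages without any output contribute empty strings, which are skipped.
for line in non_empty_lines(["OK", "", "   ", "URL already existed"]):
    print(line)
```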

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

tomasnorre commented 3 weeks ago

Now a URL with MP is used:

Processing
https://website.ddev.site:8443/en/?MP= () => 
OK: 

This shouldn't be an issue, as the correct canonical is used (without MP), just a little bit unaesthetic.

@brotkrueml I know it's a long time ago, but do you recall how to reproduce this? I cannot get it reproduced.

brotkrueml commented 3 weeks ago

Sorry, no. But if you can't reproduce it, maybe that is gone? :-)

tomasnorre commented 3 weeks ago

Thanks for your feedback. To be honest, I didn't expect that either. If I cannot reproduce it in the near future, I'll consider it solved until it gets reported again.

tomasnorre commented 2 weeks ago

I'll close this issue for now, as everything that still needs to be addressed is covered in a new issue.