webrecorder / browsertrix-crawler

Run a high-fidelity browser-based web archiving crawler in a single Docker container
https://crawler.docs.browsertrix.com
GNU Affero General Public License v3.0
658 stars, 83 forks

replay of streaming services GUI #537

Open aaegid opened 7 months ago

aaegid commented 7 months ago

A minor bug, perhaps? Replay loads the wrong asset when replaying the Netflix GUI. In this case, the overlay for the series The Chestnut Man is replaced by an asset from the movie Whiplash, which does not belong on the page. The Whiplash assets were crawled from the main 'browse' site. Also, and more importantly, no videos were crawled from the overlay for the chosen series: where there is a screenshot from Whiplash, there should have been a short video trailer for The Chestnut Man. The third image shows the error message that appears when clicking play in replay. If I understand correctly, the crawler does not initiate playback (click play), and therefore I cannot attempt to replay playback? (Cf. my feature request about some sort of DRM support, #530.) [image]

ikreymer commented 7 months ago

It's very hard to tell what sites you're crawling and with what settings. It sounds like it might be a replay issue, which would be better addressed in the replayweb.page repo. You can also try capturing the same site with the ArchiveWeb.page extension and seeing if the videos play back when you hit the play button. To be able to look at this further, we'll need clearer repro instructions.

aaegid commented 7 months ago

Neither the Browsertrix crawl nor the ArchiveWeb.page crawl automatically mouses over the elements on the page that show overlays with series video trailers and episode information. I have tried to show this issue in two screen recordings of the archives. No audio or video is available in the archive. The result is a little better with ArchiveWeb.page, where I manually moused over the Kastanemanden (The Chestnut Man) TV series to bring up the overlay with information and links to episodes. I hope this gives better information on my issue.

https://github.com/webrecorder/browsertrix-crawler/assets/166002134/490cc39c-ca40-4eb1-9fcb-5ba8036f9f29

https://github.com/webrecorder/browsertrix-crawler/assets/166002134/bfa54d20-ba21-4838-9f2c-69d8eec8944f