webrecorder / pywb

Core Python Web Archiving Toolkit for replay and recording of web archives
https://pypi.python.org/pypi/pywb
GNU General Public License v3.0
1.37k stars, 214 forks

Recorded website missing external media (from different domain) #693

Open claudiobizzotto opened 2 years ago

claudiobizzotto commented 2 years ago

Describe the bug

(Not sure if this is a bug or a feature request.) The web recorder doesn't seem to save local copies of media, such as images, whose sources are on a different domain than the original website being archived.

Steps to reproduce the bug

wb-manager init my-web-archive
wayback --record --live -a --auto-interval 10

I then open a web browser at http://localhost:8080/my-web-archive/record/<url-to-be-recorded>.

Expected behavior

I would expect a copy of each media file (in this case, each image), regardless of origin (same domain or different domain), to be available locally.

Environment

ikreymer commented 2 years ago

Yes, that is the intended behavior: when you load http://localhost:8080/my-web-archive/record/<url-to-be-recorded> in the browser, all the URLs loaded on the page, regardless of their origin, should be recorded.

Can't really say more about what went wrong without looking at the particular URL or the WARCs.
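
As a rough way to check which origins actually ended up in the archive, the sketch below counts captured URLs per host from CDXJ index lines. It is not part of pywb; `hosts_in_cdxj` is a hypothetical helper, and it assumes each index line has the form `<urlkey> <timestamp> <json>` with a `url` field in the JSON block. Lines that don't match are skipped.

```python
import json
from collections import Counter
from urllib.parse import urlsplit


def hosts_in_cdxj(lines):
    """Count captured URLs per host from an iterable of CDXJ index lines.

    Hypothetical helper, assuming lines look like:
        <urlkey> <timestamp> {"url": "...", ...}
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split only twice so spaces inside the JSON block are preserved.
        parts = line.split(" ", 2)
        if len(parts) != 3:
            continue
        try:
            fields = json.loads(parts[2])
        except json.JSONDecodeError:
            continue
        if not isinstance(fields, dict):
            continue
        host = urlsplit(fields.get("url", "")).hostname
        if host:
            counts[host] += 1
    return counts
```

Passing an open index file from the collection (the exact path and file name under `collections/my-web-archive/` are assumptions about the local layout) and printing `hosts_in_cdxj(fh).most_common()` would show whether the external media host appears at all. If it is missing, the requests were probably never routed through the recorder.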