webrecorder / pywb

Core Python Web Archiving Toolkit for replay and recording of web archives
https://pypi.python.org/pypi/pywb
GNU General Public License v3.0

How does custom filtering for the recorder work? Could I use it to filter out MP4 files (and other video file extensions)? #837

Open YousufSSyed opened 1 year ago

YousufSSyed commented 1 year ago

I don't want pywb to automatically record video files. I found this page on filtering: https://pywb.readthedocs.io/en/latest/manual/recorder.html?highlight=filter#custom-filtering, but it doesn't list specifics, and it says "For a more detailed examples, please consult the tests in pywb.recorder.test.test_recorder." What or where is pywb.recorder.test.test_recorder?

If I did filter out videos like MP4s, could I still record MP4s if I clicked on the video for it to play?

tw4l commented 1 year ago

> what or where is pywb.recorder.test.test_recorder?

The file the document is referring to is here: https://github.com/webrecorder/pywb/blob/main/pywb/recorder/test/test_recorder.py. One of the tests it's referring to is: https://github.com/webrecorder/pywb/blob/main/pywb/recorder/test/test_recorder.py#L202-L225.

Right now this is functionality that's available in the code base but not exposed via pywb's record mode. Perhaps @ikreymer could shed some additional light here, but I don't believe that it's currently possible to e.g. filter out videos by their extension in pywb's record mode.
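For illustration only, the extension check itself could look like the minimal sketch below. It is plain standard-library Python and is not wired into pywb: the function name `is_video_url` and the `VIDEO_EXTENSIONS` set are hypothetical, and how (or whether) such a check could be hooked into record mode is not covered here.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Hypothetical list of extensions to treat as video; extend as needed.
VIDEO_EXTENSIONS = {".mp4", ".webm", ".mov", ".m4v", ".mkv", ".avi"}


def is_video_url(url):
    """Return True if the URL's path ends in a known video extension.

    Only the path component is inspected, so query strings and
    fragments do not affect the result.
    """
    path = urlsplit(url).path
    return PurePosixPath(path).suffix.lower() in VIDEO_EXTENSIONS
```

Note that an extension check like this misses videos served from URLs without a file extension; checking the response's Content-Type header would likely be more reliable.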

YousufSSyed commented 1 year ago

OK, but why is it written as pywb.recorder.test.test_recorder? Why doesn't it say something like "pywb/recorder/test/test_recorder.py in the repo [or code]"?

tw4l commented 1 year ago

> OK, but why is it written as pywb.recorder.test.test_recorder? Why doesn't it say something like "pywb/recorder/test/test_recorder.py in the repo [or code]"?

Good question! That is the Python packaging/import syntax for that particular test module. Perhaps a link would be more widely accessible.
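As a rough sketch of how the two notations relate: each dot in the import name corresponds to a directory separator, and the last component is the `.py` file. The helper name `module_to_path` is made up for this example, and it assumes the name refers to a plain module rather than a package (a package would map to a directory containing `__init__.py`).

```python
from pathlib import PurePosixPath


def module_to_path(dotted_name):
    """Convert a dotted module name to its repo-relative file path."""
    parts = dotted_name.split(".")
    # Join the components with "/" and append ".py" to the last one.
    return str(PurePosixPath(*parts).with_suffix(".py"))


print(module_to_path("pywb.recorder.test.test_recorder"))
```

The printed path is the same file linked earlier in this thread, relative to the root of the repository.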

YousufSSyed commented 1 year ago

You mean if I were to import the package in a Python script?

Also, are you able to edit pywb's documentation? Could you add a link to the file there, and do the same in other places in the documentation? That would be appreciated!