webrecorder / pywb

Core Python Web Archiving Toolkit for replay and recording of web archives
https://pypi.python.org/pypi/pywb
GNU General Public License v3.0

Programmatic collection creation, export and deletion #435

Open johnknapp opened 5 years ago

johnknapp commented 5 years ago

(Please forgive me if this is the wrong venue for questions such as this. I'm not aware of any other avenue.)

I'm super impressed with pywb and have enjoyed wrapping my head around both the code and the documentation. Our system is successfully recording and playing back and we're quite pleased.

However, we need to programmatically create and delete collections, and I've not discovered endpoints for that. If they do not exist, I'm happy to create a PR if you want one and can point me in the right direction.

Additionally, we need to programmatically return a WARC file to the caller, and I've not found that endpoint either. (On that topic, we'd need to determine when the WARC is no longer being appended to.)

Thanks for your guidance!

N0taN3rd commented 5 years ago

we need to programmatically create and delete collections

Currently pywb, when deployed, only supports dynamic discovery of collections from within the configured collections root directory.

You would need to create collections manually using wb-manager and delete them using OS-native means.
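A minimal sketch of scripting those two steps from Python, assuming `wb-manager` is on the PATH and that collections live in the default `collections/` directory under the working directory. The function names and the name-validation rule are hypothetical helpers, not part of pywb.

```python
import re
import shutil
import subprocess
from pathlib import Path

# Assumption: collection names are restricted to simple identifiers so a
# name can never escape the collections root (e.g. "../etc").
_VALID_NAME = re.compile(r"^[A-Za-z0-9_-]+$")


def _check_name(name: str) -> None:
    if not _VALID_NAME.match(name):
        raise ValueError(f"invalid collection name: {name!r}")


def create_collection(name: str, cwd: str = ".") -> None:
    """Create a collection by shelling out to `wb-manager init <name>`.

    `cwd` should be the directory that contains (or will contain) `collections/`.
    """
    _check_name(name)
    subprocess.run(["wb-manager", "init", name], cwd=cwd, check=True)


def delete_collection(name: str, collections_root: str = "collections") -> None:
    """Delete a collection with plain OS calls: remove its directory tree."""
    _check_name(name)
    target = Path(collections_root) / name
    if not target.is_dir():
        raise FileNotFoundError(f"no such collection: {name}")
    shutil.rmtree(target)
```

Deleting this way removes the collection's archives and indexes along with it, so anything worth keeping should be copied out first.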

This could be implemented by adding an endpoint to FrontendApp that interacts with wb-manager.
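To illustrate the shape such an endpoint could take, here is a hypothetical standalone WSGI app. It does not use FrontendApp's internal routing, and the `/api/collections/<name>` route is an assumption. Creation and deletion are injected as callables, so they could wrap the wb-manager and directory-removal steps mentioned above.

```python
import json
from typing import Callable, Dict, Iterable

PREFIX = "/api/collections/"  # hypothetical route, not an existing pywb URL


def make_collections_api(create: Callable[[str], None],
                         delete: Callable[[str], None]):
    """Return a WSGI app: PUT creates a collection, DELETE removes it."""

    def app(environ: Dict, start_response) -> Iterable[bytes]:
        def respond(status, body, extra=()):
            data = json.dumps(body).encode("utf-8")
            headers = [("Content-Type", "application/json"),
                       ("Content-Length", str(len(data)))]
            headers.extend(extra)
            start_response(status, headers)
            return [data]

        method = environ.get("REQUEST_METHOD", "GET")
        path = environ.get("PATH_INFO", "")
        name = path[len(PREFIX):] if path.startswith(PREFIX) else ""
        if not name or "/" in name:
            return respond("404 Not Found", {"error": "unknown route"})

        try:
            if method == "PUT":
                create(name)
                return respond("201 Created", {"created": name})
            if method == "DELETE":
                delete(name)
                return respond("200 OK", {"deleted": name})
        except ValueError as exc:  # e.g. an invalid collection name
            return respond("400 Bad Request", {"error": str(exc)})
        except FileNotFoundError as exc:  # e.g. deleting a missing collection
            return respond("404 Not Found", {"error": str(exc)})

        return respond("405 Method Not Allowed",
                       {"error": f"{method} not allowed"},
                       [("Allow", "PUT, DELETE")])

    return app
```

For local experiments, the stdlib `wsgiref.simple_server.make_server(host, port, app)` can serve it. In a real deployment the route would need to live inside FrontendApp, behind some form of authentication.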

programmatically return a warc file to the caller

Currently pywb does not support this.

This could be implemented by adding endpoints to both FrontendApp and WarcServer.
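As a rough sketch of the serving side only, assuming pywb's standard layout where a collection's WARCs sit in `collections/<coll>/archive/`, a hypothetical `/<coll>/warc/<filename>` route could stream the file back. This standalone WSGI app does not go through FrontendApp or WarcServer, and it does not check whether the file is still being written.

```python
import re
from pathlib import Path

_SAFE_COLL = re.compile(r"^[A-Za-z0-9_-]+$")   # assumption: no dots in names
_SAFE_FILE = re.compile(r"^[A-Za-z0-9_.-]+$")
CHUNK = 64 * 1024


def _stream(path: Path):
    """Yield the file in chunks so large WARCs are never fully loaded."""
    with open(path, "rb") as fh:
        while True:
            block = fh.read(CHUNK)
            if not block:
                break
            yield block


def make_warc_download_app(collections_root: str):
    root = Path(collections_root).resolve()

    def not_found(start_response):
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]

    def app(environ, start_response):
        # Hypothetical route: /<coll>/warc/<filename>
        parts = environ.get("PATH_INFO", "").strip("/").split("/")
        if len(parts) != 3 or parts[1] != "warc":
            return not_found(start_response)
        coll, _, filename = parts
        if not (_SAFE_COLL.match(coll) and _SAFE_FILE.match(filename)):
            return not_found(start_response)

        archive_dir = (root / coll / "archive").resolve()
        target = (archive_dir / filename).resolve()
        # Only serve regular files sitting directly in the archive directory.
        if target.parent != archive_dir or not target.is_file():
            return not_found(start_response)

        start_response("200 OK", [
            ("Content-Type", "application/octet-stream"),
            ("Content-Length", str(target.stat().st_size)),
            ("Content-Disposition", f'attachment; filename="{target.name}"'),
        ])
        return _stream(target)

    return app
```

Streaming in fixed-size chunks keeps memory flat regardless of WARC size, and the name checks plus the parent-directory comparison keep requests from reaching files outside a collection's `archive/` directory.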

FrontendApp handles and routes all requests when pywb is deployed.

we'd need to determine when the warc is no longer being appended

RecorderApp is where you would need to look for this information.
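Until RecorderApp exposes that state directly, one external heuristic (not pywb's own mechanism) is to treat a WARC as finished once it has not been modified for some idle period. A sketch, assuming WARCs are named `*.warc` or `*.warc.gz` and that a 5-minute idle window is acceptable; `idle_warcs` is a hypothetical helper.

```python
import time
from pathlib import Path
from typing import List, Optional

WARC_SUFFIXES = (".warc", ".warc.gz")  # assumption about file naming


def idle_warcs(archive_dir: str, idle_seconds: float = 300.0,
               now: Optional[float] = None) -> List[str]:
    """Return names of WARCs not modified within the last `idle_seconds`.

    Heuristic only: an unmodified file is *probably* no longer being
    appended to. This does not consult any RecorderApp state.
    """
    now = time.time() if now is None else now
    idle = []
    for path in sorted(Path(archive_dir).iterdir()):
        if not path.is_file() or not path.name.endswith(WARC_SUFFIXES):
            continue
        if now - path.stat().st_mtime >= idle_seconds:
            idle.append(path.name)
    return idle
```

This can misfire if a recording session pauses for longer than the window, so it is a stopgap rather than a guarantee.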

@ikreymer would be able to give you more insight into the recording process.

johnknapp commented 5 years ago

Thanks @N0taN3rd for the detailed response! Is there a pywb discussion forum, listserv or a preferred channel for questions like these?

N0taN3rd commented 5 years ago

@johnknapp there is not a preferred place for questions like this at the moment other than github.

monstrfolk commented 5 years ago

@johnknapp, I am interested in a feature to create collections through an API.