oduwsdl / ipwb

InterPlanetary Wayback: A distributed and persistent archive replay system using IPFS
MIT License
602 stars 40 forks

Add the ability to download current state of the index file from the admin interface #506

Open ibnesayeed opened 5 years ago

ibnesayeed commented 5 years ago

Now that we can dynamically evolve the index as more WARCs are uploaded (as per #436), it would be handy to let admins download the current state of the index for backup and restoration of the replay (an index stored in a temp directory can disappear at any moment when the system goes down).

machawk1 commented 5 years ago

Retaining the most recent index in the config, as we did before (though I am unsure we still do), would give this process a failsafe for the case where replay quits abruptly before the admin has a chance to save, since a reference would be retained. This saved value could be offered as a base index for a subsequent replay invocation.

ibnesayeed commented 5 years ago

The config file (if in effect) only retains the reference to the file, which is not really important because it can be supplied manually at restart. The issue arises when the index has grown in the temp directory and the machine then reboots. The file will be gone, even if the reference remains. In fact, I am against keeping the reference in the config file implicitly, as it can cause real confusion.
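To make the failure mode concrete, here is a minimal sketch of copying a live index out of the temp directory into persistent storage. The function name `backup_index` and the directory layout are assumptions for illustration, not part of ipwb.

```python
import shutil
import time
from pathlib import Path


def backup_index(index_path, backup_dir):
    """Copy the live index file into a persistent directory (hypothetical helper)."""
    src = Path(index_path)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Timestamped name so backups taken at least a second apart do not collide.
    stamp = time.strftime("%Y%m%dT%H%M%S")
    dest = dest_dir / f"{src.stem}-{stamp}{src.suffix}"
    # copy2 also preserves the modification time of the source file.
    shutil.copy2(src, dest)
    return dest
```

A copy like this survives a reboot only if `backup_dir` itself lives outside the temp directory.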

machawk1 commented 5 years ago

Ah, ok, so the objective is to prevent indexes in /tmp/ from automatically being purged.

I was thinking that you wanted the admin to be able to easily restart ipwb replay with the previously used index, either after a system crash (regardless of whether the index is in /tmp/) or simply to resume the last session. The latter would occur when the index file last used differs from the one they originally invoked the system with, because WARCs were uploaded in the meantime -- a sort of "continue where you left off" instead of having to 1. find the updated CDXJ file or 2. re-upload all WARCs in the web interface.

ibnesayeed commented 5 years ago

The intent is to let people run the process without worrying too much about all the flags and configs, except for those who want fine control over how things work. The workflow I can imagine is that lay users only need to run one command, ipwb replay, potentially with no other parameters. Once the server is up, they can upload their WARC files using drag and drop from the admin interface, and the replay is up and running. They can gradually add more content, which will slowly grow the internal index that they never touched from the command line. The server might be running on a remote machine. Then, from the admin interface, they can download the index every once in a while for backup. To utilize the full potential, perhaps we also need to provide a way to upload an index from the admin interface, which can amend or replace the existing index (with a checkbox to indicate the intent).
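The amend-or-replace choice for an uploaded index could look roughly like the sketch below. It treats the index as a set of text lines and assumes that a plain lexical sort with exact-duplicate removal is an acceptable way to combine two CDXJ files; the function name and the `replace` flag (standing in for the checkbox) are hypothetical.

```python
def apply_uploaded_index(current_lines, uploaded_lines, replace=False):
    """Return the new index lines after an upload (hypothetical helper).

    replace=False amends: the result is the union of both indexes.
    replace=True discards the current index in favor of the upload.
    """
    def clean(lines):
        # Drop blank lines and surrounding whitespace, and deduplicate.
        return {line.strip() for line in lines if line.strip()}

    merged = clean(uploaded_lines)
    if not replace:
        merged |= clean(current_lines)
    # Assumption: the index is kept in plain lexical order.
    return sorted(merged)
```

Keeping this as a pure function over lines means the same logic could back both a command-line option and an admin-interface upload.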

machawk1 commented 5 years ago

While we already display the path to the updated index file (e.g., /tmp/717386A20Z3G.cdxj) in the replay homepage admin interface, we do not yet link it. While this could be done fairly easily, we may also want to make the link available within the replay banner's extended interface.
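As a rough sketch of what linking the index could involve, the following dependency-free WSGI callable serves the file as a download. The route `/admin/index.cdxj`, the factory name, and the choice of status codes are assumptions for illustration and do not reflect ipwb's actual code.

```python
from pathlib import Path


def make_index_download_app(index_path):
    """Return a WSGI app that serves the index file as an attachment (sketch)."""
    path = Path(index_path)

    def app(environ, start_response):
        # Hypothetical admin-only route; everything else is a 404.
        if environ.get("PATH_INFO") != "/admin/index.cdxj":
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not found\n"]
        try:
            # Read at request time so the download reflects the current state.
            body = path.read_bytes()
        except OSError:
            # The temp file may have vanished, e.g. after a reboot.
            start_response("410 Gone", [("Content-Type", "text/plain")])
            return [b"Index file is no longer available\n"]
        headers = [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
            ("Content-Disposition", f'attachment; filename="{path.name}"'),
        ]
        start_response("200 OK", headers)
        return [body]

    return app
```

Access control is deliberately left out here; a real admin route would need to be restricted so that it is not reachable from the public view.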

ibnesayeed commented 5 years ago

That would be a big no for me. The banner, in both FAB and extended modes, is meant for users. Adding admin options there is not a good idea. We need to segregate administrative options and not allow them to leak into the public view.

machawk1 commented 5 years ago

I agree that admin options and user options should be separated. Perhaps an admin view in the banner to access those options? Having to go to a separate interface from what you're viewing introduces a disconnect. Most users will be hosting and viewing their own archives without necessarily having external users of their replay system.

ibnesayeed commented 5 years ago

I still don't think it's a good idea because: