Open ibnesayeed opened 5 years ago
Retaining the most recent index in the config (as we did before, though I am unsure we still do) would give this process a failsafe for the case where replay quits abruptly before the admin has a chance to save, since the config would still hold a reference. This saved value could be offered as the basis index for a subsequent replay invocation.
The config file (if in effect) only retains a reference to the file, which is not really important because the path can be supplied manually at restart. The issue arises when the index grew in the temp directory and then the machine rebooted: the file will be gone, even if the reference remains. In fact, I am against implicitly keeping the reference in the config file, as it can cause real confusion.
Ah, ok, so the objective is to prevent indexes in /tmp/ from automatically being purged.
I was thinking that you wanted the admin to be able to easily restart ipwb replay with the previously used index, either after a system crash (regardless of whether the index is in /tmp/) or simply to resume the last session. The latter would apply when the index file last used differs from the one the system was originally invoked with because WARCs were uploaded in the meantime: a sort of "continue where you left off" instead of having to 1. find the updated CDXJ file or 2. re-upload all WARCs in the web interface.
The intent is to let people run the process without worrying too much about all the flags and configs, while still giving fine control to those who want it. The workflow I imagine is that lay users only need to run one command, ipwb replay, potentially with no other parameters. Then the server is up, and they can go ahead and upload their WARC files using drag and drop from the admin interface, and the replay is up and running. They can gradually add more content, which will slowly grow the internal index that they never touched from the command line. The server might be running on a remote machine. From the admin interface they can then download the index every once in a while for backup. To utilize the full potential, perhaps we also need to provide a way to upload an index from the admin interface, which can amend or replace the existing index (with a checkbox to indicate the intent).
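To make the "amend or replace" idea concrete, here is a minimal sketch in Python. It is not how ipwb does it; `apply_uploaded_index` and its `replace` flag are hypothetical names, and it assumes that a CDXJ index can be treated as a sorted set of text lines, so amending amounts to a deduplicated, sorted union.

```python
def apply_uploaded_index(existing, uploaded, replace=False):
    """Return the index lines that result from an upload.

    Hypothetical helper, not part of ipwb. `existing` and `uploaded`
    are iterables of CDXJ lines; `replace` mirrors the proposed checkbox.
    """
    def clean(lines):
        # Drop blank lines and surrounding whitespace/newlines.
        return {line.strip() for line in lines if line.strip()}

    if replace:
        merged = clean(uploaded)
    else:
        # Amend: union of both indexes, duplicates collapsed.
        merged = clean(existing) | clean(uploaded)

    # Assumption: the index is kept in plain lexicographic line order.
    return sorted(merged)
```

With `replace=False` the current records survive and new ones are added; with `replace=True` only the uploaded records remain.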
While we already display the path to the updated index file (e.g., /tmp/717386A20Z3G.cdxj) in the admin interface on the replay homepage, we do not yet link it. This could be done fairly easily, but we may also want to make the link available within the extended interface of the replay banner.
That would be a big no for me. The banner, in both FAB and extended mode, is meant for users. Adding admin options there is not a good idea. We need to segregate administrative options and not allow them to leak into the public view.
I agree that admin options and user options should be separated. Perhaps an admin view in the banner to access those options? Having to go to a separate interface from what you're viewing introduces a disconnect. Most users will be hosting and viewing their own archives, without necessarily having external users of their replay system.
I still don't think it's a good idea because:
Now that we have the ability to dynamically evolve the index as more and more WARCs are uploaded (as per #436), it will be handy to allow admins to download the current state of the index for backup and restoration of the replay (index stored in temp can go away any moment when the system goes down).
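As a rough illustration of the backup step, the sketch below copies the current index out of the temp directory into a durable location under a timestamped name. This is only a sketch under stated assumptions: `backup_index` and the backup directory are hypothetical, and nothing here reflects how ipwb itself stores or serves the index.

```python
import shutil
import time
from pathlib import Path


def backup_index(index_path, backup_dir):
    """Copy the live index to `backup_dir` and return the new path.

    Hypothetical helper, not part of ipwb.
    """
    src = Path(index_path)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # UTC timestamp so successive backups do not overwrite each other
    # (assuming at most one backup per second).
    stamp = time.strftime("%Y%m%d%H%M%S", time.gmtime())
    dest = dest_dir / f"{src.stem}-{stamp}{src.suffix}"

    # copy2 also preserves file metadata such as modification time.
    shutil.copy2(src, dest)
    return dest
```

A download link in the admin interface would serve the same purpose for a remote server; the copy above only covers the case where the admin has shell access to the machine.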