prior-art-archive / priorartarchive.org

Prior Art Archive Site
https://priorartarchive.org
GNU General Public License v2.0

Support title editing #18

Closed: metasj closed this issue 5 years ago

metasj commented 5 years ago

Uploaders would like to preserve their original filename, and to be able to edit the title.

Allow uploaders to edit the titles of docs they upload, for instance at the end of the upload process. We will also want to be able to bulk update some titles with scripts, for instance those with blank titles and HTML pages with long, unhelpful ones.
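As a rough illustration of the bulk-update idea, here is a minimal sketch of how a script might pick out documents whose titles are blank or overly long. The record shape (`id`, `title`) and the 120-character cutoff are assumptions for illustration, not anything taken from the actual database.

```python
from typing import Iterable, List, Optional


def needs_title_review(title: Optional[str], max_length: int = 120) -> bool:
    """Flag titles that are missing, blank, or longer than max_length.

    max_length is a hypothetical threshold; the thread does not define "long".
    """
    if title is None or not title.strip():
        return True
    return len(title.strip()) > max_length


def titles_to_review(documents: Iterable[dict], max_length: int = 120) -> List[int]:
    """Return the ids of documents whose titles a bulk script should revisit.

    Each document is assumed to be a dict with "id" and "title" keys.
    """
    return [
        doc["id"]
        for doc in documents
        if needs_title_review(doc.get("title"), max_length)
    ]
```

A script could feed the returned ids into whatever update path ends up handling manual title edits, so both routes share the same validation.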

metasj commented 5 years ago

Titles and filenames are the source of the most feature requests from users of the current system. We look for "Title" field metadata in PDFs and HTML, but often the implicit title is stored in some other way -- the first line of text; the first H1; &c. Many uploads are exports from Word or PowerPoint, which don't always have clean metadata.
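One way to picture the fallback described above is a simple ordered guess: try the "Title" metadata, then the first H1, then the first line of text, and only then the filename. This is a sketch under that assumption; the function name, parameter names, and the ordering itself are hypothetical and not the archive's actual extraction logic.

```python
from typing import Optional


def pick_title(
    metadata_title: Optional[str],
    first_heading: Optional[str],
    first_text_line: Optional[str],
    file_name: str,
) -> str:
    """Return the first non-blank candidate, falling back to the file name."""
    for candidate in (metadata_title, first_heading, first_text_line):
        if candidate and candidate.strip():
            # Collapse runs of whitespace so a multi-line heading becomes one line.
            return " ".join(candidate.split())
    return file_name
```

Whatever this returns would only be a default guess; the point of the issue is that the uploader can still override it.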

joeltg commented 5 years ago

I think (@isTravis ?) that we also want to let users edit/set the publication date of each document. This often isn't found in the metadata, and can be lied about anyway.

reefdog commented 5 years ago

"Titles and filenames are the source of the most feature requests from users of the current system."

(Emphasis mine.) Editing the title makes sense, but by filename, do you mean folks want to be able to edit the filename that appears here: [screenshot]

And if so, two questions:

  1. Would you want to store this as a new field, or overwrite the original uploaded filename?
  2. When folks download files from the viewer, they're named after what I suppose is their IPFS key (e.g., zb2rhbMrTKVRJ6GwfDUGExo7RDi6kdzumSio5VPVDyQsBTtyy.pdf). Would we want to ensure the downloaded filename matched whatever the user had set?

joeltg commented 5 years ago

I feel like the filename is a property of an individual file, which should not be editable. Sensitive information might accidentally end up in a filename, so maybe we let users hide it, but IMHO that's not a priority.

Titles and publication dates are properties of the document (which can span multiple files/versions) and should be user-editable (we just try to guess at good ones from the metadata).

This is relevant to a recent change in the database model where I moved fileName from the Documents table to the Assertions table.
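To make the split concrete, here is a minimal sketch of the distinction: editable document-level fields versus an immutable per-file record. Only `fileName` and the table names Documents and Assertions come from the thread; every other field name, and the use of Python dataclasses rather than the real database schema, is an assumption for illustration.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass
class Document:
    """Document-level properties: guessed from metadata, editable by the uploader."""
    id: int
    title: Optional[str] = None
    publication_date: Optional[date] = None


@dataclass(frozen=True)
class Assertion:
    """Per-file record: the filename belongs to one uploaded file and is not editable."""
    document_id: int
    file_name: str        # corresponds to the fileName column mentioned above
    created_at: datetime  # hypothetical field name for the upload time
```

With this shape, assigning to `document.title` works, while assigning to `assertion.file_name` raises an error, which mirrors the "filename is a property of the file" position.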

isTravis commented 5 years ago

I've heard two use cases regarding filenames - and I'm not sure I'm sympathetic to either of them.

  1. Filenames should be used as a form of deduping. By this I think the intention was to use the absolute filename (i.e., including the whole path). It seems like a trove of unintended functionality and bugs to dedupe based on this - I'd rather only dedupe on hashes.
  2. Some want to be able to correlate a document on PAA with the one in their local system. Storing the filename (i.e. full path) would let them look up the file locally at a later date. This only works for FTP uploads (browser uploads don't give you the full local path) and feels like it could lead to as much confusion as benefit (e.g. local filename changes).
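For the first point, deduping on hashes rather than paths could look roughly like the sketch below. SHA-256 is used purely as a stand-in; the archive's real content identifier (the IPFS key mentioned earlier in the thread) is computed differently, and the function names here are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, Tuple


def content_hash(data: bytes) -> str:
    """Hex digest of the file contents (SHA-256 as an illustrative choice)."""
    return hashlib.sha256(data).hexdigest()


def dedupe_by_hash(uploads: Iterable[Tuple[str, bytes]]) -> Dict[str, str]:
    """Map each distinct content hash to the first path it was seen under.

    Two uploads with identical bytes collapse into one entry regardless of
    their paths; two different files that share a path stay separate.
    """
    seen: Dict[str, str] = {}
    for path, data in uploads:
        seen.setdefault(content_hash(data), path)
    return seen
```

Keying on content means a renamed or moved local file still matches its earlier upload, which is the failure mode path-based deduping would hit.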

Regardless of whether either of these is a process we want to support, neither requires renaming the file - so I think we can pass on that ability.

Regarding publication date - yes - we do want to allow them to set that. It is a value scraped from the doc (so can't be verified anyways) and it is important that the uploader be able to assert the publication date accurately. We store a separate uploadDate (well... just createdAt right?) that is not editable.