Add option to DAMS Manager to trigger watermarking for VRR PDFs/images

gamontoya commented 5 years ago

Descriptive summary

Add a checkbox to DAMS Manager Excel import to trigger watermarking of files on ingest

 [ ] generate watermarked document and image service files

fileUse rules

When the checkbox is selected:

For files ingested with File use = document-source
1. Store original pdf with document-source file use
2. Generate watermarked pdf with document-service file use
For files ingested with File use = image-source
1. Store original with image-source file use
2. Generate all derivatives with watermark and standard file use values (image-service, image-huge, etc.)
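
The rules above can be sketched as a small planning function. This is only an illustration of the fileUse mapping when the checkbox is selected, not the damsmanager implementation: the names `PlannedFile`, `IMAGE_DERIVATIVE_USES`, and `plan_watermarked_ingest` are hypothetical, the derivative list holds only the two values named above (the "etc." is left open), and the handling of other file uses is an assumption.

```python
from dataclasses import dataclass
from typing import List

# Only the derivative file uses named in this issue; the real set is longer ("etc.").
IMAGE_DERIVATIVE_USES = ["image-service", "image-huge"]


@dataclass(frozen=True)
class PlannedFile:
    """Hypothetical record of one file to store or generate on ingest."""
    file_use: str
    watermarked: bool


def plan_watermarked_ingest(source_file_use: str) -> List[PlannedFile]:
    """Return the files to store/generate when the watermark checkbox is selected."""
    # The original is always stored as-is, without a watermark.
    plan = [PlannedFile(source_file_use, watermarked=False)]

    if source_file_use == "document-source":
        # Rule 2 for documents: one watermarked PDF as document-service.
        plan.append(PlannedFile("document-service", watermarked=True))
    elif source_file_use == "image-source":
        # Rule 2 for images: every derivative carries the watermark.
        plan.extend(PlannedFile(use, watermarked=True) for use in IMAGE_DERIVATIVE_USES)
    # Assumption: any other file use is stored unchanged with no derivatives.

    return plan


if __name__ == "__main__":
    for source in ("document-source", "image-source"):
        print(source, "->", [f.file_use for f in plan_watermarked_ingest(source)])
```

Keeping the original as the first, unwatermarked entry makes the "store the source untouched" part of each rule explicit in the plan.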

For discussion on fileUse, etc. see: ucsdlib/damspas/issues/622

Related work

ucsdlib/damspas#654
ucsdlib/damsmanager#331
ucsdlib/damsmanager#320
ucsdlib/damspas#622

arwenhutt commented 5 years ago

@mdpeters @gamontoya I've added what we discussed re: file use values above - please review!

Also two questions:

1. We talked about image-service, but there are a number of other image derivatives created on ingest - do we want all of the image derivatives to be watermarked? (I'm guessing yes, but wanted to verify)
2. It would be good to have some flag or hook for the files which are watermarked - it's the kind of thing that we can infer in the current scenario, but as time passes and processes change, that will get more difficult and less accurate. Thoughts on an easy way to do this? There may already be something in the files themselves (that will or could be pulled out by fits), or the event log for the files...

mdpeters commented 5 years ago

@arwenhutt For files ingested with image-source, do we really need to change that to document-source? Though that might be necessary if we need to generate a large watermarked image for zoom tile generation. I'm not entirely sure what the workflow is for generating the tiles: is that done off of the master tiff (I assume this is true) or off of a later, smaller derivative? Either way, we'd likely need to generate a larger watermarked deriv for zoom to work off of as well.

I'm not sure if the smaller image types need the watermark as it'd be illegible at that size but if they are generated off of one of the smaller images generated earlier in the chain, it's fine.

Were you thinking of new file use cases or another bit of data? New use cases would likely mean a bit more work on the interface, since it'd have to look for those file types for display. I notice there is a label attribute in the object json for files that doesn't seem to get used; maybe put something like "watermarked" there (unless that label area is earmarked for something else)?

arwenhutt commented 5 years ago

@mpeters

"For files ingested with image-source do we really need to change that to document-source?"

ack! no, that was just a typo! sorry - corrected above!

(I'm ooo today and need to go make breakfast - so will reply to the other parts when I'm back in the office, just wanted to correct my typo asap : )

mpeters commented 5 years ago

@arwenhutt You mentioned the wrong person :) I think you meant @mdpeters

gamontoya commented 5 years ago

@lsitu Do you have any thoughts on what Arwen said above:

"it would be good to have some flag or hook for the files which are watermarked - it's the kind of thing that we can infer in the current scenario, but as time passes and processes change, that will get more difficult and less accurate. Thoughts on an easy way to do this? There may already be something in the files themselves (that will or could be pulled out by fits), or the event log for the files..."

lsitu commented 5 years ago

@gamontoya / @arwenhutt Maybe we need to add another predicate into the dams:File model to flag it as a watermarked derivative?

gamontoya commented 5 years ago

@lsitu What's the level of effort?

lsitu commented 5 years ago

@gamontoya It should be a simple thing but we need to update both damsmanager and damsrepo for it.

lsitu commented 5 years ago

@gamontoya / @arwenhutt Do we need to add a property to dams:File to flag a watermarked image? What predicate should we use in this case? Thanks.

gamontoya commented 5 years ago

@lsitu Arwen said she doesn't think it's necessary to flag a watermarked image. What do you think?

lsitu commented 5 years ago

@gamontoya I am fine with it. But what's the solution for your comment on https://github.com/ucsdlib/damsmanager/issues/332#issuecomment-502257661?

gamontoya commented 5 years ago

@lsitu My comment had to do with ways to flag a watermarked image and Arwen seems to think now that it may not be necessary if we don't have use cases.

arwenhutt commented 5 years ago

@lsitu @gamontoya yeah, without use cases or others thinking it's important, there doesn't seem to be a strong case for it.

lsitu commented 5 years ago

@gamontoya / @arwenhutt Should I just move forward without flagging those watermarked images/PDFs for now?

gamontoya commented 5 years ago

@lsitu Yes, thank you.

lsitu commented 5 years ago

@mcritchlow I've added PR https://github.com/ucsdlib/damsmanager/pull/337 to wrap up the support for PDF and Image watermarking with Excel InputStream ingest. It's ready for review now. Thanks.
