ucsdlib / damsmanager

DAMS Manager

Global data cleanup to change the existing old Use Constraint(s) (dams:copyrightNote) to the new CC copyright Use Constraint(s) template for objects with a cc license #386

Closed. jessicahilt closed this issue 4 years ago.

jessicahilt commented 4 years ago

Descriptive summary

From Ho Jung: Referencing https://github.com/ucsdlib/damspas/issues/642, following is the latest spec for applying the copyright boilerplate to Research Data Collections. If I'm not mistaken, the Rights Holder can be:

1) an entity (UC Regents, individual, organization), 2) not copyrighted - Public Domain or CC0, 3) unknown.

I don't think RDC will ever use option 3, so the text below should apply to option 1, and the current one we use for PD and CC0 can be used for option 2. I think an example of the public domain text is here: https://library.ucsd.edu/dc/object/bb3823744m. Works for me.

Related work

Batch Export uses old Creative Commons license boilerplate text #367

lsitu commented 4 years ago

@hjsyoo I am a little confused while referencing ticket https://github.com/ucsdlib/damspas/issues/642. Currently we have the following six boilerplates for Constraint(s) on Use:

  1. Public Domain:

    Constraint(s) on Use: This work may be used without prior permission.
  2. Copyrighted:

    Constraint(s) on Use: This work is protected by the U.S. Copyright Law (Title 17, U.S.C.). Use of this work beyond that  allowed by "fair use" requires written permission of the copyright holder(s). Responsibility for obtaining permissions and any use and distribution of this work rests exclusively with the user and not the UC San Diego Library. Inquiries can be made to the UC San Diego Library program having custody of the work.
  3. Copyrighted UC Regents:

    Constraint(s) on Use: This work is protected by the U.S. Copyright Law (Title 17, U.S.C.). Use of this work beyond that allowed by "fair use" requires written permission of the UC Regents. Responsibility for obtaining permissions and any use and distribution of this work rests exclusively with the user and not the UC San Diego Library. Inquiries can be made to the UC San Diego Library program having custody of the work.
  4. Copyrighted Other:

    Constraint(s) on Use: This work is protected by the copyright law. Use of this work beyond that allowed by the applicable copyright statute requires written permission of the copyright holder(s). Responsibility for obtaining permissions and any use and distribution of this work rests exclusively with the user and not the UC San Diego Library. Inquiries can be made to the UC San Diego Library program having custody of the work.
  5. Unknown US:

    Constraint(s) on Use: This work may be protected by the U.S. Copyright Law (Title 17, U.S.C.). Use of this work beyond that allowed by "fair use" requires the written permission of the copyright holders(s). Responsibility for obtaining permissions and any use and distribution of this work rests exclusively with the user and not the UC San Diego Libraries. Inquiries can be made to the UC San Diego Libraries department having custody of the work.
  6. Unknown Other:

    Constraint(s) on Use: This work may be protected by the copyright law. Use of this work beyond that allowed by the applicable copyright statute requires the written permission of the copyright holders(s). Responsibility for obtaining permissions and any use and distribution of this work rests exclusively with the user and not the UC San Diego Libraries. Inquiries can be made to the UC San Diego Libraries department having custody of the work.

After applying the new specs in Slack https://ucsdlibrary.slack.com/archives/C3RENCQFN/p1579802424006300, we'll only have two boilerplates for Constraint(s) on Use:

  1. Public Domain OR CC0:

    Constraint(s) on Use: This work may be used without prior permission.
  2. Copyrighted/Unknown:

    Constraint(s) on Use: This work is protected by U.S. Copyright Law (Title 17, U.S.C.). Use of this work beyond that allowed by "fair use" or any license applied to this work requires written permission of the copyright holder(s). Responsibility for obtaining permissions and any use and distribution of this work rests exclusively with the user and not the UC San Diego Library. Inquiries can be made to the UC San Diego Library program having custody of the work.

Is it correct? Do you have an example for copyright status CC0? Thanks.

hjsyoo commented 4 years ago

@lsitu Thank you for this list of existing boilerplates, it'll make it easier for me to spec precisely. I'll consult with DOMM and others and get back to you.

hjsyoo commented 4 years ago

@lsitu Here's a new spec - please disregard the previous ones. Here, I'm trying to use a combination of the boilerplate headings you gave me and the logic that I think should be applied to them. Let me know if you need any clarification.

  1. Public Domain: IF copyright=public domain OR license=CC0 THEN
  2. Unknown US: ELSE IF copyright = unknown AND IF jurisdiction = US THEN
  3. Unknown Other: ELSE IF copyright = unknown AND IF jurisdiction != US THEN
  4. Copyrighted US: ELSE IF copyright = known (i.e. copyright is not “public domain” or “unknown” and there is not a CC0 license) AND IF jurisdiction = US THEN
  5. Copyrighted Other: ELSE
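The ordered IF / ELSE IF rules above can be sketched as a small Python function. This is a minimal sketch, not damsmanager code: the plain-string inputs (`copyright_status`, `jurisdiction`, `license_name`) and values such as `"under copyright"` are assumptions standing in for however the DAMS records actually store these fields.

```python
from typing import Optional


def select_heading(copyright_status: str,
                   jurisdiction: Optional[str],
                   license_name: Optional[str] = None) -> str:
    """Return the boilerplate heading chosen by the ordered rules (hypothetical helper)."""
    status = copyright_status.strip().lower()
    # Rule 1: public domain, or a CC0 license, wins before anything else is checked.
    if status == "public domain" or license_name == "CC0":
        return "Public Domain"
    # Rules 2-3: unknown copyright, split on jurisdiction.
    if status == "unknown":
        return "Unknown US" if jurisdiction == "US" else "Unknown Other"
    # Rule 4: known copyright (not public domain, not unknown, no CC0) in the US.
    if jurisdiction == "US":
        return "Copyrighted US"
    # Rule 5: everything else.
    return "Copyrighted Other"
```

Because the branches are checked in order, a CC0 license short-circuits the remaining rules, and any status other than public domain or unknown is treated as known copyright.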
lsitu commented 4 years ago

@hjsyoo Thanks. Could you give me an example with license=CC0, as in #1 above for Public Domain?

hjsyoo commented 4 years ago

I don't think we've actually used CC0 yet, but here's a screenshot from @abbypenn93 showing the settings that DOMM uses to designate CC0: [screenshot]

lsitu commented 4 years ago

@hjsyoo Thanks. So, to determine that an object has license=CC0, it doesn't need to have a CC license; it just needs to be copyrighted, with a public display license and a note like "Public access granted by rights holder." as created by the form options above. Is that correct?

lsitu commented 4 years ago

@hjsyoo I think we need a rule to determine those objects with license=CC0. We should have lots of license=CC0 objects in dams if objects created by the form above (copyrighted objects with a display license) are license=CC0. For example, objects in Ben Yellen Papers collection are license=CC0 objects except https://library.ucsd.edu/dc/object/bb17414735 (Copyrighted Other). With that said, objects in Ben Yellen Papers collection (except bb17414735) need to be updated to use the dams:copyrightNote boilerplate for public domain:

Constraint(s) on Use: This work may be used without prior permission.

Does it sound correct?

hjsyoo commented 4 years ago

@lsitu Thanks for catching the issue. I think there are two separate parts to this -

  1. How is CC0 designated in the dams data model and import tool?
    • I think you're right about getting clarity on the rule! The few Ben Yellen objects I've browsed so far indicate Rights Holder = "Private party", and the copyright boilerplate is not what I would expect for something where copyright has been waived. So, I'm guessing that the Ben Yellen collection isn't meant to be CC0, which means that I'm wrong about how CC0 gets designated in the DAMS. But DOMM is the authority on this. @abbypenn93 @arwenhutt @remerjohnson, do you know how CC0 gets designated in the import tool? Properly done, there should be no Rights Holder, but Access Override=Public access granted by rights holder does seem appropriate for a CC0 waiver.
  2. What is the correct boilerplate for CC0?
    • I'm pretty certain that the Constraint(s) on Use boilerplate for CC0 should be the same as for Public Domain, but let me triple check since I don't want to get this wrong. I'll confirm in this ticket.
hjsyoo commented 4 years ago

@lsitu I've conferred with DOMM, and we've confirmed that there is currently no proper method for designating the CC0 dedication. This means that curators will figure out workarounds on a case-by-case basis, as they come up. The good news is, this simplifies the work we need to do for this ticket! I've removed any mention of CC0 on this latest spec:

  1. Public Domain: IF copyright=public domain THEN

Constraint(s) on Use: This work may be used without prior permission.

  2. Unknown US: ELSE IF copyright = unknown AND IF jurisdiction = US THEN

Constraint(s) on Use: This work may be protected by the U.S. Copyright Law (Title 17, U.S.C.). Use of this work beyond that allowed by "fair use" or any license applied to this work requires written permission of the copyright holders(s). Responsibility for obtaining permissions and any use and distribution of this work rests exclusively with the user and not the UC San Diego Library. Inquiries can be made to the UC San Diego Library program having custody of the work.

  3. Unknown Other: ELSE IF copyright = unknown AND IF jurisdiction != US THEN

Constraint(s) on Use: This work may be protected by copyright law. Use of this work beyond that allowed by the applicable copyright statute or any license applied to this work requires written permission of the copyright holders(s). Responsibility for obtaining permissions and any use and distribution of this work rests exclusively with the user and not the UC San Diego Library. Inquiries can be made to the UC San Diego Library program having custody of the work.

  4. Copyrighted US: ELSE IF copyright = known (i.e. copyright is not “public domain” or “unknown” and there is not a CC0 license) AND IF jurisdiction = US THEN

Constraint(s) on Use: This work is protected by the U.S. Copyright Law (Title 17, U.S.C.). Use of this work beyond that allowed by "fair use" or any license applied to this work requires written permission of the copyright holder(s). Responsibility for obtaining permissions and any use and distribution of this work rests exclusively with the user and not the UC San Diego Library. Inquiries can be made to the UC San Diego Library program having custody of the work.

  5. Copyrighted Other: ELSE

Constraint(s) on Use: This work is protected by copyright law. Use of this work beyond that allowed by the applicable copyright statute or any license applied to this work requires written permission of the copyright holder(s). Responsibility for obtaining permissions and any use and distribution of this work rests exclusively with the user and not the UC San Diego Library. Inquiries can be made to the UC San Diego Library program having custody of the work.
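The final spec, without the CC0 branch, could be sketched as a selection function plus a lookup from the chosen heading to its boilerplate text. The boilerplate strings are copied from the spec above (including "holders(s)" as written there); the function name and the plain-string inputs are hypothetical, not the actual damsmanager implementation.

```python
from typing import Optional

PREFIX = "Constraint(s) on Use: "

# Closing sentences shared by every restricted-use boilerplate in the spec.
TAIL = ("Responsibility for obtaining permissions and any use and distribution of this "
        "work rests exclusively with the user and not the UC San Diego Library. "
        "Inquiries can be made to the UC San Diego Library program having custody of the work.")

BOILERPLATE = {
    "Public Domain": PREFIX + "This work may be used without prior permission.",
    "Unknown US": PREFIX + (
        "This work may be protected by the U.S. Copyright Law (Title 17, U.S.C.). "
        "Use of this work beyond that allowed by \"fair use\" or any license applied to "
        "this work requires written permission of the copyright holders(s). ") + TAIL,
    "Unknown Other": PREFIX + (
        "This work may be protected by copyright law. "
        "Use of this work beyond that allowed by the applicable copyright statute or any "
        "license applied to this work requires written permission of the copyright "
        "holders(s). ") + TAIL,
    "Copyrighted US": PREFIX + (
        "This work is protected by the U.S. Copyright Law (Title 17, U.S.C.). "
        "Use of this work beyond that allowed by \"fair use\" or any license applied to "
        "this work requires written permission of the copyright holder(s). ") + TAIL,
    "Copyrighted Other": PREFIX + (
        "This work is protected by copyright law. "
        "Use of this work beyond that allowed by the applicable copyright statute or any "
        "license applied to this work requires written permission of the copyright "
        "holder(s). ") + TAIL,
}


def constraint_on_use(copyright_status: str, jurisdiction: Optional[str]) -> str:
    """Pick the heading with the ordered rules, then return its boilerplate text."""
    status = copyright_status.strip().lower()
    if status == "public domain":
        heading = "Public Domain"
    elif status == "unknown":
        heading = "Unknown US" if jurisdiction == "US" else "Unknown Other"
    elif jurisdiction == "US":
        heading = "Copyrighted US"
    else:
        heading = "Copyrighted Other"
    return BOILERPLATE[heading]
```

As written, an object with no jurisdiction at all falls through to the "Other" variants, which is the case the following comments discuss.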

lsitu commented 4 years ago

@hjsyoo Thanks. I'll move forward without the case for license=CC0 now.

lsitu commented 4 years ago

@hjsyoo I found that many objects don't have jurisdiction = US, but I believe they should be Copyrighted US. For example, objects whose dams:copyrightNote has text like "Constraint(s) on Use: This work is protected by the U.S. Copyright Law (Title 17, U.S.C.). Use of this work beyond that allowed by "fair use" requires written permission of the UC Regents.":
  • https://library.ucsd.edu/dc/object/bb03081961 (Under copyright)
  • https://library.ucsd.edu/dc/object/bb62810789 (Under copyright)
  • https://library.ucsd.edu/dc/object/bb0035175d (Under copyright)
  • https://library.ucsd.edu/dc/object/bb97966013 (Under copyright)
  • https://library.ucsd.edu/dc/object/bb64859225 (Under copyright)

Objects whose dams:copyrightNote has text like "Constraint(s) on Use: This work is protected by the U.S. Copyright Law (Title 17, U.S.C.). Use of this work beyond that allowed by "fair use" requires written permission of the copyright holder(s).":
  • https://library.ucsd.edu/dc/object/bb0035124g (Under copyright)
  • https://library.ucsd.edu/dc/object/bb17414735 (Under copyright)
  • https://library.ucsd.edu/dc/object/bb0956458m (Unknown)
  • https://library.ucsd.edu/dc/object/bb9898738s (Unknown)

Should we match the existing text in dams:copyrightNote to decide which new boilerplate to apply when an object has no jurisdiction?
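One way to do the text matching asked about here is to fall back on the existing note only when the jurisdiction is missing. A minimal sketch, assuming the note text is already available as a string; `infer_jurisdiction` is a hypothetical helper, and the marker phrase is taken from the existing US boilerplates listed earlier in this thread.

```python
from typing import Optional

# Phrase found in the existing US boilerplates (Copyrighted, Copyrighted UC Regents,
# Unknown US) but not in the "Other" variants, which say "the copyright law".
US_MARKER = "U.S. Copyright Law (Title 17, U.S.C.)"


def infer_jurisdiction(jurisdiction: Optional[str], copyright_note: str) -> Optional[str]:
    """Prefer the recorded jurisdiction; otherwise guess it from the note text."""
    if jurisdiction:
        return jurisdiction
    if US_MARKER in copyright_note:
        return "US"
    return None  # leave undecided rather than guessing a non-US jurisdiction
```

The discussion below settles on a simpler rule (treat no jurisdiction the same as jurisdiction = US), so this only illustrates the option being asked about.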

arwenhutt commented 4 years ago

@lsitu @hjsyoo I think it's probably safe to treat no jurisdiction the same as jurisdiction = US.

@lsitu if you have a list of the arks that don't have jurisdiction, could you share that? I don't think it's a high priority clean up but something we can do when there's bandwidth.

lsitu commented 4 years ago

@arwenhutt I don't have the complete list yet. I think we can generate a report for it later. But we may need a decision for no jurisdiction mapping now to move forward with the following tickets, which need to be done at the same time: https://github.com/ucsdlib/damsmanager/issues/386 https://github.com/ucsdlib/damsmanager/issues/387 https://github.com/ucsdlib/damsmanager/issues/388

If you think we need to generate a report for no jurisdiction to make the decision, then we can stop moving forward with the tickets above and generate the report for you first. Thanks.

arwenhutt commented 4 years ago

@lsitu nope, I don't think we need the report to make a decision. I'm fine with treating no jurisdiction the same as jurisdiction = US.

I just thought if you had a report, I'd file it away for future clean up -- but don't worry about creating it now. I'll just add a note to our clean up list, and we can request the report later.

hjsyoo commented 4 years ago

@lsitu All of those examples you provided are SC&A. We should apply this ticket request only to RDC collections. Will this complicate things excessively? But if there are RDC objects currently with no jurisdiction, then yes, they should be assigned US. I can't think of any non-US objects we've ingested, off the top of my head.

lsitu commented 4 years ago

@hjsyoo Okay, then we'll only update the dams:copyrightNote for objects in RDC in this ticket. @arwenhutt Yes, we can generate the report for objects with no jurisdiction at any time. I'll move forward with the cleanup for RDC collections now. For the Excel Standard InputStream, do you want to apply the copyright note boilerplate above to all new ingests now?
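Scoping the cleanup to RDC, and assigning US to RDC objects that have no jurisdiction as agreed above, might look like the following sketch. The `DamsObject` record and its `collection_group` field are hypothetical simplifications, not the DAMS data model.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class DamsObject:
    # Hypothetical, simplified view of an object record.
    ark: str
    collection_group: str          # assumed label such as "RDC" or "SC&A"
    copyright_status: str
    jurisdiction: Optional[str]


def rdc_cleanup_candidates(objects: Iterable[DamsObject]) -> List[DamsObject]:
    """Keep only RDC objects, defaulting a missing jurisdiction to US."""
    selected: List[DamsObject] = []
    for obj in objects:
        if obj.collection_group != "RDC":
            continue  # SC&A and other objects are out of scope for this ticket
        if not obj.jurisdiction:
            obj = replace(obj, jurisdiction="US")  # returns a modified copy
        selected.append(obj)
    return selected
```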

lsitu commented 4 years ago

@hjsyoo Yes, objects in RDC collections are much cleaner. I found 9836 objects in RDC in total, 104 of which are public. Does the total count seem correct?

I only see three of them that may have copyright issues:
  • https://library.ucsd.edu/dc/object/bb2052790m (missing copyright)
  • https://library.ucsd.edu/ark:/20775/bb44418908 (duplicate copyright element)
  • https://library.ucsd.edu/ark:/20775/bb5534097t (duplicate copyright element)
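The kinds of problem raised in this thread (missing copyright, duplicate copyright elements, and later, empty copyright properties) can be flagged before the cleanup runs. A minimal sketch, assuming a hypothetical input that maps each ARK to the list of copyright status values found on that object.

```python
from typing import Dict, List


def copyright_issues(copyright_elements: Dict[str, List[str]]) -> Dict[str, str]:
    """Flag objects whose copyright elements would break the boilerplate rules."""
    issues: Dict[str, str] = {}
    for ark, elements in copyright_elements.items():
        if not elements:
            issues[ark] = "missing copyright"
        elif len(elements) > 1:
            issues[ark] = "duplicate copyright element"
        elif not elements[0].strip():
            issues[ark] = "empty copyright"
    return issues
```

Objects that come back clean (exactly one non-blank copyright value) are the ones the ordered rules can be applied to without manual review.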

Will you correct the copyright for these three objects? I think we should do the cleanup after the new versions of damspas and damsmanager are released. Does that sound good?

arwenhutt commented 4 years ago

@lsitu @hjsyoo I'll edit the last two to remove the duplicate copyright elements. The first is a test object, I can add something there if needed but I don't think that item will ever be public.

arwenhutt commented 4 years ago

@lsitu ✅ done

lsitu commented 4 years ago

Thanks. @arwenhutt Regarding the damsmanager update in ticket https://github.com/ucsdlib/damsmanager/issues/387, would you like to apply the copyright note boilerplate above to all newly ingested objects?

arwenhutt commented 4 years ago

@lsitu yes, I believe so.

hjsyoo commented 4 years ago

Thanks @lsitu and @arwenhutt . Yes, 104 public objects is correct, and the total is very close to expected.

lsitu commented 4 years ago

@hjsyoo Maybe @arwenhutt needs to fix the 872 records that are missing dams:Unit in https://github.com/ucsdlib/damsmanager/issues/389 before we can do the cleanup?

arwenhutt commented 4 years ago

@lsitu I don't think that should be a problem. I've analyzed the records: a bunch of them are "empty", so they are fine without a unit, and I have the rest ready to go. We just wanted to do a few small overlay tests for https://github.com/ucsdlib/damspas/issues/732 before making the bulk of the updates.

arwenhutt commented 4 years ago

@lsitu also, have I mentioned how much I love the batch export and batch overlay tools? they are truly (work) life changing! thank you : )

lsitu commented 4 years ago

@arwenhutt Thanks for your update on https://github.com/ucsdlib/damspas/issues/732. Glad you like the batch export and batch overlay tools! @hjsyoo Since the damspas code on prod still builds the copyright note boilerplate at this time, I think we have to wait until the new releases of damspas and damsmanager are deployed before we can do the cleanup. Meanwhile, would you like me to do the cleanup on staging so that you can test it?

hjsyoo commented 4 years ago

@lsitu Happy to test the code on staging first!

lsitu commented 4 years ago

@hjsyoo I've started the process on staging for the copyright boilerplate cleanup. I'll let you know once it's ready for you to review.

arwenhutt commented 4 years ago

I created a ticket for the <dams:unit> cleanup work mentioned as a dependency above.

hjsyoo commented 4 years ago

@lsitu The code fix described in https://github.com/ucsdlib/damsmanager/issues/386#issuecomment-579494388 works exactly as expected in staging. Thanks!

arwenhutt commented 4 years ago

@lsitu In reviewing a small sample of objects, 2 of the 7 objects I reviewed were not updated.

Here's my tracking spreadsheet for the review: damsmanager_2_77_0_issue_386.xlsx

lsitu commented 4 years ago

@arwenhutt Got it. I think this is probably a SOLR update issue on staging, which may not be reliable enough with other deployments and tests going on at the same time. Would you like me to run SOLR re-indexing for all RDCP objects on staging for you to review?

arwenhutt commented 4 years ago

@lsitu If you think it might help, but I didn't think SOLR indexing affected the rdf data views?

lsitu commented 4 years ago

@arwenhutt Yes, SOLR indexing won't affect the RDF data views; it only updates out-of-date metadata in SOLR for the damspas UI display.

arwenhutt commented 4 years ago

@lsitu okay, so no need to reindex for this ticket then. I did my review on the rdf data (at the location listed in column E of the spreadsheet attached above) not the UI display.

lsitu commented 4 years ago

@arwenhutt Sounds good. In the meantime, I am starting the SOLR re-indexing process for RDCP records now so that you can still take a look later if needed. It will take a little while.

arwenhutt commented 4 years ago

Closing per followup in #399

lsitu commented 4 years ago

@arwenhutt Just letting you know that we haven't done any work to clean up the copyright boilerplate on prod yet.

arwenhutt commented 4 years ago

@lsitu doh! I totally knew that, but I was so deep in verify-close-celebrate mode that I spaced on needing to leave this open! Thanks : )

arwenhutt commented 4 years ago

@lsitu and to verify, before implementing this I think we need to fix the objects missing <dams:unit> and the empty <dams:copyright> properties

Is that correct?

lsitu commented 4 years ago

@arwenhutt Yes. But I think it doesn't matter if you can fix the dams:copyrightNote in the spreadsheet for batch overlay.

lsitu commented 4 years ago

@arwenhutt For your comment https://github.com/ucsdlib/damsmanager/issues/386#issuecomment-590993259, what's your plan then?

arwenhutt commented 4 years ago

@lsitu Once the updates to damsmanager are deployed to prod I can update the objects missing dams:unit. I don’t know the scope of the blank copyright fields, or if they can be fixed using batch overlay. Could you identify and remove those?

Then once those are both done, you can run your scripts for the update.

How does that sound?

lsitu commented 4 years ago

@arwenhutt Yes, that sounds good. I think my script can fix the records in RDCP that have empty copyright fields. Objects in DLP collections can be fixed using batch overlay, but we may need a report for them.

lsitu commented 4 years ago

@arwenhutt I've finished the global copyright boilerplate cleanup for RDCP collections on prod, and I think you can review it now. Here are the objects affected:

In addition, 50 objects were found to be missing copyright elements: ark_error.txt

arwenhutt commented 4 years ago

@lsitu I've reviewed a sample of the updated files and they look good. There are some items that didn't get updated (because they didn't meet the search parameters I'm guessing) but we'll update them using batch overlay. So I think this can be closed! :tada:

lsitu commented 4 years ago

@arwenhutt Yes. None of the 50 objects in file ark_error.txt from my comment above (https://github.com/ucsdlib/damsmanager/issues/386#issuecomment-594179492) have been updated, since they have no copyright element and I don't know how to make the change without the criteria in the copyright element.
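The skip-and-report behavior described here can be sketched as follows: objects without a copyright element are set aside and listed in an error report instead of being guessed at. The function names and the shape of the input are hypothetical; `build_note` stands in for whatever boilerplate-selection logic the cleanup script actually uses.

```python
from pathlib import Path
from typing import Callable, Dict, List, Optional, Tuple


def apply_cleanup(copyright_by_ark: Dict[str, Optional[dict]],
                  build_note: Callable[[dict], str]) -> Tuple[Dict[str, str], List[str]]:
    """Return (new notes by ARK, ARKs skipped because they have no copyright element)."""
    updated: Dict[str, str] = {}
    skipped: List[str] = []
    for ark, copyright_elem in copyright_by_ark.items():
        if not copyright_elem:
            # No copyright element means no status/jurisdiction criteria
            # to choose a boilerplate from, so report the ARK instead.
            skipped.append(ark)
            continue
        updated[ark] = build_note(copyright_elem)
    return updated, skipped


def write_error_report(skipped: List[str], path: str = "ark_error.txt") -> None:
    """Write one skipped ARK per line, similar in spirit to the ark_error.txt report."""
    Path(path).write_text("".join(f"{ark}\n" for ark in skipped))
```

Keeping the skipped ARKs as data rather than failing the whole run lets the rest of the collection be updated while the problem records go to batch overlay.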

arwenhutt commented 4 years ago

@lsitu doh! I didn't see that one! thanks for the great reports - helps a lot with qa :)