ualbertalib / jupiter

Jupiter is a University of Alberta Libraries-based initiative to create a sustainable and extensible digital asset management system. This is phase 2 (Digitization).
https://era.library.ualberta.ca/
MIT License

License statement for theses in metadata from some point after 2016 is the wrong statement #1591

Open leahvanderjagt opened 4 years ago

leahvanderjagt commented 4 years ago

Describe the bug

On March 1, 2016, the following statement was to be implemented as the license for ALL THESES.

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

We've discovered that for theses following that period, we are seeing this old, creaky and long-discarded license in the metadata:

Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.

To Reproduce

Steps to reproduce the behavior:

  1. Go to the Theses and dissertations collection https://era.library.ualberta.ca/communities/db9a4e71-f809-4385-a274-048f28eb6814/collections/f42f3da6-00c3-4581-b785-63725c33c7ce
  2. Restrict facet to 2016, jump around and you'll see the right license
  3. Restrict facet to 2019, jump around and you'll see the wrong old license
  4. Sonya's head explodes b/c it was a lot of work to revise that license while I was on blissful mat leave.

Expected behavior

We need the new license applied to every record from March 1, 2016 (i.e. June convocation, all records with graduation date value Spring 2019). BUT FIRST I need to know:

a) How extensive is this problem? How many theses are displaying this license?

b) Is there a pattern to this? Do you notice this starting at a certain point?

As ERA's Product Owner I need to know this because I will need to go to FGSR and IST and make sure they aren't having students agree to the wrong license in the Alfresco submission system IST runs. I need to know if this is an "us" problem or a "them" problem before we amend anything.

Additional

The root could be a few things: a) FGSR gave a bad statement to IST which put that in their thesis submission system. b) We are autopopulating that field somehow on batch ingest with an old statement.

I am aware I am asking for a data query. Please do not fix the problem until I understand its extent. --Leah

leahvanderjagt commented 4 years ago

Assigned @seanluyk for his information

seanluyk commented 4 years ago

@anayram do we validate license text for thesis ingest?

anayram commented 4 years ago

Thank you @leahvanderjagt !

Because the very last theses batch I worked on was in November 2016, I will answer these questions from our current data rather than from memory, so I hope to offer help instead of muddying the waters. Quick answers to your questions:

a) how extensive is this problem? How many theses are displaying this license?

Based on this report (2020-02-24 snapshot data), it looks like there are approximately 1956 cases containing the exact old license text you cited.

b) Is there a pattern to this? Do you notice this starting at a certain point?

Yes, based on graduation dates, ingest dates, and dates of records created in Jupiter, the license was added to theses ingested in four specific batches for the dates listed below. I don't have the batch ids right now but can get this info from solr. Let me know if you need this.

2018-06, 2018-11, 2019-06, 2019-11
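For anyone wanting to re-run this kind of count, here is a minimal sketch of the grouping, assuming the records have already been exported to a list of dicts. The keys `graduation_date` and `rights` and the sample rows are hypothetical and are not taken from Jupiter or solr; only the opening words of the old license come from this issue.

```python
from collections import Counter

# Opening words of the discarded license quoted earlier in this issue.
OLD_LICENSE_PREFIX = "Permission is hereby granted to the University of Alberta Libraries"


def normalize(text):
    """Collapse runs of whitespace so stray line breaks do not defeat matching."""
    return " ".join((text or "").split())


def count_old_license_by_date(records):
    """Count records carrying the old license, keyed by graduation date.

    `records` is an iterable of dicts with the hypothetical keys
    'graduation_date' (for example '2019-06') and 'rights'.
    """
    counts = Counter()
    for record in records:
        if normalize(record.get("rights")).startswith(OLD_LICENSE_PREFIX):
            counts[record.get("graduation_date") or "unknown"] += 1
    return dict(sorted(counts.items()))


# Made-up rows for illustration only; they are not real ERA records.
sample = [
    {"graduation_date": "2019-06",
     "rights": OLD_LICENSE_PREFIX + " to reproduce single copies of this thesis."},
    {"graduation_date": "2019-06",
     "rights": "Permission is hereby granted to the\nUniversity of Alberta Libraries to reproduce single copies."},
    {"graduation_date": "2018-11",
     "rights": OLD_LICENSE_PREFIX + " to reproduce single copies."},
    {"graduation_date": "2016-06",
     "rights": "This thesis is made available by the University of Alberta Libraries."},
]

print(count_old_license_by_date(sample))
```

Matching on the opening words rather than the full statement is a design choice for the sketch: it also catches records where the tail of the text was truncated or reflowed.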

@leahvanderjagt @seanluyk @sfarnel can I access spreadsheets that come directly from FGSR so that I can trace back the origin of this arcane license?

@seanluyk as far as I know, the only license validation we have is for CC licenses (URIs), not for custom license texts. Since this issue is specific to theses, it might be a good idea to validate thesis license text upon ingest.
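A rough sketch of what such a check could look like, assuming the ingest step can hand each license string to a function before the record is created. The function names and the approved list are hypothetical; this is not how the current import script works.

```python
def normalize(text):
    """Collapse whitespace and ignore case so formatting noise is not flagged."""
    return " ".join((text or "").split()).casefold()


def validate_license(text, approved_texts):
    """Return True when `text` matches one of the approved license statements."""
    approved = {normalize(candidate) for candidate in approved_texts}
    return normalize(text) in approved


# Hypothetical approved list holding the statement quoted at the top of this issue.
APPROVED = [
    "This thesis is made available by the University of Alberta Libraries with "
    "permission of the copyright owner solely for non-commercial purposes. This "
    "thesis, or any portion thereof, may not otherwise be copied or reproduced "
    "without the written consent of the copyright owner, except to the extent "
    "permitted by Canadian copyright law."
]

old_text = ("Permission is hereby granted to the University of Alberta Libraries "
            "to reproduce single copies of this thesis.")

print(validate_license(APPROVED[0], APPROVED))
print(validate_license(old_text, APPROVED))
```

An allow-list is used here instead of a block-list of known bad statements so that any unexpected text, not just the old license, would be caught at ingest.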

sfarnel commented 4 years ago

Thanks very much @anayram. I have only seen the manifests from IST when we have had an issue, but based on the ones I have seen and memory from when this process was developed, the license information does not come in the manifest, but gets added on ingest. But I may be mistaken. @weiweishi can confirm.

leahvanderjagt commented 4 years ago

We can simply proceed with a batch edit if we are the source of the incorrect license. This would be a higher than average priority data correction request.


mbarnett commented 4 years ago

I just glanced at the thesis batch import script, and to maybe partially answer Sharon's question for Weiwei, we don't appear to hardcode any license information for the theses in the script – it looks like it gets loaded out of a CSV file. Whether or not we're the source of the license information in the CSV file, I wouldn't know.

(side note: Weiwei has turned off a lot of her GitHub notifications to avoid getting tagged into older conversations, so if you need to get in touch with her, emailing her directly is probably recommended)

seanluyk commented 4 years ago

@sfarnel @anayram @mbarnett I'm wondering if we might want to take another look at the batch import script for theses at a future backlog meeting to determine if it still meets our needs? I agree with @leahvanderjagt though that in the shorter term, we may want to fix this while we investigate the source of the problem/scope out potential improvements

mbarnett commented 4 years ago

We can certainly look at it down the road – batch import of theses up until this point has been something that Weiwei essentially just cobbled together to support the info dumps she was getting from FGSR, since she understood both the technical details and the larger library contextual details well enough to make it happen without pulling any technical resources away from other priorities. I don't think there's any commitment to keeping anything the way it's currently being done on our part.

Really I think our best bet will be to take a step back and look at batch import in general, not just in the context of theses & ERA-as-IR but in the broader Jupiter context of having to support bringing in a lot of batches of data from Peel and many other sources. There's a lot of overlap and potential to standardize and improve (and, frankly, create) the infrastructure for batch processes, which are largely non-existent outside of "there's a script somewhere" at the moment.

anayram commented 4 years ago

Thank you, all @seanluyk @sfarnel @mbarnett. Yes, I think it would be a very good moment to think about the batch import process in general, including metadata review for sensitive content like rights, subjects, and other fields in high demand.

sfarnel commented 4 years ago

Thanks @mbarnett and all. I agree that this would be a very good time to look at batch ingest holistically across services and from front to back. Looking forward to the discussion. (re: theses, I wonder if there is another csv that includes just the license info?)

mbarnett commented 4 years ago

It looks to me like it all comes from one CSV, but I've pinged Weiwei and asked her to take a look at this thread when she has a moment. I don't have any samples of the data that's used in the process, unfortunately.

sfarnel commented 4 years ago

Thanks @mbarnett Found a sample of what we get from IST from fall 2019; will send via email

weiweishi commented 4 years ago

Sorry for chiming in to this conversation late. The license language is in the manifest sent by IST. I shared the latest file with Matt as an example. Suresh at IST can help with the CSV file generated by Alfresco for future thesis manifests, although I feel this issue has been raised with them a few times already and, for some reason, it has crept back to us.


seanluyk commented 4 years ago

An interesting aside to share. In the Thesis Deposit Submission Instructions, p.13, the license shown is the old license. I'm willing to bet that Alfresco is populating the wrong one, which is also concerning, as students would be agreeing to that license in the workflow, not the correct one.

leahvanderjagt commented 3 years ago

I contacted the Alfresco team some time ago to instruct them to change the license to the correct one in their application and they confirmed that they have done so. FYI. Batch edit is still outstanding afaik.

leahvanderjagt commented 3 years ago

IMPORTANT please note additionally:

Any thesis completed prior to Fall 2014 convocation (2014-11 is the likely date value) should have the following license text:

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for the purpose of private, scholarly or scientific research. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Any thesis completed from Fall 2014 convocation onward should (as expressed initially in this thread) have the following license text:

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

@anayram @piyapongch @sfarnel @kgood we will need to:

a) (Important for current batch work) ensure that the correct statements are attached to the corresponding theses for the batches currently being prepared, according to those reference dates.

b) (to be scheduled) batch edit theses already in ERA to ensure the correct statements are attached to the corresponding theses according to those reference dates.
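To make the date rule concrete, here is a minimal sketch of choosing the statement from a graduation date. It assumes dates are stored as `YYYY-MM` strings and that `2014-11` is the Fall 2014 cutoff (the "likely date value" mentioned above, not a confirmed one). The function names are hypothetical and nothing here is taken from the existing batch scripts.

```python
# Assumed cutoff: Fall 2014 convocation, taken as the date value 2014-11.
FALL_2014 = (2014, 11)

# The two statements quoted above differ only in the purpose clause.
TEMPLATE = (
    "This thesis is made available by the University of Alberta Libraries with "
    "permission of the copyright owner solely for {purpose}. This thesis, or any "
    "portion thereof, may not otherwise be copied or reproduced without the "
    "written consent of the copyright owner, except to the extent permitted by "
    "Canadian copyright law."
)

RESEARCH_LICENSE = TEMPLATE.format(
    purpose="the purpose of private, scholarly or scientific research")
NON_COMMERCIAL_LICENSE = TEMPLATE.format(purpose="non-commercial purposes")


def parse_year_month(value):
    """Turn 'YYYY-MM' (or 'YYYY-MM-DD') into a (year, month) tuple of ints."""
    year, month = value.split("-")[:2]
    return int(year), int(month)


def license_for(graduation_date):
    """Pick the statement for a thesis based on its graduation date."""
    if parse_year_month(graduation_date) >= FALL_2014:
        return NON_COMMERCIAL_LICENSE
    return RESEARCH_LICENSE


for date in ("2014-06", "2014-11", "2019-06"):
    print(date, "->", license_for(date)[:110] + "...")
```

The same function could serve both a) and b): applied to new batches before ingest, and to existing records to build the list for a batch edit.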

anayram commented 3 years ago

@leahvanderjagt sounds good, I will make sure that the batches for legacy theses contain the license below (all with graduation dates before 2014).

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for the purpose of private, scholarly or scientific research. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

When the time is good I could update the list of theses to be updated with the new license too.

leahvanderjagt commented 3 years ago

@anayram Awesome, thank you!