ualbertalib / jupiter

Jupiter is a University of Alberta Libraries-based initiative to create a sustainable and extensible digital asset management system. This is phase 2 (Digitization).
https://era.library.ualberta.ca/
MIT License

era-beta: item detail (generic and thesis) #508

Closed sfarnel closed 6 years ago

sfarnel commented 6 years ago
cwant commented 6 years ago

Hi Sharon, for the third bullet point about the committee members for thesis, when I query that particular object it has no committee members:

Thesis.find('71afd4bf-1acb-443d-a910-435ae2a0f06e').committee_members
=> nil

The rest of the data for that item appears to be present (not pasted here for privacy concerns).

If you know for sure that there are supposed to be some committee members for that object, perhaps @weiweishi can check the migration script. Other attributes that are empty for that particular object:

supervisors
specialization
unicorn
proquest
unordered_departments

I am particularly surprised about the last one because the ordered JSON department attribute has some data in it. Maybe the code deployed on beta and the data are out of sync? (Or there could just be a bug, or the migrated data is out of sync with the app).
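A check like the one above can be sketched in plain Ruby. This is only an illustration, not code from Jupiter: the `record` hash, its values, and the `empty_attributes` helper are all hypothetical stand-ins for whatever the `Thesis` object actually exposes on beta.

```ruby
# Hypothetical stand-in for one record's attributes. On beta these values
# would come from the Thesis object itself, not from a hand-written hash.
record = {
  title: 'Placeholder title',
  committee_members: nil,
  supervisors: [],
  specialization: '',
  unordered_departments: nil
}

# Return the names of attributes that are nil or empty (empty string/array).
def empty_attributes(record)
  record.select do |_name, value|
    value.nil? || (value.respond_to?(:empty?) && value.empty?)
  end.keys
end

missing = empty_attributes(record)
puts missing.inspect
```

Treating `nil`, `''`, and `[]` alike is a deliberate choice here, since a migration could plausibly leave any of the three behind for a value that never arrived.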

cwant commented 6 years ago

Hi @sfarnel @leahvanderjagt @sfbetz, regarding the first bullet point: on December 15th in the #era-development channel on Slack, I asked whether a different label should be used for License vs Rights. I was told by Leah to just use Licence. Unfortunately, this decision isn't documented elsewhere. :scream_cat:

sfarnel commented 6 years ago

Thanks @cwant Great that it was decided; just wanted to check :)

sfarnel commented 6 years ago

Thanks @cwant. @danydvd and @anayram, can you check the item in question and see if the committee members are in the data? The item in current ERA has them, but perhaps they didn't make it through?

anayram commented 6 years ago

I checked the January migration package, and both the supervisor and committee members are there but do not appear in https://era-beta.library.ualberta.ca/items/71afd4bf-1acb-443d-a910-435ae2a0f06e. I have not checked Solr, though.

<http://uat.library.ualberta.ca:8080/fcrepo/rest/uat/t4/35/gg/56/t435gg568> <http://terms.library.ualberta.ca/supervisor> "Doe, John (English and Film Studies)" .

and

<http://uat.library.ualberta.ca:8080/fcrepo/rest/uat/t4/35/gg/56/t435gg568> <http://terms.library.ualberta.ca/commiteeMember> "Doe, Jane (English and Film Studies)" .
<http://uat.library.ualberta.ca:8080/fcrepo/rest/uat/t4/35/gg/56/t435gg568> <http://terms.library.ualberta.ca/commiteeMember> "Smith, Sam (English and Film Studies)" .

This is file t435gg568.nt in the theses group, and I am looking at the package labelled results(Jan -18). Could this be an issue in the Jupiter migration script? @weiweishi @sfarnel
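For checking a package file without reading it by eye, a rough Ruby sketch follows. It is not the migration script: the `TRIPLE` regex only handles the simple `<subject> <predicate> "literal" .` shape quoted above, and the `literals_by_predicate` helper and the inline sample text are assumptions made for illustration. The predicate URIs are copied exactly as they appear in the triples above.

```ruby
# Matches only simple N-Triples lines of the form:
#   <subject> <predicate> "literal" .
# Lines with IRI objects, language tags, or datatypes are skipped.
TRIPLE = /\A<([^>]+)>\s+<([^>]+)>\s+"((?:[^"\\]|\\.)*)"\s*\.\s*\z/

# Collect the literal values found for each predicate of interest.
def literals_by_predicate(nt_text, predicates)
  found = Hash.new { |hash, key| hash[key] = [] }
  nt_text.each_line do |line|
    match = TRIPLE.match(line.strip)
    next unless match && predicates.include?(match[2])

    found[match[2]] << match[3]
  end
  found
end

SUPERVISOR = 'http://terms.library.ualberta.ca/supervisor'
COMMITTEE  = 'http://terms.library.ualberta.ca/commiteeMember'
SUBJECT    = 'http://uat.library.ualberta.ca:8080/fcrepo/rest/uat/t4/35/gg/56/t435gg568'

# Inline sample standing in for the contents of the .nt file.
sample = <<~NT
  <#{SUBJECT}> <#{SUPERVISOR}> "Doe, John (English and Film Studies)" .
  <#{SUBJECT}> <#{COMMITTEE}> "Doe, Jane (English and Film Studies)" .
  <#{SUBJECT}> <#{COMMITTEE}> "Smith, Sam (English and Film Studies)" .
NT

found = literals_by_predicate(sample, [SUPERVISOR, COMMITTEE])
puts "supervisors: #{found[SUPERVISOR].size}, committee members: #{found[COMMITTEE].size}"
```

Comparing these counts with what `Thesis.find(...)` returns would show whether the values are lost before or after migration. A full N-Triples parser would be the safer choice for anything beyond a quick check.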

murny commented 6 years ago

Please be careful of posting real data with real information, especially regarding our users.

This is the second time I have had to redact users' information from GitHub issues/comments.

This is a privacy concern! Everything in this repo is open source, meaning anyone can see it. I'm sure some of our students/professors don't want their name/email/etc being made public without their consent.

sfarnel commented 6 years ago

Thanks @murny

The item in question is a thesis openly available on ERA (https://era.library.ualberta.ca/files/t435gg568) with the information exactly as you see it, so nothing was included that is not already publicly available data.

We do try very hard not to add anything to Github that may be sensitive (i.e., username, etc.) or anything that is currently embargoed or private. But do appreciate others checking in to make sure we don't miss something!

But if we have a general guideline for never using actual data (open or not) then we will of course follow it!

anayram commented 6 years ago

Thanks @murny

We only post metadata for items that are publicly available, never private items or anything requiring authentication.

anayram commented 6 years ago

@sfarnel The edited code in the metadata example is no longer useful to the conversation, as it was the specific metadata missing in era beta.

weiweishi commented 6 years ago

Thanks @anayram, I will rerun the migration on this particular object and see if there's any issue with the migration script not picking up the triples.

Weiwei Shi, Digital Initiative Applications Librarian

2-10L Cameron Library, University of Alberta 780-492-7802 | weiwei.shi@ualberta.ca "The University of Alberta respectfully acknowledges that we are situated on Treaty 6 territory, traditional lands of First Nations and Métis people."


murny commented 6 years ago

It shouldn't matter if it's public in ERA.

I would very much assume we are collecting data under the Alberta Freedom of Information and Protection of Privacy (FOIP) Act here in ERA (Hard to tell since we don't have a privacy policy at all! But this is a U of A standard, we all took privacy training here at U of A for this reason. Also the library site collects data this way so I am pretty sure ERA does as well).

The data collected in ERA is collected under the assumption that it will be used only for ERA. If I deposit an item in ERA, I sure hope my name/etc isn't plastered on a bunch of 3rd party websites (GitHub/etc) that I didn't consent to (or sold to the highest bidder...).

I'm not a lawyer and could be wrong here, but when it comes to user privacy, why even risk it? We should take all precautions to not post real names and email addresses when we can. And even if the names/information are already public in other ways, we should not be making them even more accessible on the world wide web.

Also, the previous incident involved an email address (which I hope we can all agree shouldn't be made public under any circumstances): https://github.com/ualbertalib/jupiter/issues/509#issuecomment-367792509

sfbetz commented 6 years ago

@murny the user agreement explicitly allows us to use submission data (which would include the publicly accessible metadata) and distribute it in any format, and would definitely cover testing / inclusion in GitHub. Our agreement was reviewed by both legal counsel and copyright office. Our privacy policy IS the UofA's privacy policy, but because this is information provided explicitly by users for these explicit purposes I'm confident we are not in breach of either it or any FOIP legislation. @sfarnel @leahvanderjagt are both very well versed in FOIP and related policy and data management concerns and can likely provide very complete and detailed information if you would like to know more. The clause in the agreement that is most relevant is below. As long as we're respecting the access rules the user set, we could theoretically print a submission on shirts and wear them around the office if we wanted.

"By accepting this agreement, you (the author or copyright right owner or designate) grant to the University of Alberta the non-exclusive rights to reproduce, translate (as defined below), and/or distribute your submission (including the abstract) worldwide in print and electronic format and in any medium, including but not limited to audio or video"

murny commented 6 years ago

I think there are two different issues here.

Item metadata is one thing. This is what @sfbetz is talking about with regard to our user agreement. So by all means, we are protected legally. Ethically, however, should we be going around posting people's names and other personal information found in ERA on 3rd party websites not associated with U of A? I hope you're not condoning that someone can post every single author/contributor/supervisor name that can be found in ERA here on GitHub or somewhere else. I mean, of course, legally you can and are protected. But should you? Regardless of whether you can legally, why publicize people's names and other information on a 3rd party website not associated with U of A? What value does it give to publicize this information? There is no value added by including it...

User data (CCID and other information that comes from CCID login), however, I am pretty sure is a different can of worms. I would assume users do not waive their rights to this. And I am pretty sure IST and UofA want us to protect this information (I think we even signed a document with IST with regard to privacy for this information). I would really hate to see every one of our users' CCIDs/emails posted online (even though every one of our users' CCIDs/emails is... publicly available in ERA currently: https://github.com/ualbertalib/di_internal/issues/2). We've been told this information is so private that we can't even put it in 3rd party applications that we have full control of and where only we can see the data (Rollbar/Google Analytics/etc). And yet we are posting it on GitHub? This makes little sense. And again, I have no idea why I am seeing such big pushback here.

But if you're saying we can, then sorry to everyone above; I am wrong! If this is the case, we should probably stop caring so much about privacy issues around Google Analytics, putting logs and users' information on 3rd party services, and other things. And we should probably start pushing back on higher-ups regarding this.

weiweishi commented 6 years ago

I agree with @murny that we are dealing with two issues here. This particular discussion was triggered by sharing metadata on an object, which is publicly available in ERA and directly related to the problem we are trying to solve, and so is less concerning to us. But it also touched on an earlier issue of disclosure of user information on a public site, which I agree with Shane poses a high privacy concern.

I think someone mentioned in the retrospective meeting that the team should work towards an internal standard on how to use GitHub, ZenHub and other tools for creating/tagging/closing issues, and this is an area we should also include in that work. Let's work together to establish team guidelines and best practices regarding which kinds of information should be shared on these third-party tools and how to share them. I feel all of the discussions above really provide a solid start towards that work.



anayram commented 6 years ago

I have absolutely no problem following any decisions regarding the citation of metadata.

The citation of user information is totally out of line from my perspective, but the citation of bibliographic information is a different matter. If we can't cite the latter, then a lot of people are in trouble. Having said that, I am completely open to following any decisions following this conversation.

I have to say that it is probably the way things were brought up that creates a tense environment for discussion. I am sure this is an important issue, and I am confident we can find ways to bring up problems with collegial collaboration.

murny commented 6 years ago

I completely agree @anayram. I would like to apologise to @sfarnel @anayram @sfbetz and everyone else that might have been involved with this.

This could have been handled so much better on my part. Regardless of whether I am right or wrong (obviously I am wrong, sorry!), there's no excuse I can possibly make that would justify my behaviour and the tone of my messages. My intention wasn't to be condescending or patronising, but I can see how it totally came off that way. And I am sorry for this!

Going forwards, I’m going to ensure that if I have a problem in the future, I will not openly air it on GitHub (obviously never a good place to do this) and will seek to talk about it in person with someone first, before consulting others. I promise that this won’t ever happen again, and I’m taking steps to ensure that this isn’t a repeat occurrence. Hopefully you can accept my apology and again sorry for causing such a headache for everyone involved. Thanks!

sfarnel commented 6 years ago

Thanks very much @murny

We're all colleagues working together doing good things, and I agree with you and @anayram that we're continuing to find effective and collegial ways of communicating!

anayram commented 6 years ago

Thank you, @murny, I think we have a great team, and I can only see good things coming from all these conversations when we use active listening. I am sure more things will come up!

I think @sfbetz' suggestion to consult with @sfarnel and @leahvanderjagt about FOIP and data management concerns is the way to go, as well as @weiweishi's call to establish team guidelines and best practices for sharing information.

murny commented 6 years ago

Closing issue, all of these have been resolved now:

Chris mentioned above Leah's decision to just use License for both. If this decision changes, we can create a new issue to fix this.

From the code, the Date created field is being displayed with the value from creation date instead of sort year.

From the code, we are printing out the committee members if the thesis has them. If they are not showing, then the thesis doesn't have them. One example of such a thesis is here: https://era.library.ualberta.ca/items/57407e00-d387-4b9e-8b0b-428cc0266007

We no longer show the total numbers for non-active tabs, so this is fixed.