weiweishi closed this issue.
Likely related, perhaps? But: if I go to the ERA homepage and click 'Search ERA' (to search everything), then limit the results to item type Thesis and then to year 1987, all is good up to that point. But when I get the listing of 1987 theses and click the title of the first one in the list, the browser spins and spins until I get a Bad Gateway Time Out error. This happened twice.
Page that spins: https://hydranorth.library.ualberta.ca/catalog?_=1445097698967&f[resource_type_sim][]=Thesis&f[year_created_sim][]=1987&q=
Item I was trying to access: https://hydranorth.library.ualberta.ca/files/wd375z183
I think we can't launch with a theses collection that won't load. Can we complete the process of eliminating reliance on member_ids stored in the collection? If we finish updating all items to have links to their parent collections, and we adapt the Solr query we fixed this week to depend entirely on faceting on those links, is there any case we can't handle?
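A minimal in-memory sketch of that proposal follows. It assumes each item record carries a list of parent collection ids (the link the thread calls hasCollectionId). It also assumes collections find their members by filtering or faceting on that field, rather than by reading a member_ids array stored on the collection. The dictionaries stand in for Solr documents, and the records are hypothetical.

```python
from collections import Counter

# Hypothetical index documents: each item points at its parent collection(s).
# No collection document stores a member_ids array.
ITEMS = [
    {"id": "item1", "resource_type": "Thesis", "year": 1987, "hasCollectionId": ["theses"]},
    {"id": "item2", "resource_type": "Thesis", "year": 1988, "hasCollectionId": ["theses"]},
    {"id": "item3", "resource_type": "Article", "year": 1987, "hasCollectionId": ["papers"]},
]


def members_of(collection_id, docs):
    """Return ids of items whose own link names this collection (a filter query)."""
    return [d["id"] for d in docs if collection_id in d["hasCollectionId"]]


def collection_facet(docs):
    """Count items per collection, as a facet on the link field would."""
    return Counter(cid for d in docs for cid in d["hasCollectionId"])
```

Under this model, `members_of("theses", ITEMS)` yields `["item1", "item2"]` without ever touching a collection object. The cost of listing members therefore no longer depends on the size of an array stored on the collection.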
I'm worried about upstream functions in hydra-collections/Sufia, such as adding items to a collection. We need to look through their codebase to make sure there are no other dependencies. And I can't even access this collection right now, even if we want to remove the member_ids array. I'm not sure what the best approach would be right now.
Weiwei Shi
Digital Initiative Applications Librarian University of Alberta Libraries 2-10L Cameron Library Edmonton, Alberta, Canada T6G 2J8 Phone:(780)492-7802 Fax: (780)248-1209 Email: weiwei.shi@ualberta.ca
On Sat, Oct 17, 2015 at 10:36 AM, Peter Binkley notifications@github.com wrote:
I think we can't launch with a theses collection that won't load. Can we complete the process of eliminating reliance on member_ids stored in the collection? If we finish updating all items to have links to their parent collections, and we adapt the Solr query we fixed this week to depend entirely on faceting on those links, is there any case we can't handle?
- items link to their parent collection using ...
- items link to their community using ...
- collections discover their members by faceting on ...
Yes, I'm seeing the dependency problem. Perhaps we could launch with a work-around:
How does that sound?
Or, simpler: just remove the membership of all theses from the community and collection, and add text to the Community/Collection description, such as "To see all the theses, click here", with a link to the facet search (the wording still needs work). This wouldn't require a new release.
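The link in the description could be built from the same facet parameters that appear in the catalog URL quoted at the top of this issue (`f[resource_type_sim][]=Thesis`). Below is a minimal sketch. It assumes the base URL and facet field names stay as they appear in that URL; the helper name is hypothetical.

```python
from urllib.parse import urlencode

CATALOG_URL = "https://hydranorth.library.ualberta.ca/catalog"


def facet_search_link(facets, query=""):
    """Build a catalog search URL filtered by facet values.

    `facets` maps a facet field (e.g. "resource_type_sim") to a list of values;
    each value becomes one f[field][]=value parameter.
    """
    params = [(f"f[{field}][]", value) for field, values in facets.items() for value in values]
    params.append(("q", query))
    # safe="[]" keeps the brackets readable instead of percent-encoding them.
    return f"{CATALOG_URL}?{urlencode(params, safe='[]')}"
```

For example, `facet_search_link({"resource_type_sim": ["Thesis"]})` gives `https://hydranorth.library.ualberta.ca/catalog?f[resource_type_sim][]=Thesis&q=`. This is the theses search without any dependence on collection membership.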
No, wait, a release would still be needed, since collection descriptions don't currently render HTML.
Posted #787 to develop this approach.
That seems a reasonable way to handle this. Theses are certainly a popular collection.
Sounds reasonable to me, Peter.
Geoffrey Harder Associate University Librarian University of Alberta
On Oct 17, 2015, at 12:44 PM, Sharon Farnel notifications@github.com wrote:
That seems a reasonable way to handle this. Theses are certainly a popular collection.
If we are able to remove the memberships from the member_ids array in this Theses collection, we may no longer have the timeout issue, as I think the timeout is caused by the large member_ids array within the object. We could then continue to use the existing collection/community pages, since the display of members on the collection page doesn't depend on member_ids but on a Solr query.
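The sketch below illustrates why the collection page need not load member_ids. A member listing can be a paginated filter query, so each request fetches only one page of results however large the collection grows. `q`, `fq`, `rows` and `start` are standard Solr select parameters. The indexed field name `hasCollectionId_ssim` is a hypothetical stand-in for wherever the item's collection link is indexed.

```python
def collection_page_params(collection_id, page=1, per_page=10):
    """Solr select parameters listing one page of a collection's members.

    Members are matched through a field on each member document (hypothetical
    name hasCollectionId_ssim), not through a member_ids array on the collection.
    """
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    return {
        "q": "*:*",
        "fq": f'hasCollectionId_ssim:"{collection_id}"',  # filter by the item-side link
        "rows": per_page,  # page size
        "start": (page - 1) * per_page,  # offset of the first result on this page
    }
```

Page 3 at 20 per page asks Solr for `start=40, rows=20`. The response size stays fixed whether the collection holds 200 or 10,000+ items.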
If that works, we don't need a release. But every attempt I've made to access this collection from the backend has timed out. So we may need to delete this collection, create a new one, and run the rake task again to update the collection info (hasCollectionId) on all member objects, without adding them directly to the collection.
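Here is a rough sketch of what the re-run task would need to do, written in Python for illustration only. It links each matching item to the new collection through its own hasCollectionId, in batches, and never appends anything to the new collection. The in-memory records, the membership predicate and the batch size are all assumptions; the real rake task would work through ActiveFedora.

```python
from itertools import islice


def batched(iterable, size):
    """Yield successive lists of at most `size` elements."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def relink_members(items, is_member, new_collection_id, old_collection_id=None, batch_size=500):
    """Link every matching item to the new collection via its own hasCollectionId.

    Only item records change; the new collection never receives a member_ids
    entry. Returns the number of items updated.
    """
    updated = 0
    for batch in batched((i for i in items if is_member(i)), batch_size):
        for item in batch:
            links = item.setdefault("hasCollectionId", [])
            if old_collection_id in links:
                links.remove(old_collection_id)  # drop the link to the deleted collection
            if new_collection_id not in links:
                links.append(new_collection_id)
            updated += 1
        # A real task would save and commit this batch before fetching the next.
    return updated
```

Batching keeps each unit of work small, which matters here because a single large read or write is exactly what has been timing out.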
Weiwei Shi
And we will need to modify the collection to remove its relationship with the community. It sounds like the most straightforward solution is to create a new community and a new collection and update all thesis objects with the new collection ids, without adding them to member_ids in the collection, if we can prove it works at large scale.
Weiwei Shi
Can we update the object at the Fedora level?
Weiwei Shi
I'm in the office now, since I needed to be on the network to rebuild my VM (sigh). I'll see what I can do. If nothing else, I think I should be able to delete the collection and re-migrate it, unless what you're finding affects the ability to delete in the Rails console.
If we remove all the items from the member_ids list, it should be safe to leave the links between the community and the collection, shouldn't it?
Anyway, we'll see what's possible when the item ids are removed.
I'm on newport in tmux session "theses" if you want to have a look.
Moving forward: we know that removing the member_ids list from collections resolves the problem of collections being too slow to render. But it creates a problem of items not displaying links to their collection. And we haven't fully evaluated the consequences for adding items to or removing items from collections (right?), or other possible consequences within Sufia. We want a solution that works for now, knowing that it will be replaced within a few months when PCDM is implemented.
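The display problem likely follows from the direction of the lookup. If an item page finds its collections by checking each collection's member_ids, emptying those lists makes the links disappear. Reading the item's own link avoids that. Below is a minimal sketch with hypothetical in-memory records; the item id is the one from the URL at the top of this issue.

```python
def collections_via_member_ids(item_id, collections):
    """Old direction: scan every collection's member_ids for the item."""
    return [c["id"] for c in collections if item_id in c.get("member_ids", [])]


def collections_via_item_link(item, collections):
    """New direction: read the item's own hasCollectionId and look up titles to display."""
    titles = {c["id"]: c["title"] for c in collections}
    return [(cid, titles[cid]) for cid in item.get("hasCollectionId", []) if cid in titles]
```

Suppose the Theses collection has an empty member_ids list and the item carries `hasCollectionId: ["theses"]`. The first function then finds nothing, while the second still returns the collection for the item page to link to.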
I'd like to formulate a new issue along these lines:
Anything else? @weiweishi
I think the first one is tricky, especially when we update the Sufia and Hydra-Collections gems. We should also start looking into Hydra Works and Curation Concerns to see how they handle collection relationships, so we aren't in for a big surprise when PCDM is ready.
Looking into this now, as it seems to be related to a lot of issues I'm seeing in #739
@mbarnett do you think there's anything left for this original issue that hasn't been captured elsewhere?
I don't think so. Should be safe to close.
The Sufia/Hydra-Collections structure of keeping member_ids on the ActiveFedora object doesn't work for large collections such as the Theses collection. Access to the object (read or write) now times out, probably because loading the member_ids array with 10,000+ items takes too long. We need to evaluate whether we can replace the behavior of member_ids with Solr queries and remove this array from the object altogether.
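Replacing member_ids would also change the write path. If the link lives on the item, adding or removing an item writes only that item's record. The collection object's member list would no longer need rewriting, and that list grows with every addition. The sketch below is minimal and uses a hypothetical record class. It does not model the hydra-collections/Sufia code paths that currently append to member_ids; those would still need to be found and redirected.

```python
class Item:
    """Hypothetical item record holding forward links to its collections."""

    def __init__(self, item_id):
        self.id = item_id
        self.has_collection_id = []  # mirrors the hasCollectionId link
        self.saves = 0  # counts writes, to show what each operation touches

    def save(self):
        self.saves += 1


def add_to_collection(item, collection_id):
    """Link the item to a collection; the collection object is neither loaded nor saved."""
    if collection_id not in item.has_collection_id:
        item.has_collection_id.append(collection_id)
        item.save()


def remove_from_collection(item, collection_id):
    """Drop the link; again only the item is written."""
    if collection_id in item.has_collection_id:
        item.has_collection_id.remove(collection_id)
        item.save()
```

Repeated adds are idempotent here, so re-running a migration does not produce duplicate links.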
We will also need to figure out how to access this object again.