ucd-library / aggie-experts

Publicly reported feedback and issues for Aggie Experts
https://ucd-library.github.io/aggie-experts/
MIT License

Expert's documents can get ridiculously large #487

Open qjhart opened 2 months ago

qjhart commented 2 months ago

Example: https://experts.ucdavis.edu/expert/48xkGvFK

The API file for this expert is 73M. On my speedy machine it takes ~10s to load, which is the bulk of the page's load time.

There are a number of issues at play here. First, a considerable amount of this data is the 1000s of additional authors that exist for each citation. Originally we had an additional modification to experts cdl where we would stop authors at 40 (but add the last author). I'm a little bit conflicted on the use of this: another user (for example, the author) might be interested in seeing all the authors for some specific reason.
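
To make the author cutoff concrete, here is a minimal sketch of that kind of trimming. The function name `trimAuthors` and the reading of "stop at 40" as "keep the first 40, then append the last author" are assumptions for illustration, not the actual experts cdl code.

```javascript
// Hypothetical helper: keep the first `max` authors and always keep the last one.
// Lists that would not get shorter are returned unchanged.
function trimAuthors(authors, max = 40) {
  if (!Array.isArray(authors) || authors.length <= max + 1) {
    return authors;
  }
  return [...authors.slice(0, max), authors[authors.length - 1]];
}

// Usage: a citation with 1000 authors keeps authors 1..40 plus author 1000.
const exampleAuthors = Array.from({ length: 1000 }, (_, i) => `Author ${i + 1}`);
const exampleTrimmed = trimAuthors(exampleAuthors);
console.log(exampleTrimmed.length, exampleTrimmed[exampleTrimmed.length - 1]);
```

Making `max` a parameter would leave room for a caller (say, the author viewing their own profile) to request the full list.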

Another issue is that in most circumstances we are looking at very little of the expert.

If we followed the idea from Fedora, we could add some additional representations on our Prefer header, and make some additional limitations on these components. We could limit the page and count, and we could even trim authors from our display.
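
As a rough illustration of what reading such limits from a Prefer header could look like, here is a minimal sketch. The parameter names `page`, `size`, and `max-authors` are hypothetical, not an existing Aggie Experts or Fedora convention, and the parser only handles a single preference with simple `;`-separated parameters (no commas or semicolons inside quoted values).

```javascript
// Split "key=value" into a lowercase key and an unquoted value.
function splitPair(part) {
  const i = part.indexOf('=');
  if (i === -1) return [part.trim().toLowerCase(), ''];
  const key = part.slice(0, i).trim().toLowerCase();
  const value = part.slice(i + 1).trim().replace(/^"(.*)"$/, '$1');
  return [key, value];
}

// Parse one preference such as:
//   return=representation; page=2; size=25; max-authors=40
// The parameter names above are assumptions for this sketch.
function parsePrefer(header) {
  const parts = String(header || '')
    .split(';')
    .map((p) => p.trim())
    .filter(Boolean);
  if (parts.length === 0) return null;
  const [name, value] = splitPair(parts[0]);
  const params = {};
  for (const part of parts.slice(1)) {
    const [k, v] = splitPair(part);
    params[k] = v;
  }
  return { name, value, params };
}

console.log(parsePrefer('return=representation; page=2; size=25; max-authors=40'));
```

Values come back as strings, so a server would still need to validate and convert them before using them as limits.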

Proposed API Updates

qjhart commented 1 month ago

@UcDust this looks good.

qjhart commented 1 month ago

@UcDust, I think maybe the easiest thing to do is to refactor sanitize into something like subselect that accepts:

expert.subselect(doc, {
  sanitize: true,
  expert: true,   // or false
  grants: { page: 1, size: 25 },
  works: null
});

This doesn't really match the API calls, but it does allow us to get the default page with:

expert.subselect(doc, {
  sanitize: true,
  expert: true,   // or false
  grants: { page: 1, size: 25 },
  works: { page: 1, size: 25 }
});

I do see a problem in that the counts will be affected by the sanitization step, so your cache would have to include both. This is one reason not to have the server guess, I suppose.
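
A minimal sketch of how a subselect along these lines might behave. Everything about the document shape here is an assumption for illustration: a flat `{ expert, grants, works }` object, a hypothetical `visible` flag that sanitization filters on, and `null` meaning "omit this section". The real expert document is likely structured differently.

```javascript
// Return one page of items; page numbers start at 1.
function paginate(items, { page = 1, size = 25 } = {}) {
  const start = (page - 1) * size;
  return items.slice(start, start + size);
}

// Hypothetical subselect: optionally sanitize, then page grants and works.
function subselect(doc, options = {}) {
  const { sanitize = true, expert = true, grants = null, works = null } = options;
  const result = {};
  if (expert) result.expert = doc.expert;
  for (const [key, paging] of [['grants', grants], ['works', works]]) {
    if (paging === null) continue; // null is read as "leave this section out"
    let items = doc[key] || [];
    // Assumed sanitization rule: drop items explicitly marked not visible.
    if (sanitize) items = items.filter((item) => item.visible !== false);
    result[key] = paginate(items, paging);
  }
  return result;
}
```

Note that filtering happens before paging in this sketch, so page boundaries differ between sanitized and unsanitized views of the same expert.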

UcDust commented 1 month ago

@UcDust this looks good.

  • [ ] For the works/grants endpoints, do we have a method to retrieve all the results (or just size=100000)?
  • [ ] How do we want to see the total counts for grants and citations on the expert page?

@qjhart To retrieve all results, what if we added another param for ?full or ?all that would return all grants/works for that expert? Just using a huge size could work too. For total counts, could we have a structure like:

hits: {
    works: {
        total: 27,
        visible: 24
    },
    grants: {
        total: 7,
        visible: 4
    }
}

(not sure on the hits verbiage, but something along those lines maybe?)
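
A rough sketch of how those counts and an "all" switch could be derived, under stated assumptions: the document is a flat object with `works` and `grants` arrays, each item may carry a hypothetical `visible` flag, and the query is an already-parsed object where the presence of an `all` key means "return everything". None of these names are confirmed by the existing API.

```javascript
// Count total and visible items per section, before any paging is applied.
function countHits(doc) {
  const hits = {};
  for (const key of ['works', 'grants']) {
    const items = doc[key] || [];
    hits[key] = {
      total: items.length,
      visible: items.filter((item) => item.visible !== false).length,
    };
  }
  return hits;
}

// Turn query parameters into paging options; `all` overrides page and size.
function resolvePaging(query, total) {
  if ('all' in query) return { page: 1, size: total };
  const page = Math.max(1, parseInt(query.page, 10) || 1);
  const size = Math.max(1, parseInt(query.size, 10) || 25);
  return { page, size };
}
```

Computing both counts from the unsanitized document in one pass would let a cache store them together, which touches on the concern about sanitization changing the counts.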

UcDust commented 4 weeks ago

@qjhart I created the https://github.com/ucd-library/aggie-experts/compare/dc-api-subselect branch with a start to the sanitize logic changes.

We'll need to optimize more once we analyze the type of sorting we can do on grants/works, and the client still needs to be wired in.

Also, admin mode (and a user's own profile) is still sending the ?no-sanitize flag, which bypasses this logic. So we'll need to think of an approach there, perhaps removing that flag.