informatics-isi-edu / ermrest

ERMrest (rhymes with "earn rest") is a general relational data storage service for web-based, data-oriented collaboration.
Apache License 2.0

Revise model retrieval api for scalability #174

Open karlcz opened 7 years ago

karlcz commented 7 years ago

The current idiom is that apps like Chaise retrieve the entire catalog model via GET /ermrest/catalog/N/schema so that they can use model-aware techniques, which require knowledge of the relationships between tables.

However, as catalog models get more complex, this request becomes slower and the response larger. This is due to several factors:

The most obvious approaches to improving costs are:

  1. Revise model document syntax to be more compact
    • Requires an upgrade/transition story to deal with client compatibility
  2. Try to reduce runtime cost of current serializer to defer the problem
    • No change for client compatibility
    • Not clear how limited the impact on internal service code might be, nor how feasible this is
  3. Redesign the API so that clients can retrieve just the subset of the model they need in practice
    • Requires an upgrade/transition story to deal with client compatibility
    • Might get too baroque to try to address the kinds of subsets needed by real clients

A hybrid approach for the last item above would be to augment the current API with a list of inbound references for each table. A client could then walk the graph via a chain of much cheaper calls, which may not be too costly over HTTP/2. This is possible because the current API already supports retrieving individual table documents instead of all schemas at once.

  1. Get central table's description (exists now)
  2. Get description for each distinct table listed in central table's outbound foreign keys (exists now)
  3. Get description for each distinct table listed in central table's inbound foreign keys (new feature)
  4. Get description for each distinct table listed in central table's alternative tables annotation (exists now)
  5. Recursively repeat the above to some depth to handle UX denormalization requirements
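
As a rough sketch of the walk above (not a proposal for the actual client code), something like the following could work. Assumptions: `fetch` is a hypothetical callable that returns one table document per call; `referenced_by` is a made-up name for the new inbound-reference list, which does not exist yet; the foreign key and alternatives-annotation shapes are simplified guesses at the current table document format.

```python
from collections import deque
from typing import Callable, Dict, Set, Tuple
from urllib.parse import quote

TableKey = Tuple[str, str]  # (schema name, table name)

# Assumed key for the alternative tables annotation (context -> [schema, table]).
ALTERNATIVES_TAG = "tag:isrd.isi.edu,2016:table-alternatives"


def table_url(catalog_id: str, key: TableKey) -> str:
    """Build the per-table model URL instead of the whole-model /schema URL."""
    schema, table = key
    return "/ermrest/catalog/%s/schema/%s/table/%s" % (
        catalog_id, quote(schema, safe=""), quote(table, safe=""))


def related_tables(doc: dict) -> Set[TableKey]:
    """Collect tables reachable in one hop from a single table document."""
    found: Set[TableKey] = set()
    # Step 2: outbound foreign keys (assumed shape of the existing document).
    for fkey in doc.get("foreign_keys", []):
        for col in fkey.get("referenced_columns", []):
            found.add((col["schema_name"], col["table_name"]))
    # Step 3: inbound foreign keys ("referenced_by" is a hypothetical new field).
    for fkey in doc.get("referenced_by", []):
        for col in fkey.get("foreign_key_columns", []):
            found.add((col["schema_name"], col["table_name"]))
    # Step 4: alternative tables annotation.
    alternatives = doc.get("annotations", {}).get(ALTERNATIVES_TAG) or {}
    for target in alternatives.values():
        found.add((target[0], target[1]))
    return found


def walk_model(fetch: Callable[[TableKey], dict],
               root: TableKey,
               max_depth: int) -> Dict[TableKey, dict]:
    """Breadth-first walk from the central table, fetching each table once."""
    docs = {root: fetch(root)}  # Step 1: central table's description.
    frontier = deque([(root, 0)])
    while frontier:
        key, depth = frontier.popleft()
        if depth >= max_depth:
            continue  # Step 5: stop expanding beyond the requested depth.
        for neighbor in sorted(related_tables(docs[key])):
            if neighbor not in docs:
                docs[neighbor] = fetch(neighbor)
                frontier.append((neighbor, depth + 1))
    return docs
```

A real `fetch` would issue one GET per `table_url(...)`, so the number of requests equals the number of distinct tables within `max_depth` hops of the central table.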

karlcz commented 7 years ago

@hongsudt @robes

karlcz commented 6 years ago

Just a note for any future return to this topic: any attempt to discover the correct sparse subset of the model for a given Chaise application instance would also require interpreting the new pseudo-column annotations, which may pull in more tables that are further from the main table being displayed.