uncharted-aske / HMI


Load bgraph files from s3 #148

Closed: adamocarolli closed this issue 3 years ago

adamocarolli commented 3 years ago

Description: Load bgraph nodes/edges files from S3.
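For reference, here is a minimal sketch of what the loader could look like, assuming the node/edge files are plain JSON arrays under an S3 prefix and that bgraph exposes a `graph(nodes, edges)` initializer. The import path, bucket URL, file names, and initializer are placeholders, not the actual HMI setup:

```ts
// Sketch only: fetch bgraph node/edge files from S3 and build a graph instance.
// The bucket URL, file names, and the bgraph.graph(nodes, edges) call are
// assumptions; adjust to the real HMI data layout and bgraph API.
import bgraph from 'bgraph';

const S3_BASE = 'https://<bucket>.s3.amazonaws.com'; // hypothetical bucket

async function fetchJSON<T>(url: string): Promise<T> {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Failed to fetch ${url}: ${response.status}`);
  }
  return response.json() as Promise<T>;
}

export async function loadBGraphFromS3(modelPrefix = 'bio') {
  // Fetch nodes and edges in parallel to keep the total load time down.
  const [nodes, edges] = await Promise.all([
    fetchJSON<unknown[]>(`${S3_BASE}/${modelPrefix}/nodes.json`),
    fetchJSON<unknown[]>(`${S3_BASE}/${modelPrefix}/edges.json`),
  ]);
  // Assumed initializer; the real bgraph entry point may differ.
  return bgraph.graph(nodes, edges);
}
```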

Note: All changes for this issue are in this commit: https://github.com/uncharted-aske/HMI/pull/148/commits/5f66b99d0ed131998fad03f0f7236473cda0c675; all other changes were made in other branches that are currently in PR. See:

Set-up:

RosaRomeroGomez commented 3 years ago

I was thinking about the following scenario (basically what we showed in the demo).

  1. The user goes to the Knowledge view.
  2. Searches for a doc, selects one, and drills down into it.
  3. Follows the link to the COVID-19 model from the drilldown panel, which opens up the Models view.
  4. In the Models view, the user visualizes the causal links related to the selected paper in the foreground layer and the rest in the background layer.

To support 4, we are running an automatic DOI-based query for edges. This means that by the time the user opens the Models view, the data should already be loaded into bgraph. Since loading data into bgraph can take up to a minute or so (depending on the internet connection), I was thinking we might want to load data into bgraph as soon as we launch the application? Thoughts @adamocarolli @mj3cheun ?
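One way the eager load could work without blocking the UI is to kick off the fetch at startup and cache the promise, so the Models view only awaits data that is already in flight. This is just a sketch, reusing the hypothetical `loadBGraphFromS3` loader sketched in the issue body above:

```ts
// Sketch only: start loading bgraph data at app launch and cache the promise,
// so later views just await the same in-flight (or already resolved) promise.
// loadBGraphFromS3 is the hypothetical loader sketched in the issue body.
import { loadBGraphFromS3 } from './loadBGraph';

let bgraphPromise: ReturnType<typeof loadBGraphFromS3> | null = null;

// Called once from the app entry point (e.g. main.ts) as early as possible.
export function preloadBGraph() {
  if (!bgraphPromise) {
    bgraphPromise = loadBGraphFromS3();
  }
  return bgraphPromise;
}

// The Models view awaits the same promise; if the preload already finished,
// this resolves immediately instead of costing another ~1 min load.
export function getBGraph() {
  return preloadBGraph();
}
```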

adamocarolli commented 3 years ago

> To support 4, we are running an automatic DOI-based query for edges. This means that by the time the user opens the Models view, the data should already be loaded into bgraph. Since loading data into bgraph can take up to a minute or so (depending on the internet connection), I was thinking we might want to load data into bgraph as soon as we launch the application? Thoughts @adamocarolli @mj3cheun ?

I think this is a good idea to do before a demo, but I worry about doing this in general. Thoughts:

  1. Keeping the bio model loaded in memory throughout the app would mean that, at baseline, our app is using 300-700MB of memory. That might introduce general slowdowns and hard-to-reason-about memory issues unrelated to the page you're working on.
  2. Eventually we'll actually run out of memory (2GB max, and we probably want to stay well below that) if we try to keep multiple bgraph instances, one per model, in memory. So if we want to load multiple models, we need to be able to quickly load and swap models into bgraph; there is no getting around that in the end (see the sketch after this list).
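A rough sketch of the load-and-swap idea, keeping at most one model's bgraph instance alive at a time. The model ID keying and the loader are assumptions, not existing HMI code:

```ts
// Sketch only: keep at most one model's bgraph instance in memory and drop the
// previous one when a different model is requested, so memory stays bounded.
// loadBGraphFromS3 and keying by model ID are assumptions.
import { loadBGraphFromS3 } from './loadBGraph';

let currentModelId: string | null = null;
let currentGraph: Awaited<ReturnType<typeof loadBGraphFromS3>> | null = null;

export async function getModelGraph(modelId: string) {
  if (modelId !== currentModelId || currentGraph === null) {
    // Release the previous instance so it can be garbage collected before the
    // next few-hundred-MB model is loaded.
    currentGraph = null;
    currentModelId = null;
    currentGraph = await loadBGraphFromS3(modelId);
    currentModelId = modelId;
  }
  return currentGraph;
}
```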

This does bring up a concerning point: Developing features related to bgraph (in the bio model) requires you to wait 40-60s after each change to reload the model. I think we can get around this by loading locally, but that isn't really an option in production.
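One possible way to keep the fast local loop during development while still using S3 in production is a build-time switch on the data base URL. The env check, paths, and prefix below are assumptions, not the actual HMI config:

```ts
// Sketch only: point the bgraph loader at local files in development and at S3
// in production. The env check assumes a webpack/Vue CLI style build where
// process.env.NODE_ENV is substituted at build time; paths are placeholders.
const BGRAPH_BASE_URL =
  process.env.NODE_ENV === 'production'
    ? 'https://<bucket>.s3.amazonaws.com/bio' // hypothetical S3 prefix
    : '/local-data/bio'; // files served by the local dev server

export function bgraphFileUrl(name: 'nodes.json' | 'edges.json'): string {
  return `${BGRAPH_BASE_URL}/${name}`;
}
```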