Julia Data Access Functions - should they be part of the API? #38

Closed spine-o-bot closed 3 years ago

In GitLab by @DillonJ on Jul 6, 2018, 09:50

A question regarding Fabiano's proposal.

@manuelma Would the idea be to put the Julia data access functions within the API? If so, I can see benefits, but also some drawbacks.

These are the functions that we would call within equations to access the generic data structure.

For example:

parameter_name(object_ref, time_ref, sample_ref) (returns parameter value)
relationship_class(parent_object_ref) (returns list of child object references)
relationship_class(parent_object_ref, child_object_ref) (returns relationship id)
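
As a rough illustration of the call shapes listed above, here is a minimal sketch over an in-memory dictionary. Python is used only for illustration, since the functions discussed in this thread are Julia functions. The data, the object names, and the choice to pass the parameter name and relationship class name as arguments are all assumptions, not the actual implementation.

```python
# Hypothetical in-memory copy of the data; every name and value here is made up.
PARAMETER_VALUES = {
    # (parameter name, object) -> {(time, sample): value}
    ("capacity", "unit_a"): {("t1", "s1"): 100.0, ("t2", "s1"): 120.0},
}
RELATIONSHIPS = {
    # relationship class -> {(parent object, child object): relationship id}
    "unit__node": {("unit_a", "node_x"): 1, ("unit_a", "node_y"): 2},
}


def parameter_value(parameter_name, object_ref, time_ref, sample_ref):
    """Return the value of a parameter for one object, time step and sample."""
    return PARAMETER_VALUES[(parameter_name, object_ref)][(time_ref, sample_ref)]


def relationship_class(class_name, parent_object_ref, child_object_ref=None):
    """With one object: list its children. With two objects: return the relationship id."""
    pairs = RELATIONSHIPS[class_name]
    if child_object_ref is None:
        return [child for (parent, child) in pairs if parent == parent_object_ref]
    return pairs[(parent_object_ref, child_object_ref)]
```

In an equation, a call such as parameter_value("capacity", "unit_a", "t1", "s1") would then read a single number out of the local structure without touching the database.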

The advantages that I can see of putting this code within the API are:

We can use the functions in any model, not just the SPINE model. I think this would be very powerful.
It would clean up the SPINE model code.
The data access functions would be independent of the model.

The disadvantages are

Each call means a new query on the database, which might hurt performance. The current mechanism is that spinedata.jl downloads a complete copy of the model, and subsequent references to it are quite efficient. But perhaps this can all be done inside the API? For example, you call a function, Model_initialize (or whatever), which creates the dictionary objects that subsequent calls to the API would use.
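
The trade-off can be sketched with a stand-in database that counts its queries. This is only a toy comparison, written in Python for illustration: FakeDatabase, fetch_value, fetch_all and the lowercase model_initialize are hypothetical names, and a real database would add network latency that this sketch does not model.

```python
class FakeDatabase:
    """Stand-in for a remote database that counts how often it is queried."""

    def __init__(self, rows):
        self._rows = rows
        self.query_count = 0

    def fetch_value(self, parameter, obj):
        # One query per single value.
        self.query_count += 1
        return self._rows[(parameter, obj)]

    def fetch_all(self):
        # One query for the complete data set.
        self.query_count += 1
        return dict(self._rows)


def model_initialize(db):
    """Download everything once and return a lookup function over the local copy."""
    local_copy = db.fetch_all()

    def lookup(parameter, obj):
        return local_copy[(parameter, obj)]

    return lookup


rows = {("capacity", "unit_a"): 100.0, ("capacity", "unit_b"): 50.0}
units = ("unit_a", "unit_b", "unit_a")

# Strategy 1: every access is a new query.
db_per_call = FakeDatabase(rows)
per_call_values = [db_per_call.fetch_value("capacity", u) for u in units]

# Strategy 2: one bulk query, then purely local lookups.
db_bulk = FakeDatabase(rows)
lookup = model_initialize(db_bulk)
bulk_values = [lookup("capacity", u) for u in units]
```

Both strategies return the same values, but the first issues one query per access while the second issues a single query up front, at the cost of holding the whole data set in memory.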

Thoughts?

Edit: thinking this through a little further, the data access functions all depend heavily on Julia functionality, macro programming, etc., so I'm not sure if this would work?

In GitLab by @DillonJ on Jul 6, 2018, 09:52

changed the description

In GitLab by @manuelma on Jul 6, 2018, 10:06

Well, init_model() or model_initialize() should not just create a dictionary but also generate and expose the compact access functions, so that they become so-called 'API endpoints'. This means that these functions are actually part of the API for the current session, or something like that. This is what SpineModel.jl is doing now: it imports the entire database into a local dictionary and then defines and exports the access functions, which become part of the SpineModel module. That's why we can call them from everywhere.
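
A minimal sketch of that idea, in Python for illustration only: init_model takes an already imported dictionary and generates one named access function per parameter, exposed on a session object. The dictionary layout, the session object and the use of closures are assumptions; the Julia version would presumably rely on Julia's own metaprogramming instead.

```python
import types


def init_model(parameter_values):
    """Build a namespace holding one generated access function per parameter."""
    api = types.SimpleNamespace()
    for parameter_name, values_by_object in parameter_values.items():

        def accessor(object_ref, _values=values_by_object):
            # The default argument binds this parameter's values at definition
            # time, so each generated function keeps its own data.
            return _values[object_ref]

        accessor.__name__ = parameter_name
        setattr(api, parameter_name, accessor)
    return api


# Hypothetical imported data: parameter name -> {object: value}.
session = init_model({
    "capacity": {"unit_a": 100.0},
    "demand": {"node_x": 30.0},
})
```

After this, session.capacity("unit_a") and session.demand("node_x") behave like functions that were written by hand, even though they only exist because the data contained parameters with those names.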

If we are able to implement the same functionality API-side I think we should celebrate with champagne or something. It's a really interesting approach.

In GitLab by @DillonJ on Jul 6, 2018, 10:16

I will supply the Champagne if we can do it :)

But was this Fabiano's idea? Perhaps it hasn't been examined in this level of detail yet.

This is where the clever thinking needs to be done... the devil is in the detail as they say

In GitLab by @manuelma on Jul 6, 2018, 10:30

I don't think @fabianoP went into this level of detail? It seems new ideas like this may appear as this gets more consideration... I guess we need to realistically assess what can and cannot be done with available resources and also with respect to project goals.

In GitLab by @manuelma on Jul 8, 2018, 09:01

I think the first version of the API should provide functions such as get_all_object_classes, get_all_parameters, and so on, so that SpineModel.jl can call those instead of directly querying the database, and then keep doing what it's doing.

But if, for instance, Spine Toolbox also needs to get all object classes at some point, for whatever reason, it can call get_all_object_classes too. That's the appeal of the API for me: we do not need to write queries everywhere.
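
As a sketch of what such a first version could look like, the two functions below wrap the queries so that callers never write SQL themselves. It is written in Python with the standard sqlite3 module purely for illustration; the table and column names are a hypothetical schema, not the actual Spine database layout.

```python
import sqlite3


def get_all_object_classes(conn):
    """Return the names of all object classes, sorted alphabetically."""
    rows = conn.execute("SELECT name FROM object_class ORDER BY name").fetchall()
    return [name for (name,) in rows]


def get_all_parameters(conn):
    """Return (object class name, parameter name) pairs."""
    return conn.execute(
        "SELECT oc.name, p.name FROM parameter AS p "
        "JOIN object_class AS oc ON oc.id = p.object_class_id "
        "ORDER BY oc.name, p.name"
    ).fetchall()


# Hypothetical schema and data, created in memory for the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE object_class (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE parameter (
    id INTEGER PRIMARY KEY,
    object_class_id INTEGER NOT NULL,
    name TEXT NOT NULL
);
INSERT INTO object_class (id, name) VALUES (1, 'unit'), (2, 'node');
INSERT INTO parameter (object_class_id, name) VALUES (1, 'capacity'), (2, 'demand');
""")
```

Any client holding a connection, whether a model or a user interface, can call the same two functions, so a query only has to be written and optimized in one place.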

More advanced stuff like these Julia access functions should be considered for later versions of the API in my opinion.

In GitLab by @jkiviluo on Jul 8, 2018, 09:24

I would think that the functions Manuel describes above are exactly what should happen in the API.

Once they are in the API, any performance improvements made to the queries will also benefit Spine Toolbox and other models, not just Spine Model. Also, for small models it's possible to load everything into memory and gain speed, but with larger (stochastic) models the memory fills up. The API gives us a place to establish efficient functions that are aware of the system restrictions and can still cater efficiently to the dynamic data needs as the model rolls forward. The development will be compartmentalized, which is what @Fabiano was aiming for.

I think these kinds of structural decisions are very important to get right. We are not just executing a project: we are making an open source modelling environment that should have resources beyond the EU project over time (other contributors, as well as other projects that project partners may get). In a way, I'd like to see this as the last tool I'm developing, because it can be continuously improved and won't get stuck in a dead end. Making separate components helps with this.

In GitLab by @fabianoP on Jul 9, 2018, 12:17

Hi all,

Of course we will run into these types of challenges when we go into the details. What I usually do in this case is start by documenting the endpoints/methods of the API. @manuelma, the API endpoints you mention, such as get_all_object_classes and get_all_parameters, are what I think we need. These two methods are a good starting point for the API description. I think we should start documenting these types of methods in a shared document, for instance using the Wiki feature of GitLab. Each operation should describe its input parameters and expected output. Once we are ready and agree on the description, we can start development and deployment in the code base, with one or more tests attached.

@manuelma, at this stage I wouldn't be concerned about the caching system yet. Rather, the main focus should be to decide how we can transfer the data between components. One solution is a JSON object, but we can consider alternatives.

I think we are just at the beginning of this redesign, but the ideas in this thread are relevant and a good starting point. As soon as Juha is back I would arrange a telco. In the meantime we will start going into a bit more detail on the components (this week, from 9 to 16 July, I am not working 100%, but I can still contribute).
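
To make the JSON option concrete, here is a small sketch of one documented operation with its input and output, and the round trip between two components. It is written in Python for illustration; the payload layout (the "endpoint" and "result" keys) and the helper names are assumptions, not an agreed format.

```python
import json


def get_all_object_classes_response(object_classes):
    """Producer side.

    Input:  a list of (id, name) tuples.
    Output: a JSON string with the endpoint name and one entry per object class.
    """
    payload = {
        "endpoint": "get_all_object_classes",
        "result": [{"id": oc_id, "name": name} for oc_id, name in object_classes],
    }
    return json.dumps(payload)


def parse_object_class_names(message):
    """Consumer side: decode the JSON string and extract the class names."""
    payload = json.loads(message)
    return [item["name"] for item in payload["result"]]


message = get_all_object_classes_response([(1, "unit"), (2, "node")])
names = parse_object_class_names(message)
```

Because the message is plain text, the producer and the consumer do not need to be written in the same language, which fits the goal of keeping the components separate.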

In GitLab by @manuelma on Jul 9, 2018, 12:27

@fabianoP agree 100%, just one question: what do you mean by caching system?

In GitLab by @fabianoP on Jul 10, 2018, 04:41

@manuelma usually, when you have a remote server, it is good practice to implement a cache layer to ensure data integrity and to reduce redundant requests. The implementation strategy for such a component depends on the application. Of course this feature will be implemented in SPINE at a much later stage, but it is something to keep in mind, and the component-based infrastructure will help us plug in the cache layer without any refactoring. Just as a side note, the TCP/IP stack and the routing protocols already use caching mechanisms at the lower layers; at the application layer, however, it could still be useful to implement our own caching system to avoid network overhead, especially when there are no evident threats to data integrity.
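
One possible shape for such an application-level cache layer is a wrapper that sits between a component and the server. This is a minimal Python sketch for illustration: CachedClient, fetch_from_server and the explicit invalidate call are hypothetical, and a real layer would also need a policy for deciding when cached data has become stale.

```python
class CachedClient:
    """Wraps a fetch function and remembers its results until invalidated."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = {}

    def get(self, key):
        # Only go to the server when the key has not been fetched yet.
        if key not in self._cache:
            self._cache[key] = self._fetch(key)
        return self._cache[key]

    def invalidate(self, key=None):
        # Drop one entry, or everything when no key is given.
        if key is None:
            self._cache.clear()
        else:
            self._cache.pop(key, None)


calls = []  # records every request that actually reaches the "server"


def fetch_from_server(key):
    calls.append(key)
    return f"value-of-{key}"


client = CachedClient(fetch_from_server)
first = client.get("object_classes")   # goes to the server
second = client.get("object_classes")  # served from the cache
client.invalidate("object_classes")
third = client.get("object_classes")   # goes to the server again
```

Since the wrapper only needs a fetch function, it can be added in front of an existing component later without changing that component.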

In GitLab by @manuelma on Jul 10, 2018, 09:24

Perfect @fabianoP thanks

In GitLab by @Poncelet on Apr 24, 2019, 07:28

closed
