ovis-hpc / ovis

OVIS/LDMS High Performance Computing monitoring, analysis, and visualization project.
https://github.com/ovis-hpc/ovis-wiki/wiki
Other
97 stars 48 forks source link

Missing API function ldms_record_free() #1052

Open morrone opened 1 year ago

morrone commented 1 year ago

The ldms.h API is missing an ldms_record_free() function to match up with the ldms_record_alloc() function. This would be very useful to sampler implementers to allow us to cease the use of a record instance if a problem is encountered during the process of filling in metric values in the record.

tom95858 commented 1 year ago

@nichamon, I thought we had this or a similar interface...can you confirm or comment?

nichamon commented 1 year ago

@narategithub Added Narate as he was the one who implemented most of the APIs.

I checked and found that we don't have an API to free up a record instance's heap memory. The closest API we have to free a record instance is ldms_list_remove_item(). It is to remove an item from a list and free the heap memory of the item.

nichamon commented 1 year ago

I believe that we don't have such an API to prevent the case that the sampler authors call the API to destroy a record instance that has been appended to a list.

We could provide the API. It will behave as ldms_list_remove_item() if the to-be-freed record instance has been appended to a list. An alternative is to make the API return EBUSY if the record instance is in a list.

@narategithub Correct me here if adding the API may lead us down a rabbit hole.

narategithub commented 1 year ago

@nichamon historically before we have the record, elements in the list are created and appended in one go with ldms_list_append_item(). They are also removed and freed (return the memory to the heap) in one go with ldms_list_remove_item().

When the record feature is introduced, the ldms_list_append_item() was left unchanged and the ldms_record_alloc() and ldms_list_append_record() were introduced to handle the record instances because the mval does not have type information (for the primitive types). The ldms_list_remove_item() is currently remove + free for both primitive elements and record elements. If we want to separate the free() from remove(), I think we should do it for both record instances and the primitive elements in the list.

Or, maybe we should change the ldms_list_append_record() to

/**
 * \retval rec_inst The record instance to use with `ldms_record_metric_set/get`
 */
ldms_mval_t ldms_list_append_record(ldms_set_t set, ldms_mval_t lh, int rec_def_idx);

that will allocate + append in one go like the primitive elements and get rid of ldms_record_alloc()?

morrone commented 1 year ago

The combined allocate+append sounds reasonable to me.

Also, speaking of ldms_list_remove_item(), it might be good to expand the comment to mention records or something. It was not clear to me how to remove a record from a list since there is a custom ldms_list_append_record(), but no matching ldms_list_remove_record().

tom95858 commented 1 year ago

@narategithub, @nichamon, @morrone I am inclined to keep the API as designed/written and update the documentation to make it clear exactly what is being done so that it is obvious to users such as @morrone what exactly the interface does with primitive and record types. I'm not inclined to splitting the interfaces because I think it makes the code more complex and introduces the possibility of programming errors where a list element is de-allocated while there is a still a reference on the item.

tom95858 commented 1 year ago

@morrone, @narategithub, @nichamon we implemented arrays of records only because we could. I think we should consider whether or not this is wise. Arrays are fixed size and this means that users of the array have to know what an 'empty' entry means and how it is encoded, 0, for example. But I think in general the coding involved with reasonably handling arrays of records in stores, for example, is more of a challenge than lists. Please comment with your perspectives.

morrone commented 1 year ago

@narategithub, @nichamon, @morrone I am inclined to keep the API as designed/written and update the documentation to make it clear exactly what is being done so that it is obvious to users such as @morrone what exactly the interface does with primitive and record types. I'm not inclined to splitting the interfaces because I think it makes the code more complex and introduces the possibility of programming errors where a list element is de-allocated while there is a still a reference on the item.

@tom95858 if this comment is only in reference to having adding "ldms_list_remove_record()", that is fine with me. Code comments could certainly address my confusion.

@narategithub's suggested API change to ldms_list_append_record() and the associated removal of ldms_record_alloc() seems like a good one. That simplifies the API, and removes an opportunity for programming error.

morrone commented 1 year ago

@morrone, @narategithub, @nichamon we implemented arrays of records only because we could. I think we should consider whether or not this is wise. Arrays are fixed size and this means that users of the array have to know what an 'empty' entry means and how it is encoded, 0, for example. But I think in general the coding involved with reasonably handling arrays of records in stores, for example, is more of a challenge than lists. Please comment with your perspectives.

My use cases all involve lists of records at this point. The fact that I need to know a record array's length at schema creation time rather than metric set creation time makes record arrays unusable for any of the use cases that I currently have in mind.

Getting rid of the record arrays would also also eliminate an inconsistency in the API naming that was a bit confusing. There are a number of function names that start with "ldms_recordarray" but operate on distinctly different things.

These functions operate on an array of records:

  ldms_record_array_get_inst
  ldms_record_array_len

But these functions operate on an array that is inside of a record:

  ldms_record_array_get_* (except for "inst")
  ldms_record_array_set_*
tom95858 commented 1 year ago

Hi @morrone, @nichamon is going to propose an updated API that will hopefully make the API symmetric regarding how lists of primitive types and record types are handled. But we need to coordinate on the use cases (what we are doing) to ensure we get this "right" before more people are using it and we need to refactor larger sections of code.

nichamon commented 1 year ago

I am inclined to get rid of the record arrays. Lists of records can be used in place of record arrays.

Let me propose a change regarding the creation and deletion of a record according to our discussion here.

nichamon commented 1 year ago

@morrone Thank you for pointing out the API inconsistency for lists, records, and record arrays.