lukecampbell closed this 10 years ago.
I've narrowed it down to something involving the string types, and out-of-order data doesn't look like it plays a part in the problem.
Okay, it's definitely the category type.
The error above is not caused by the category type. The category type causes a casting error (string to float); the error here is that the array's dtype is object.
The NumPy array is compressed and stored, but it is an object array. You can't really store and reconstruct objects unless they can be serialized and deserialized. I can write code to serialize and deserialize them, but typically we need to ensure that the objects conform to some interface for that (e.g., in Java you implement Serializable). Pickle might do the trick.
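A minimal NumPy illustration of why an object array is a problem for plain binary storage (the sample values and the helper `can_save_without_pickle` are hypothetical, not part of the coverage model): `np.save` with `allow_pickle=False` refuses object arrays, because their elements are references to Python objects rather than raw values.

```python
import io

import numpy as np

# Mixing strings and other values forces NumPy to fall back to
# dtype=object: the array holds references to Python objects.
values = np.array(["driver_timestamp", 1.5, None], dtype=object)

def can_save_without_pickle(arr):
    """Return True if np.save can write `arr` as plain binary data."""
    buf = io.BytesIO()
    try:
        # allow_pickle=False forbids the pickle fallback that np.save
        # would otherwise use for object arrays.
        np.save(buf, arr, allow_pickle=False)
    except ValueError:
        return False
    return True

print(values.dtype)                              # object
print(can_save_without_pickle(values))           # False
print(can_save_without_pickle(np.arange(3.0)))   # True
```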
Have object types been handled in the past?
HDF is a binary data pipe, while pickle is an object serialization protocol. I don't think HDF supports object serialization; we are currently doing binary serialization. We could pickle, but it requires a lot of memory.
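For comparison, a sketch of what binary serialization of a primitive array involves in NumPy (illustrative only, not the coverage model's HDF or Postgres storage code): the raw buffer plus the dtype and shape is enough to rebuild the array, and an explicit byte order keeps the bytes unambiguous across platforms.

```python
import numpy as np

# "<f8" = little-endian 64-bit float; stating the byte order explicitly
# means the raw bytes decode the same way on any machine.
original = np.array([1.0, 2.5, -3.0], dtype="<f8")

payload = original.tobytes()                 # 8 bytes per element
dtype_str, shape = original.dtype.str, original.shape

# Rebuild from the raw bytes using only the stored dtype and shape.
restored = np.frombuffer(payload, dtype=np.dtype(dtype_str)).reshape(shape)

print(len(payload))                          # 24
print(np.array_equal(original, restored))    # True
```

Nothing here depends on Python object layout, which is what makes this path safe for primitives and unusable for object arrays.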
Alright, I added code to pickle only NumPy object arrays, and the OBJECT array error goes away. Now I get an error associated with the category type. I haven't worked with CategoryType before, so I'll have to look into it:
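The "pickle only object arrays" approach described above might look roughly like this; `encode_array` and `decode_array` are hypothetical helper names, and this is a sketch of the idea rather than the code that was actually committed.

```python
import pickle

import numpy as np

def encode_array(arr):
    """Return (kind, payload): pickle arrays containing objects, raw bytes otherwise."""
    # dtype.hasobject is also True for structured dtypes with object fields.
    if arr.dtype.hasobject:
        return "pickle", pickle.dumps(arr)
    return "binary", (arr.dtype.str, arr.shape, arr.tobytes())

def decode_array(kind, payload):
    """Inverse of encode_array."""
    if kind == "pickle":
        return pickle.loads(payload)
    dtype_str, shape, raw = payload
    return np.frombuffer(raw, dtype=np.dtype(dtype_str)).reshape(shape)

numbers = np.array([1.0, 2.0])
labels = np.array(["a", 3], dtype=object)
for arr in (numbers, labels):
    kind, payload = encode_array(arr)
    print(kind, decode_array(kind, payload).tolist() == arr.tolist())
# binary True
# pickle True
```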
```
Traceback (most recent call last):
  File "/Users/casey/Desktop/OOI/Dev/code/casey/working/coverage-model/coverage_model/test/test_postgres_storage.py", line 326, in test_category_get_set
    scov.get_parameter_values().get_data()
  File "/Users/casey/Desktop/OOI/Dev/code/casey/working/coverage-model/coverage_model/coverage.py", line 395, in get_parameter_values
    return self._persistence_layer.read_parameters(param_names, time_segment, time, sort_parameter, stride_length=stride_length, fill_empty_params=fill_empty_params)
  File "/Users/casey/Desktop/OOI/Dev/code/casey/working/coverage-model/coverage_model/storage/parameter_persisted_storage.py", line 268, in read_parameters
    np_dict, function_params, rec_arr = self.get_data_products(params, time_range, time, sort_parameter, stride_length=stride_length, create_record_array=True, fill_empty_params=fill_empty_params)
  File "/Users/casey/Desktop/OOI/Dev/code/casey/working/coverage-model/coverage_model/storage/parameter_persisted_storage.py", line 280, in get_data_products
    np_dict = self._create_parameter_dictionary_of_numpy_arrays(numpy_params, function_params, stride_length=stride_length, params=dict_params)
  File "/Users/casey/Desktop/OOI/Dev/code/casey/working/coverage-model/coverage_model/storage/parameter_persisted_storage.py", line 401, in _create_parameter_dictionary_of_numpy_arrays
    npa[insert_index:end_idx] = np_data
ValueError: could not convert string to float: driver_timestamp
```
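The failing line is a slice assignment. A hypothetical minimal reproduction, assuming (as the error message suggests) that the preallocated `npa` has a float dtype while `np_data` holds strings such as `driver_timestamp`; `assign_slice` is an illustrative helper, not coverage model code:

```python
import numpy as np

def assign_slice(npa, np_data, insert_index, end_idx):
    """Copy np_data into npa[insert_index:end_idx]; return the error text, if any."""
    try:
        # NumPy must cast np_data to npa's dtype; non-numeric strings
        # cannot be cast to float, so this raises ValueError.
        npa[insert_index:end_idx] = np_data
    except ValueError as exc:
        return str(exc)
    return None

npa = np.zeros(4, dtype="f8")                 # assumed float destination
print(assign_slice(npa, np.array(["driver_timestamp"]), 0, 1))
print(assign_slice(npa, np.array([2.5]), 0, 1))   # None: numeric data fits
```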
My suggestion for category types would be to:
I believe that's how we did it in the past.
Corrected the problems with commit b57c6fca572f3afe177d6e972c86cc436fcc3a0d. test_category_get_set now completes successfully.
Corrected the problem by storing the data as Python objects. If this is undesirable, we can look at options to manually serialize and deserialize the objects.
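One manual option, sketched under the assumption that the object arrays in question contain only strings: convert them to NumPy's fixed-width Unicode dtype, which is a primitive type that can be written without pickle. The helper `to_primitive_strings` and the sample labels are hypothetical.

```python
import io

import numpy as np

def to_primitive_strings(arr):
    """Convert an all-str object array to a fixed-width Unicode array."""
    if arr.dtype == object and all(isinstance(v, str) for v in arr.flat):
        # astype(str) sizes the 'U' dtype to the longest string.
        return arr.astype(str)
    return arr

labels = np.array(["driver_timestamp", "port"], dtype=object)
fixed = to_primitive_strings(labels)

buf = io.BytesIO()
np.save(buf, fixed, allow_pickle=False)   # succeeds: no Python objects left
buf.seek(0)
restored = np.load(buf, allow_pickle=False)
print(restored.tolist())                  # ['driver_timestamp', 'port']
```

Fixed-width strings pad every element to the longest one, so this trades some space for avoiding object serialization.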
If at all possible, I would really like to avoid any serialization of Python objects into the coverage model.
We ran into a lot of problems serializing anything but primitives because interpreter versions tweak struct attributes at the C level. Pickle is a great example: we observed cases where data pickled on Linux couldn't be read on a Mac.
I'm not sure what y'all's take on this is, but I don't know of any data type among the coverage model parameter types that would need Python object serialization.
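As an illustration that category data need not involve Python objects, here is a sketch that stores category values as primitive integer codes with the code-to-label mapping kept separately. It assumes a category parameter maps a small set of integer keys to labels; the mapping and helper names are hypothetical and not the coverage model's CategoryType API.

```python
import numpy as np

categories = {0: "OFF", 1: "ON", 2: "FAULT"}   # hypothetical mapping

def encode_categories(labels, categories):
    """Map labels to their integer codes, stored as a primitive int8 array."""
    lookup = {label: code for code, label in categories.items()}
    return np.array([lookup[label] for label in labels], dtype="i1")

def decode_categories(codes, categories):
    """Map integer codes back to their labels."""
    return [categories[int(c)] for c in codes]

codes = encode_categories(["ON", "OFF", "ON", "FAULT"], categories)
print(codes.dtype, codes.tolist())        # int8 [1, 0, 1, 2]
print(decode_categories(codes, categories))
# ['ON', 'OFF', 'ON', 'FAULT']
```

Only the int8 array goes through binary storage; the mapping is small metadata that can be stored once per parameter.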
The bug is fixed.
I was testing with out-of-order data and came across this exception. I'm going to try to narrow down specifically what is causing it.