ooici / coverage-model

BSD 2-Clause "Simplified" License

Cannot create an OBJECT array from memory buffer #168

Closed lukecampbell closed 10 years ago

lukecampbell commented 10 years ago

I was testing for out-of-order data and came across this exception. I'm going to try to narrow down what specifically is causing it.

   ----- exception: cannot create an OBJECT array from memory buffer -----
ap/handlers/coverage/coverage_handler.py:244    data = self.get_data(cov, name, bitmask)
ap/handlers/coverage/coverage_handler.py:144    data = self.get_values(cov, name)
ap/handlers/coverage/coverage_handler.py:226    data_dict = cov.get_parameter_values(param_names=[field], fill_empty_params=True).get_data()
overage-model/coverage_model/coverage.py:395    return self._persistence_layer.read_parameters(param_names, time_segment, time, sort_parameter, fill_empty_params=fill_empty_params)
l/storage/parameter_persisted_storage.py:269    np_dict, function_params, rec_arr = self.get_data_products(params, time_range, time, sort_parameter, create_record_array=True, fill_empty_params=fill_empty_params)
l/storage/parameter_persisted_storage.py:276    associated_spans = self._get_span_dict(params, time_range, time)
l/storage/parameter_persisted_storage.py:266    return SpanTablesFactory.get_span_table_obj().get_spans(coverage_ids=self.master_manager.guid, decompressors=self.value_list)
e_model/storage/postgres_span_storage.py:102    spans.append(Span.from_json(data, decompressors))
verage-model/coverage_model/data_span.py:58     uncompressed_params[str(param)] = decompressors[param].decompress(data)
l/storage/parameter_persisted_storage.py:629    vals = base64decode(obj)
l/storage/parameter_persisted_storage.py:588    arr = np.frombuffer(base64.decodestring(loaded[1]),data_type)
2014-05-08 13:24:43,571 ERROR Dummy-167 ion.util.pydap.handlers.coverage.coverage_handler:295 Problem reading cov Simplex Coverage for 45b11959d81144f89cb86c19d6fc6b0d cannot create an OBJECT array from memory buffer
lukecampbell commented 10 years ago

I've narrowed it down to something to do with the string types, and it doesn't look like out-of-order data plays a part in the problem.

lukecampbell commented 10 years ago

Okay, it's definitely the category type.

lukecampbell commented 10 years ago

https://github.com/caseybryant/coverage-model/pull/3 https://github.com/caseybryant/coverage-model/pull/3/files#diff-c0aa30437fbed03239cdb2aedd6b5888R304

caseybryant commented 10 years ago

The error above is not caused by the category type. The category type causes a casting error (string to float). The error here is that the array dtype is object.
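A minimal reproduction of the first traceback, assuming the stored payload is base64-encoded raw array bytes as in the `np.frombuffer(base64.decodestring(...))` call shown there. The helper name `decode_array` is hypothetical, and `base64.b64decode` stands in for `base64.decodestring`, which was removed in Python 3.9. Primitive dtypes round-trip because the bytes are the values; for an object dtype NumPy refuses outright, since a raw buffer could only hold object pointers.

```python
import base64

import numpy as np


def decode_array(encoded, dtype):
    """Hypothetical helper mirroring the failing np.frombuffer call."""
    return np.frombuffer(base64.b64decode(encoded), dtype=dtype)


# A primitive dtype round-trips: the raw bytes *are* the values.
floats = np.array([1.5, 2.5], dtype=np.float64)
restored = decode_array(base64.b64encode(floats.tobytes()), np.float64)

# An object dtype cannot be rebuilt from raw bytes, so NumPy raises ValueError.
error_message = None
try:
    decode_array(base64.b64encode(b"\x00" * 8), object)
except ValueError as exc:
    error_message = str(exc)

print(restored, error_message)
```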

The NumPy array is compressed and stored, but it is an object array. You can't really store and reconstruct objects unless they can be serialized and deserialized. I can write code to serialize and deserialize them, but typically we need to ensure that the objects conform to some interface for that (e.g. in Java you implement Serializable). pickle might do the trick.

Have object types been handled in the past?

caseybryant commented 10 years ago

HDF is a binary data pipe, while Pickle is an object serialization protocol. I don't think HDF supports object serialization. We are currently doing binary serialization. We could pickle, but it requires lots of memory.

caseybryant commented 10 years ago

Alright, I put in code to pickle only NumPy object-dtype arrays, and the OBJECT array error goes away. Now I get an error associated with the category type. I haven't worked with CategoryType before, so I'll have to look into it a bit:
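A rough sketch of the approach described above: raw bytes for primitive dtypes, pickle only for object-dtype arrays. The function names and the `(kind, payload)` tuple layout are hypothetical, not the coverage-model code, and the raw-bytes path assumes 1-D arrays because it does not record the shape.

```python
import base64
import pickle

import numpy as np


def encode_array(arr):
    """Hypothetical encoder: pickle object arrays, raw bytes otherwise."""
    if arr.dtype == object:
        # Object arrays hold references, so serialize the whole array with pickle.
        return "pickle", base64.b64encode(pickle.dumps(arr))
    # Primitive dtypes: store the dtype string (e.g. '<f8') plus the raw bytes.
    return arr.dtype.str, base64.b64encode(arr.tobytes())


def decode_array(kind, payload):
    """Inverse of encode_array (1-D arrays assumed for the raw-bytes path)."""
    raw = base64.b64decode(payload)
    if kind == "pickle":
        return pickle.loads(raw)
    return np.frombuffer(raw, dtype=np.dtype(kind))


obj_kind, obj_payload = encode_array(np.array(["a", "bc"], dtype=object))
obj_back = decode_array(obj_kind, obj_payload)

int_kind, int_payload = encode_array(np.array([1, 2, 3], dtype=np.int32))
int_back = decode_array(int_kind, int_payload)
```

This avoids the OBJECT array error, but pickled payloads can depend on the Python and NumPy versions that wrote them, so it trades one problem for a portability risk.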

   Traceback (most recent call last):
     File "/Users/casey/Desktop/OOI/Dev/code/casey/working/coverage-model/coverage_model/test/test_postgres_storage.py", line 326, in test_category_get_set
       scov.get_parameter_values().get_data()
     File "/Users/casey/Desktop/OOI/Dev/code/casey/working/coverage-model/coverage_model/coverage.py", line 395, in get_parameter_values
       return self._persistence_layer.read_parameters(param_names, time_segment, time, sort_parameter, stride_length=stride_length, fill_empty_params=fill_empty_params)
     File "/Users/casey/Desktop/OOI/Dev/code/casey/working/coverage-model/coverage_model/storage/parameter_persisted_storage.py", line 268, in read_parameters
       np_dict, function_params, rec_arr = self.get_data_products(params, time_range, time, sort_parameter, stride_length=stride_length, create_record_array=True, fill_empty_params=fill_empty_params)
     File "/Users/casey/Desktop/OOI/Dev/code/casey/working/coverage-model/coverage_model/storage/parameter_persisted_storage.py", line 280, in get_data_products
       np_dict = self._create_parameter_dictionary_of_numpy_arrays(numpy_params, function_params, stride_length=stride_length, params=dict_params)
     File "/Users/casey/Desktop/OOI/Dev/code/casey/working/coverage-model/coverage_model/storage/parameter_persisted_storage.py", line 401, in _create_parameter_dictionary_of_numpy_arrays
       npa[insert_index:end_idx] = np_data
   ValueError: could not convert string to float: driver_timestamp
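The second error is an ordinary casting failure: assigning category labels (strings) into a slice of a float array makes NumPy call `float()` on each element. The names below only mirror `npa` and `np_data` from the traceback; the array contents are made up for illustration.

```python
import numpy as np

# Stand-ins for `npa` (float-typed destination) and `np_data` (category labels).
npa = np.full(4, np.nan, dtype=np.float64)
np_data = np.array(["driver_timestamp", "port_timestamp"], dtype=object)

error_message = None
try:
    # NumPy tries float("driver_timestamp") for the first element and fails.
    npa[0:2] = np_data
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```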

lukecampbell commented 10 years ago

My suggestion for category types would be to:

  1. Recognize that the parameter associated with the object array is a category type
  2. Iterate over the array elements and translate them to int8 values (in the past, category types were translated to ints this way)
  3. When getting the data, translate the ints back to strings using the map in the parameter type.

I believe that's how we did it in the past.
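Steps 2 and 3 above could look roughly like this. The `{code: label}` dict stands in for the map held by the category parameter type; its exact shape and the function names are assumptions for illustration. Step 1, detecting that a parameter is a category type, is left out because it depends on the coverage-model type classes. Because int8 is a primitive dtype, the codes survive the raw-bytes round trip that fails for object arrays (at the cost of limiting a parameter to 256 distinct codes).

```python
import numpy as np


def encode_categories(labels, category_map):
    """Step 2 (hypothetical helper): string labels -> int8 codes.

    `category_map` is assumed to be {int code: str label}.
    """
    label_to_code = {label: code for code, label in category_map.items()}
    return np.array([label_to_code[label] for label in labels], dtype=np.int8)


def decode_categories(codes, category_map):
    """Step 3 (hypothetical helper): int8 codes -> string labels on read."""
    return np.array([category_map[int(code)] for code in codes], dtype=object)


category_map = {0: "driver_timestamp", 1: "port_timestamp"}
labels = np.array(["port_timestamp", "driver_timestamp", "port_timestamp"], dtype=object)

codes = encode_categories(labels, category_map)
# The int8 bytes can be stored and rebuilt with np.frombuffer without error.
stored = np.frombuffer(codes.tobytes(), dtype=np.int8)
restored = decode_categories(stored, category_map)
```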

caseybryant commented 10 years ago

Corrected the problems with commit b57c6fca572f3afe177d6e972c86cc436fcc3a0d; test_category_get_set now completes successfully.

Corrected problem by storing data as python objects. If this is undesirable, we can look at options to manually serialize/deserialize the objects.

lukecampbell commented 10 years ago

If at all possible, I would really like to avoid any serialization of python objects into the coverage model.

We ran into a lot of problems serializing anything but primitives, because different interpreter versions tweak struct attributes at the C level. Pickle is a great example: we observed cases where data pickled on Linux couldn't be read on a Mac.

I'm not sure what y'all's take on this is, but I don't know of any parameter type in the coverage model that would need Python object serialization.

lukecampbell commented 10 years ago

The bug is fixed