Closed: lukecampbell closed this issue 10 years ago
I believe this is resolved with commit 3af4a8d6012c173d54c4fe76f0bf04dae0e29494.
It probably is, but the test is failing because the dtypes don't match up. I'm writing some changes to ingestion to coerce the dtypes of incoming granules when they don't match what the driver/agent sends.
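The coercion idea is roughly this; `coerce_dtype` is a hypothetical helper for illustration, not the actual ingestion code in coi-services:

```python
import numpy as np

def coerce_dtype(values, expected_dtype):
    """Coerce incoming granule values to the dtype the coverage expects.

    Hypothetical helper sketching the coercion described above.
    """
    arr = np.asarray(values)
    if arr.dtype != np.dtype(expected_dtype):
        # e.g. a driver hands back float64 where the parameter is float32
        arr = arr.astype(expected_dtype)
    return arr

coerced = coerce_dtype([1.0, 2.0, 3.0], 'float32')
print(coerced.dtype)  # float32
```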
This is the error now
File "/Users/luke/Documents/Dev/code/coi2/extern/pyon/pyon/util/breakpoint.py", line 86, in wrapper
return func(*args, **kwargs)
File "/Users/luke/Documents/Dev/code/coi2/ion/processes/data/replay/replay_process.py", line 158, in _cov2granule
data_dict = coverage.get_parameter_values(param_names=parameters, stride_length=stride_time, fill_empty_params=True).get_data()
File "/Users/luke/Documents/Dev/code/coi2/extern/coverage-model/coverage_model/coverage.py", line 388, in get_parameter_values
function_params=function_params, as_record_array=as_record_array)
File "/Users/luke/Documents/Dev/code/coi2/extern/coverage-model/coverage_model/storage/parameter_persisted_storage.py", line 280, in read_parameters
np_dict, functions, rec_arr = self.get_data_products(params, time_range, time, sort_parameter, stride_length=stride_length, create_record_array=as_record_array, fill_empty_params=fill_empty_params)
File "/Users/luke/Documents/Dev/code/coi2/extern/coverage-model/coverage_model/storage/parameter_persisted_storage.py", line 293, in get_data_products
np_dict = self._create_parameter_dictionary_of_numpy_arrays(numpy_params, function_params, params=dict_params)
File "/Users/luke/Documents/Dev/code/coi2/extern/coverage-model/coverage_model/storage/parameter_persisted_storage.py", line 445, in _create_parameter_dictionary_of_numpy_arrays
npa[insert_index:end_idx] = np_data
ValueError: could not broadcast input array from shape (5) into shape (1)
I'm looking into it now but I think for these tests the array types are ill-defined...
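For reference, the broadcast failure in the traceback reduces to this: the destination slice was sized as if the array parameter held one element per record, while the incoming segment has five. A minimal reproduction (not the coverage-model code itself):

```python
import numpy as np

# Destination sized for an inner length of 1
npa = np.zeros((1,))

# Incoming segment actually carries 5 elements
np_data = np.array([1, 2, 3, 4, 5])

try:
    # mirrors npa[insert_index:end_idx] = np_data in the traceback
    npa[0:1] = np_data
except ValueError as e:
    print(e)  # shapes (5) and (1) cannot be broadcast together
```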
I didn't realize it, but we changed how the length is defined: instead of specifying the value encoding like 'float32,float32,float32', it's now inner_length. I like this, I just need to update the tests and such.
I'm still trying to debug this, I can't tell if it's a coverage issue or a service API issue.
I'm also going to need to add some logic into ingestion workers that:
Is the issue that you don't know up front how big the inner array will be?
It doesn't seem like coverage should decide this, but I could add logic to set the inner array size the first time data is written for that parameter.
The problem is that there are 300+ array types defined and none of them have an inner length specified. I would say that's easily 300+ hours of work to figure out that information by digging through the driver and or instrument specs. If we can spend somewhere around 20 hours to solve that problem, it would be better, I think.
I was going to add it into ingestion, but I don't know how the approach I was toying with in my head would work for things like the complex coverage.
Unfortunately, the complex coverage would also need to be updated with the inner length, and this approach immediately becomes too complicated I think.
I can turn off the validation feature, but that could have negative effects.
Currently, there is no validation that the written shape matches the shape of previously written data. On read, the arrays from the two different writes may not be concatenable, or the 'wrong' sized array will be reshaped and alignment is lost (e.g. [1,2,3,4] becomes [1,2] [3,4]).
So if there is inconsistency on write size, there will be an error on write, (what you're seeing) or an error on read. Are we willing to remove validation and hope that the received data is always consistent?
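To make the alignment problem concrete, here is a sketch of what happens when a mismatched write is forced into the declared inner length (illustrative numpy only, not the storage code):

```python
import numpy as np

# First write: 1 record with the declared inner length of 2
first_write = np.array([[10, 11]])

# Second write mistakenly carries 4 values for what should be one record
second_write = np.array([1, 2, 3, 4])

# Reshaping to the declared inner length silently turns one 4-element
# record into two 2-element records -- alignment is lost
reshaped = second_write.reshape(-1, 2)
print(reshaped)          # [[1 2] [3 4]]

merged = np.concatenate([first_write, reshaped])
print(merged.shape)      # (3, 2): three records where only two were written
```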
I can also implement "learn array size" logic that sets the array size on first write/use of the parameter type. Unfortunately, changing the inner_length of the ArrayType doesn't get persisted, and making the new size persist would require some not-insignificant architectural changes.
What are the consequences of the inner_length not getting persisted?
It is effectively like the no validation case with added benefits if the coverage object stays alive for multiple writes and/or reads.
The inner length will get set to whatever the shape is on the first data write for the in-memory coverage instance. Subsequent writes will fail if the inner length doesn't match that of the first write.
If there is a read before a write, I won't know how to accurately size the returned array (e.g. for 10 records of an array with a 3-element inner length, I create an array of size 30, shape (10,3)). I could probably use the shape of the first segment I come across to set the inner length and hope that all other segments are the same size in the 2nd dimension. An error would be thrown if one isn't.
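The learn-on-first-write idea sketched above looks roughly like this. The class and method names are illustrative, not the coverage-model API, and (as noted) the learned size is not persisted in this scheme:

```python
import numpy as np

class LearnedArrayParam:
    """Sketch: inner length is learned from the first write; later
    writes must match it, and a read before any write fails because
    the size is unknown."""

    def __init__(self):
        self.inner_length = None   # unknown until the first write
        self.segments = []

    def write(self, data):
        data = np.atleast_2d(np.asarray(data))
        if self.inner_length is None:
            self.inner_length = data.shape[1]   # learn on first write
        elif data.shape[1] != self.inner_length:
            raise ValueError('inner length %d does not match learned %d'
                             % (data.shape[1], self.inner_length))
        self.segments.append(data)

    def read(self):
        if not self.segments:
            raise RuntimeError('read before write: inner length unknown')
        return np.concatenate(self.segments)

p = LearnedArrayParam()
p.write([[1, 2, 3]])    # learns inner_length = 3
p.write([[4, 5, 6]])    # OK, matches
print(p.read().shape)   # (2, 3)
```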
@lukecampbell, Array size learning is partially implemented for Simplex Coverages including persistence. It works as long as you write before you read. I still have to implement the read before write case, but you can start testing with this restriction to see if it meets your needs.
wow, ok cool!
I'm still going through some of my parameter authoring code and making changes right now. I hope to start running all the tests soon, but for now I think I'm going to skip array type support so that we can merge sooner. And then come back and re-enable it with testing support.
Whatever you want. It's there for you when you're ready.
Luke, We did some modeling/performance testing yesterday and decided to take a different approach based on the results. Instead of trying to learn the appropriate size and reject data that doesn't fit, we are going to determine size on read. The returned array will have an inner length equal to the longest array to be merged. Arrays that have a shorter inner length will have fill values in the elements that don't exist.
I'm still working out how to let you know which values are fills, though.
This really cleans up a lot of boundary cases for us, and it prevents us from having to hide data. Can you think of any problems this might cause downstream?
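The determine-size-on-read approach described above amounts to padding shorter rows out to the longest inner length; a minimal numpy sketch (`merge_with_fill` is a hypothetical name, not the coverage-model function):

```python
import numpy as np

def merge_with_fill(segments, fill_value=np.nan):
    """Merge 2-D segments with differing inner lengths by padding
    shorter rows to the longest inner length with a fill value."""
    width = max(seg.shape[1] for seg in segments)
    padded = []
    for seg in segments:
        pad = width - seg.shape[1]
        # pad only on the right of the second (inner) dimension
        padded.append(np.pad(seg.astype(float), ((0, 0), (0, pad)),
                             constant_values=fill_value))
    return np.concatenate(padded)

a = np.array([[1, 2]])         # inner length 2
b = np.array([[3, 4, 5, 6]])   # inner length 4 -- the longest
merged = merge_with_fill([a, b])
print(merged)
# [[ 1.  2. nan nan]
#  [ 3.  4.  5.  6.]]
```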
that's perfect, I think.
I would recommend just using NaN for the fill values if the value encoding is a float type. That way any parameter functions will just end up computing with NaNs. As for the real fill value aspect of it, we can chalk it up as a bug and move on; NaNs, I think, will get us to the 90% case.
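The point about NaN fills is just IEEE NaN propagation: any arithmetic a parameter function does over a NaN-filled slot yields NaN rather than a bogus number. For example:

```python
import numpy as np

# Values read back with NaN fill in the padded slots
vals = np.array([1.0, 2.0, np.nan, np.nan])

# Stand-in for any derived parameter function: NaN propagates through
derived = vals * 10 + 1
print(derived)  # [11. 21. nan nan]
```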
Do you have an idea of when this would be ready?
I hope to have it finished by this afternoon. You should have a pull request waiting for you when you get in on Monday.
Luke, pull request #192 submitted for this bug.
Looks good
:+1:
When running
ion.services.dm.test.test_dm_extended:TestDMExtended.test_array_flow_paths
the test fails with a ValueError caused by reading from the coverage.
The branch is here: https://github.com/lukecampbell/coi-services/tree/cals-submodules
You'll need to add the lukecampbell repository as a remote for both the coverage-model submodule and the ion-definitions submodule.
To run the test: