nasa / opera-sds-pcm

Observational Products for End-Users from Remote Sensing Analysis (OPERA)
Apache License 2.0

[Bug]: DISP-S1 historical batch application crashes when a batch_proc contains more than 1000 frames #921

Open philipjyoon opened 2 months ago

philipjyoon commented 2 months ago

Checked for duplicates

Yes - I've already checked

Describe the bug

The DISP-S1 historical database JSON contains a total of 1433 frames that we must process. If we put all of those frames into a single batch_proc, ES calls fail with the error message 'illegal_argument_exception', 'Limit of total fields [1000] has been exceeded'

This happens because, by default, an ES index can have at most 1000 fields, including subfields. The error occurs not when the batch_proc is created but when the batch application updates the batch_proc with the state information. The state information is a large dictionary of {frame: state}, and each key of this dict is mapped as a separate subfield in ES, so 1433 frames produce more than 1000 fields. There are a couple of ways we can fix this:

  1. Increase the maximum field count to 2000 for the batch_proc index only. This does work, and it is limited to this one index, which will not hold much data. The batch app can make this change when it first runs, so the fix is light and seamless:

     curl -s -XPUT http://grq:9200/batch_proc/_settings -H 'Content-Type: application/json' -d '{"index.mapping.total_fields.limit": 2000}'

     This could be considered slightly risky because the field limit is there for a reason. However, the change would apply only to this one index, which will never contain any significant amount of data.

  2. If we deem option 1 to be risky, the second option is to store the frame state information in a data structure other than a map. It can be a list, two lists (keys and values), one big string, etc. We can still transform it back into a map in the application; that is a trivial operation, so there is no performance downside. The downside of this option is that it changes the code in a nontrivial way, which requires deeper testing and more time. Also, the view of the data in the pcm_batch.py tool will not be as friendly to the operator.
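To illustrate option 2, here is a minimal sketch of the "list instead of map" idea. It is not taken from the batch application: the field name `frame_states`, the entry keys `frame` and `state`, the integer state values, and all function names are hypothetical. `leaf_field_paths` is only a rough approximation of how dynamic mapping turns document keys into fields; it ignores object fields and other details that ES also counts toward the limit.

```python
def frame_states_to_list(frame_states):
    """Convert {frame: state} into a list of {"frame": ..., "state": ...} entries."""
    return [{"frame": frame, "state": state}
            for frame, state in sorted(frame_states.items())]


def frame_states_from_list(entries):
    """Rebuild the {frame: state} map inside the application."""
    return {entry["frame"]: entry["state"] for entry in entries}


def leaf_field_paths(value, prefix=""):
    """Approximate the set of distinct leaf field paths in a document.

    Dict keys extend the path; list items share the path of their parent,
    which mirrors how arrays do not add fields of their own.
    """
    if isinstance(value, dict):
        paths = set()
        for key, child in value.items():
            child_prefix = f"{prefix}.{key}" if prefix else str(key)
            paths |= leaf_field_paths(child, child_prefix)
        return paths
    if isinstance(value, list):
        paths = set()
        for item in value:
            paths |= leaf_field_paths(item, prefix)
        return paths
    return {prefix}


# Hypothetical state map with one entry per frame (1433 frames, as in this issue).
states_as_map = {str(frame): 0 for frame in range(1433)}
states_as_list = frame_states_to_list(states_as_map)

map_field_count = len(leaf_field_paths({"frame_states": states_as_map}))
list_field_count = len(leaf_field_paths({"frame_states": states_as_list}))
```

Under these assumptions the map form yields one field path per frame, while the list form yields only `frame_states.frame` and `frame_states.state` no matter how many frames the batch_proc contains. The cost is the one described above: the stored document is less readable, and every reader of the state has to convert it back into a map first.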

My recommendation is option 1.
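For option 1, the curl command above could be issued from the batch application at startup. The following is a sketch only, using the Python standard library; the function names are hypothetical, and the host, index name, and limit are simply the values from the curl command. How the real batch app talks to ES (for example, through an existing client wrapper) is not shown here.

```python
import json
import urllib.request


def build_field_limit_request(base_url="http://grq:9200", index="batch_proc", limit=2000):
    """Build the PUT request that raises index.mapping.total_fields.limit."""
    body = json.dumps({"index.mapping.total_fields.limit": limit}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def raise_field_limit(base_url="http://grq:9200", index="batch_proc", limit=2000):
    """Send the request and return the parsed JSON response from ES."""
    request = build_field_limit_request(base_url, index, limit)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Calling `raise_field_limit()` once before the first batch_proc update should be enough, since the setting is stored on the index. Re-sending the same value on every start is harmless.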

What did you expect?

nt

Reproducible steps

1.
2.
3.
...

Environment

- Version of this software [e.g. vX.Y.Z]
- Operating System: [e.g. MacOSX with Docker Desktop vX.Y]
...