ngageoint / elasticgeo

ElasticGeo provides a GeoTools data store that allows geospatial features from an Elasticsearch index to be published via OGC services using GeoServer.
GNU General Public License v3.0
168 stars 85 forks source link

CSV Array Format #73

Open dclemenzi opened 6 years ago

dclemenzi commented 6 years ago

Turning this feature on ElasticDataStoreFactory.ARRAY_ENCODING only returns the first element of the array. Change this code in ElasticFeatureReader:

            switch (arrayEncoding) {
                case CSV:
                    // only include first array element when using CSV array encoding
                    value = values.get(0);
                    break;
                default:
                    value = values;
            }

To this:

            switch (arrayEncoding) {
                case CSV:
                    value = StringUtils.join(values, ",");
                    break;
                default:
                    value = values;
            }

Fixes the issue of only returning the first element and works if there are 1 or more than 2 elements in the array or if the array is not numeric. However if there are exactly two elements in a numeric array; then the numeric converters when adding the values to the builder are strips the second element. The reason it works if there are more then two elements is because the converters are failing.

dclemenzi commented 6 years ago

Not a great solution but for now to get us pass this issue we decided to implement a fix in our post processing when converting the feature into a rowset.

    @Override
    public Object[] next()
    {
        Feature feature = featureIterator.next();

        Object[] row = new Object[properties.size()];

        for (int i = 0; i < properties.size(); i++)
        {
            Object value = feature.getProperty(properties.get(i)).getValue();

            if (value instanceof ArrayList)
            {
                value = StringUtils.join((ArrayList) value, ",");
            }
            else if (value instanceof String && ((String) value).startsWith("[") && ((String) value).endsWith("]"))
            {
                value = StringUtils.join(new JSONArray((String) value).iterator(), ",");
            }

            row[i] = value;
        }

        return row;
    }

One thing to point out when we disable the CSV all of or numeric arrays come back as ArrayList but our String arrays come back as String. Which is why we needed to add the 'else if' portion. Maybe this is a result of how we are indexing the data within ElasticSearch?

sjudeng commented 6 years ago

Yeah array handling for CSV/GML output has gone through a few iterations in the plugin and the current implementation is still not ideal. Originally array encoding was implemented as a string-join but that caused errors for numeric arrays when outputting GML (and maybe CSV), as I think you've encountered. Let me know if you think you have an approach that would make more sense on the plugin-side for this.

The default (non-CSV) encoding will return the actual array and is suitable for GeoJSON output if you can consume that. This could also be used in the case where you're programmatically accessing features as you'll get the property back as an array directly.

dclemenzi commented 6 years ago

Yeah the non-CSV encoding is actually preferred. We need CSV for now so our code is consistent when switching between databases. At some point we should probably make the switch. The change on our side is sufficient to get our ITs passing.