moshe / elasticsearch_loader

A tool for batch loading data files (json, parquet, csv, tsv) into ElasticSearch
MIT License

List data being indexed as Text/Keyword #64

Closed antunesleo closed 5 years ago

antunesleo commented 5 years ago

When I try to index an array of text or keyword values from a CSV file, each array is indexed as a single text/keyword string rather than as an array. See the CSV sample and the ES index below:

CSV

| title | year | director | stars | genres |
|---|---|---|---|---|
| Inception | 2010 | Christopher Nolan | [ "Leonardo DiCaprio", "Joseph Gordon-Levitt", "Ellen Page", "Ken Watanabe"] | ["Action", "Adventure", "Sci-Fi" ] |

Index

{
  "movies": {
    "aliases": {},
    "mappings": {
      "_doc": {
        "properties": {
          "director": {
            "type": "text"
          },
          "genres": {
            "type": "text"
          },
          "stars": {
            "type": "text"
          },
          "title": {
            "type": "text"
          },
          "year": {
            "type": "integer"
          }
        }
      }
    },
    "settings": {
      "index": {
        "creation_date": "1543666909596",
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "uuid": "KRE71VReQWeBRfKaf_DCyw",
        "version": {
          "created": "6020499"
        },
        "provided_name": "movies"
      }
    }
  }
}

Expected indexed document

      {
        "_index": "movies",
        "_type": "_doc",
        "_id": "8",
        "_score": 1,
        "_source": {
          "title": "Inception",
          "year": 2010,
          "director": "Christopher Nolan",
          "stars": [
            "Leonardo DiCaprio",
            "Joseph Gordon-Levitt",
            "Ellen Page",
            "Ken Watanabe"
          ],
          "genres": [
            "Action",
            "Adventure",
            "Sci-Fi"
          ]
        }
      }

Actual document indexed

      {
        "_index": "movies",
        "_type": "_doc",
        "_id": "jaVdwWkB3XGrwhp_3RfU",
        "_score": 1,
        "_source": {
          "director": "Christopher Nolan",
          "genres": """
[
  "Action",
  "Adventure",
  "Sci-Fi"
]
""",
          "year": "2010",
          "stars": """
[
  "Leonardo DiCaprio",
  "Joseph Gordon-Levitt",
  "Ellen Page",
  "Ken Watanabe"
]
""",
          "title": "Inception"
        }
      }

moshe commented 5 years ago

@antunesleo this is kind of expected: CSV has no array type in its spec, so every cell is read as a plain string. Why aren't you using JSON?
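One possible workaround, sketched here under some assumptions: pre-process the CSV so the array cells are decoded before loading, then feed the resulting JSON to the loader instead of the CSV. The helper name `rows_to_documents`, the `ARRAY_COLUMNS` tuple, and the tab delimiter are hypothetical choices for illustration; they are not part of elasticsearch_loader.

```python
import csv
import io
import json

# Hypothetical: names of the columns whose cells hold JSON-encoded arrays.
ARRAY_COLUMNS = ("stars", "genres")


def rows_to_documents(lines, array_columns=ARRAY_COLUMNS, delimiter="\t"):
    """Yield one dict per CSV row, decoding the array columns into lists.

    Assumes the file is tab-delimited (an assumption; the delimiter of the
    sample in this issue is not known). QUOTE_NONE keeps the double quotes
    inside the cells untouched so json.loads can parse them.
    """
    reader = csv.DictReader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for row in reader:
        doc = dict(row)
        for column in array_columns:
            value = doc.get(column)
            if value:
                # '["Action", "Adventure"]' becomes a real Python list.
                doc[column] = json.loads(value)
        # All other columns (including "year") stay strings, as csv reads them.
        yield doc


# In-memory stand-in for the CSV sample from this issue.
sample = io.StringIO(
    "title\tyear\tdirector\tstars\tgenres\n"
    "Inception\t2010\tChristopher Nolan\t"
    '[ "Leonardo DiCaprio", "Joseph Gordon-Levitt", "Ellen Page", "Ken Watanabe"]\t'
    '["Action", "Adventure", "Sci-Fi" ]\n'
)

documents = list(rows_to_documents(sample))
print(json.dumps(documents, indent=2))
```

The printed JSON can be written to a file and loaded through the tool's json input, so `stars` and `genres` arrive in Elasticsearch as arrays rather than strings.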

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity (14 days). It will be closed if no further activity occurs in the next 7 days. Thank you for your contributions.

stale[bot] commented 5 years ago

This issue has been automatically closed because it has not had recent activity (21 days). Please reopen it if you feel that the issue is not resolved yet.