ropensci / elastic

R client for the Elasticsearch HTTP API
https://docs.ropensci.org/elastic

Is parent possible? #201

Closed emillykkejensen closed 5 years ago

emillykkejensen commented 6 years ago

Is it possible to include the parent value when using the docs_bulk function?

https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html#bulk-parent

sckott commented 6 years ago

not yet i think.

can you give an example of how this is done, links or so?

emillykkejensen commented 6 years ago

I had just written a long reply when I realized that the _parent field will be removed in favour of the join field in v6.x, so I guess it doesn't make much sense to look into it, as it will be removed in versions to come.

Furthermore, it looks like the 'join' field is set as a regular field datatype (not as a meta-field like _parent), which makes it possible to do today with the current mapping functions.
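
For illustration, here is a minimal sketch of what a join-field mapping and a parent/child pair of documents could look like when built as plain R lists. The field name my_join_field and the relation names question/answer are illustrative assumptions, not something from this thread, and how the mapping body is handed to the package's mapping functions depends on the elastic and Elasticsearch versions in use, so that call is left out.

```r
# Sketch only: my_join_field, question and answer are made-up names.
# The mapping declares a join field with one parent -> child relation.
join_mapping <- list(
  properties = list(
    my_join_field = list(
      type = "join",
      relations = list(question = "answer")
    )
  )
)

# A parent document only names its side of the relation.
parent_doc <- list(text = "Is parent possible?", my_join_field = "question")

# A child document names its side and the id of its parent.
# Children also have to be indexed with routing set to the parent id,
# so that parent and child end up on the same shard.
child_doc <- list(
  text = "Use the join field instead.",
  my_join_field = list(name = "answer", parent = "1")
)

# Print the mapping as JSON if jsonlite is installed.
if (requireNamespace("jsonlite", quietly = TRUE)) {
  cat(jsonlite::toJSON(join_mapping, auto_unbox = TRUE, pretty = TRUE), "\n")
}
```

Because the child has to be routed to its parent's shard, bulk loading child documents still needs a per-document routing value in the action line, even though _parent itself is gone.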

Sorry for not looking more closely into it before posting :)

sckott commented 6 years ago

this pkg is set up to be compatible as much as possible with all Elasticsearch versions, so i would want to support both < v6 and >= v6. i'll at least add examples for how to do this with v6, even if that needs no code changes, and add support for < v6

emillykkejensen commented 6 years ago

I have had a look at it myself, as I would like to include more parameters in the docs_bulk function than index, type and ids. So I have changed the make_bulk function a bit, and added some extra things to the docs_bulk_prep function (two more arguments: 'quiet' and 'meta_fields').

meta_fields accepts a named list, e.g. list("_routing" = c("1", "2", "5")), with one value per document. Values can be NA, in which case that meta field is left out for that document.

You can find the two modified functions here:

custom_make_bulk <- function(df, index, type, counter, es_ids, meta_fields = NULL, path = NULL) {
  # Avoid scientific notation when very large numeric ids are pasted into the JSON
  if (!is.character(counter)) {
    if (max(counter) >= 10000000000) {
      scipen <- getOption("scipen")
      options(scipen = 100)
      on.exit(options(scipen = scipen))
    }
  }

  metadata <- paste0("\"_index\":\"", index, "\",\"_type\":\"", type, "\"")

  # es_ids = FALSE means the ids come from the caller, so write them out
  if (!es_ids) {
    metadata <- sapply(counter, function(x) paste0(metadata, ",\"_id\":\"", x, "\""), USE.NAMES = FALSE)
  }

  # Append each extra meta field per document; NA leaves that document unchanged
  if (!is.null(meta_fields)) {
    for (i in seq_along(meta_fields)) {
      meta_field_list <- meta_fields[[i]]
      metadata <- ifelse(is.na(meta_field_list), metadata,
                         paste0(metadata, ",\"", names(meta_fields[i]), "\":\"", meta_field_list, "\""))
    }
  }

  metadata <- paste0("{\"index\":{", metadata, "}}")

  # One JSON string per row, written alternating with the action lines
  data <- jsonlite::toJSON(df, collapse = FALSE, na = "null", auto_unbox = TRUE)
  tmpf <- if (is.null(path)) tempfile("elastic__") else path
  writeLines(paste(metadata, data, sep = "\n"), tmpf)
  invisible(tmpf)
}

custom_docs_bulk_prep <- function(x, index, path, type = NULL, quiet = TRUE, meta_fields = NULL,
                                  chunk_size = 1000, doc_ids = NULL) {

  if (is.null(type)) type <- index
  elastic:::check_doc_ids(x, doc_ids)
  # TRUE when no doc_ids are supplied
  es_ids <- is.null(doc_ids)
  if (is.factor(doc_ids)) doc_ids <- as.character(doc_ids)
  row.names(x) <- NULL
  rws <- seq_len(NROW(x))
  data_chks <- split(rws, ceiling(seq_along(rws) / chunk_size))
  if (!is.null(doc_ids)) {
    id_chks <- split(doc_ids, ceiling(seq_along(doc_ids) / chunk_size))
  } else if (elastic:::has_ids(x)) {
    rws <- x$id
    id_chks <- split(rws, ceiling(seq_along(rws) / chunk_size))
  } else {
    # shift_start and adjust_path are elastic internals too, so they need ::: here
    rws <- elastic:::shift_start(rws, index, type)
    id_chks <- split(rws, ceiling(seq_along(rws) / chunk_size))
  }
  if (!quiet) {
    pb <- txtProgressBar(min = 0, max = length(data_chks), initial = 0, style = 3)
    on.exit(close(pb))
  }
  resl <- vector(mode = "list", length = length(data_chks))
  for (i in seq_along(data_chks)) {
    if (!quiet) setTxtProgressBar(pb, i)
    # Keep only this chunk's meta field values (assumes one value per row of x)
    chunk_meta <- if (is.null(meta_fields)) NULL else lapply(meta_fields, function(m) m[data_chks[[i]]])
    resl[[i]] <- custom_make_bulk(
      df = x[data_chks[[i]], , drop = FALSE],
      index = index,
      type = type,
      counter = id_chks[[i]],
      es_ids = es_ids,
      meta_fields = chunk_meta,
      path = if (length(data_chks) > 1) elastic:::adjust_path(path, i) else path
    )
  }
  return(unlist(resl))
}
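
As a rough illustration of the action lines that the meta_fields handling produces, the sketch below repeats just the metadata-building step in base R, without touching files or the elastic package. build_action_lines is a hypothetical helper written for this example, and the index, type and id values are made up.

```r
# Hypothetical standalone helper mirroring the metadata logic of custom_make_bulk.
# It always writes ids, i.e. it corresponds to the es_ids = FALSE case.
build_action_lines <- function(index, type, ids, meta_fields = NULL) {
  metadata <- paste0("\"_index\":\"", index, "\",\"_type\":\"", type,
                     "\",\"_id\":\"", ids, "\"")
  for (i in seq_along(meta_fields)) {
    values <- meta_fields[[i]]
    # NA means: leave this meta field out for that document
    metadata <- ifelse(is.na(values), metadata,
                       paste0(metadata, ",\"", names(meta_fields)[i], "\":\"", values, "\""))
  }
  paste0("{\"index\":{", metadata, "}}")
}

lines <- build_action_lines(
  index = "forum", type = "post", ids = c("a", "b", "c"),
  meta_fields = list("_routing" = c("1", NA, "5"))
)
cat(lines, sep = "\n")
# {"index":{"_index":"forum","_type":"post","_id":"a","_routing":"1"}}
# {"index":{"_index":"forum","_type":"post","_id":"b"}}
# {"index":{"_index":"forum","_type":"post","_id":"c","_routing":"5"}}
```

Note that the values are pasted in without JSON escaping, so this only works as-is for values that contain no quotes or backslashes.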

Perhaps you can use it?

sckott commented 6 years ago

thanks, i'll have a look through this

sckott commented 5 years ago

parent seems to be gone in newer versions of ES, closing