Closed emillykkejensen closed 5 years ago
not yet i think.
can you give an example of how this is done, links or so?
I had just written a long reply when I realized that the _parent field will be removed in favour of the join field in v6.x, so I guess it doesn't make much sense to look into it, as it will be removed in versions to come.
Furthermore, it looks like the 'join' field can be set from the field datatypes (not meta-fields like _parent), which makes it possible to do today with the current mapping functions.
Sorry for not looking more closely into it before posting :)
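To illustrate the point about the join field being an ordinary field datatype, here is a minimal sketch of a v6-style mapping body and a child document, built with jsonlite. The field name my_join_field and the relation names question/answer are hypothetical, and how the resulting JSON is passed to the package's mapping and indexing functions depends on the installed version, so that part is left out.

```r
library(jsonlite)

# Mapping body declaring a join field (hypothetical field and relation names).
# "question" is the parent relation, "answer" the child relation.
join_mapping <- list(
  properties = list(
    my_join_field = list(
      type = "join",
      relations = list(question = "answer")
    )
  )
)
mapping_json <- toJSON(join_mapping, auto_unbox = TRUE)

# A child document points at its parent through the join field itself,
# not through a _parent meta-field.
child_doc <- list(
  text = "an answer",
  my_join_field = list(name = "answer", parent = "1")
)
child_json <- toJSON(child_doc, auto_unbox = TRUE)

cat(mapping_json, "\n")
cat(child_json, "\n")
```

Note that Elasticsearch expects a child document to be indexed with routing set to the parent id, so a routing value is still needed alongside the join field.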
This pkg is set up to be as compatible as possible with all Elasticsearch versions, so I would want to support versions both before and after v6. I'll at least add examples for how to do this with v6, even if that needs no code changes, and add support for < v6.
I have had a look at it myself, as I would like to include more parameters in the docs_bulk function than index, type and ids. So I have changed the make_bulk function a bit and added two more arguments to the docs_bulk_prep function: 'quiet' and 'meta_fields'.
meta_fields accepts a named list, e.g. list("_routing" = c("1", "2", "5")), with one value per row. It can also accept NA values, in which case that field is left out for the row in question.
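As a rough sketch of what this produces, the snippet below builds only the bulk action lines, using the same NA-skipping idea in base R. build_action_lines is a hypothetical helper written for illustration and is not part of the elastic package.

```r
# Build one bulk action line per document. A meta field is appended
# only for rows where its value is not NA.
build_action_lines <- function(index, type, ids, meta_fields = NULL) {
  metadata <- paste0(
    "\"_index\":\"", index, "\",\"_type\":\"", type, "\",\"_id\":\"", ids, "\""
  )
  for (i in seq_along(meta_fields)) {
    values <- meta_fields[[i]]
    metadata <- ifelse(
      is.na(values),
      metadata,
      paste0(metadata, ",\"", names(meta_fields)[i], "\":\"", values, "\"")
    )
  }
  paste0("{\"index\":{", metadata, "}}")
}

action_lines <- build_action_lines(
  "myindex", "mytype", ids = 1:3,
  meta_fields = list("_routing" = c("1", NA, "5"))
)
# The second line carries no _routing entry because its value is NA.
writeLines(action_lines)
```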
You can find the two modified functions here:
custom_make_bulk <- function(df, index, type, counter, es_ids, meta_fields = NULL, path = NULL) {
  # Avoid scientific notation when very large numeric ids are pasted into JSON
  if (!is.character(counter)) {
    if (max(counter) >= 10000000000) {
      scipen <- getOption("scipen")
      options(scipen = 100)
      on.exit(options(scipen = scipen))
    }
  }
  # Metadata shared by every action line
  metadata <- paste0("\"_index\":\"", index, "\",\"_type\":\"", type, "\"")
  # Add explicit ids unless Elasticsearch should generate them
  if (!es_ids) {
    metadata <- sapply(counter, function(x) paste0(metadata, ",\"_id\":\"", x, "\""), USE.NAMES = FALSE)
  }
  # Append each extra meta field, skipping rows where its value is NA
  if (!is.null(meta_fields)) {
    for (i in seq_along(meta_fields)) {
      meta_field_list <- meta_fields[[i]]
      metadata <- ifelse(is.na(meta_field_list), metadata,
                         paste0(metadata, ",\"", names(meta_fields)[i], "\":\"", meta_field_list, "\""))
    }
  }
  metadata <- paste0("{\"index\":{", metadata, "}}")
  # One JSON document per row, interleaved with its action line
  data <- jsonlite::toJSON(df, collapse = FALSE, na = "null", auto_unbox = TRUE)
  tmpf <- if (is.null(path)) tempfile("elastic__") else path
  writeLines(paste(metadata, data, sep = "\n"), tmpf)
  invisible(tmpf)
}
custom_docs_bulk_prep <- function(x, index, path, type = NULL, quiet = TRUE, meta_fields = NULL,
                                  chunk_size = 1000, doc_ids = NULL) {
  # Note: check_doc_ids, has_ids, shift_start and adjust_path are internal
  # helpers of the elastic package; their signatures may differ between versions.
  if (is.null(type)) type <- index
  elastic:::check_doc_ids(x, doc_ids)
  # Let Elasticsearch generate ids only when none are supplied
  es_ids <- is.null(doc_ids)
  if (is.factor(doc_ids)) doc_ids <- as.character(doc_ids)
  row.names(x) <- NULL
  rws <- seq_len(NROW(x))
  # Row indices for each chunk
  data_chks <- split(rws, ceiling(seq_along(rws) / chunk_size))
  if (!is.null(doc_ids)) {
    id_chks <- split(doc_ids, ceiling(seq_along(doc_ids) / chunk_size))
  } else if (elastic:::has_ids(x)) {
    rws <- x$id
    id_chks <- split(rws, ceiling(seq_along(rws) / chunk_size))
  } else {
    rws <- elastic:::shift_start(rws, index, type)
    id_chks <- split(rws, ceiling(seq_along(rws) / chunk_size))
  }
  if (!quiet) {
    pb <- txtProgressBar(min = 0, max = length(data_chks), initial = 0, style = 3)
    on.exit(close(pb))
  }
  resl <- vector(mode = "list", length = length(data_chks))
  for (i in seq_along(data_chks)) {
    if (!quiet) setTxtProgressBar(pb, i)
    # Subset each meta field to the rows of this chunk so lengths line up
    meta_chk <- if (is.null(meta_fields)) NULL else
      lapply(meta_fields, function(m) m[data_chks[[i]]])
    resl[[i]] <- custom_make_bulk(
      df = x[data_chks[[i]], , drop = FALSE],
      index = index,
      type = type,
      counter = id_chks[[i]],
      es_ids = es_ids,
      meta_fields = meta_chk,
      path = if (length(data_chks) > 1) elastic:::adjust_path(path, i) else path
    )
  }
  return(unlist(resl))
}
Perhaps you can use it?
Thanks, I'll have a look through this.
_parent seems to be gone in newer versions of ES, so I'm closing this.
Is it possible to include the parent value when using the docs_bulk function?
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html#bulk-parent