I was wondering if it would be possible to optimize how PGSync updates Elasticsearch records. I've put two different thoughts into the sections below. Please let me know what you think.
## Only Update When Specified Columns Changed
I notice that when an unspecified column updates, it fires a re-index.
### Example
Given the following schema for table `my_table`
```ruby
# Table: my_table
# ---------------------------------------------------------------------------------------------------------
# Columns:
# id | integer | PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY
# foo | text |
# bar | text |
# ---------------------------------------------------------------------------------------------------------
```
And a `schema.json` defined as:
```json
[
{
"database": "default",
"index": "my_index",
"nodes": {
"table": "my_table",
"columns": [
"foo"
]
}
}
]
```
### What is happening?
`my_index` is getting updated when column `bar` updates.
### What do I expect to happen instead?
`my_index` would not be updated when `bar` updates because it is not specified in `schema.json`.
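To illustrate the kind of filter I have in mind, here is a rough sketch in Python. The names (`TRACKED_COLUMNS`, `should_reindex`, and the plain-dict rows) are hypothetical stand-ins for illustration, not PGSync internals: the idea is simply to compare the old and new row images only on the columns declared in `schema.json` before queuing a re-index.

```python
# Hypothetical sketch: skip re-indexing when no column listed in
# schema.json actually changed. TRACKED_COLUMNS would be derived from
# the "columns" list for this node.
TRACKED_COLUMNS = {"foo"}

def should_reindex(old_row: dict, new_row: dict, tracked=TRACKED_COLUMNS) -> bool:
    """Return True only if at least one tracked column changed value."""
    return any(old_row.get(col) != new_row.get(col) for col in tracked)

# An update touching only `bar` would be ignored:
# should_reindex({"foo": "a", "bar": "x"}, {"foo": "a", "bar": "y"})  -> False
# An update touching `foo` would still sync:
# should_reindex({"foo": "a"}, {"foo": "b"})  -> True
```

This assumes the old and new row images are both available at filter time (e.g. from the logical replication payload); if only the new image is available, the check would need to happen further upstream.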
## Batch Updates Together
I notice that when a record is updated multiple times between sync runs, it gets re-indexed once per update.
### Example
Given the following schema for table `my_table`
```ruby
# Table: my_table
# ---------------------------------------------------------------------------------------------------------
# Columns:
# id | integer | PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY
# foo | text |
# ---------------------------------------------------------------------------------------------------------
```
And a `schema.json` defined as:
```json
[
{
"database": "default",
"index": "my_index",
"nodes": {
"table": "my_table",
"columns": [
"foo"
]
}
}
]
```
And I perform the following query:
```sql
UPDATE my_table
SET foo = 'hello'
WHERE id = 1;
```
Followed by another query (before `pgsync` runs again):
```sql
UPDATE my_table
SET foo = 'world'
WHERE id = 1;
```
### What is happening?
Two index requests are sent to `my_index` the next time `pgsync` runs.
### What do I expect to happen instead?
One index request is sent to `my_index` the next time `pgsync` runs. Because we are performing full document updates in the index, only the final state of the row matters; there is no reason to send the intermediate version to Elasticsearch at all.
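A minimal sketch of the deduplication I'm picturing, in Python. The event shape and the `dedupe_events` helper are assumptions for illustration, not PGSync's actual internals: the idea is to collapse queued changes by `(table, primary key)` so only the latest payload per row reaches Elasticsearch.

```python
# Hypothetical sketch: collapse multiple pending updates to the same row,
# keeping only the most recent payload before building the bulk request.
def dedupe_events(events: list[dict]) -> list[dict]:
    """Keep one event per (table, id); later events overwrite earlier ones."""
    latest: dict[tuple, dict] = {}
    for event in events:  # events are assumed to arrive in commit order
        latest[(event["table"], event["id"])] = event
    return list(latest.values())
```

With the two `UPDATE` statements above queued as events, this would yield a single document carrying `foo = 'world'`, so only one request is sent per sync cycle. (Note this only holds for full-document updates; partial updates would need each change applied in order.)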