opensearch-project / opensearch-spark

Spark Accelerator framework ; It enables secondary indices to remote data stores.
Apache License 2.0

[BUG] Existing field cannot be overridden with parse command #650

Open kt-eliatra opened 1 month ago

kt-eliatra commented 1 month ago

What is the bug? According to https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/cmd/parse.rst#example-2-override-an-existing-field, the parse command can be used to override an existing field. This does not work in Spark PPL.

How can one reproduce the bug? Steps to reproduce the behavior:

1. Create a table and add data:

    CREATE TABLE test (
        name STRING, age INT, email STRING, street_address STRING
    );

    INSERT INTO test VALUES
        ("Alice", 30, "alice@example.com", "123 Main St, Seattle"),
        ("Bob", 55, "bob@test.org", "456 Elm St, Portland"),
        ("Charlie", 65, "charlie@domain.net", "789 Pine St, San Francisco"),
        ("David", 19, "david@anotherdomain.com", "101 Maple St, New York");

2. Run a command like:

    source=test | parse email '.+@(?<email>.+)' | fields email;

3. It returns:

    [AMBIGUOUS_REFERENCE] Reference email is ambiguous, could be: [email, spark_catalog.default.test.email].



What is the expected behavior? Values from the existing email column are overridden by values computed by the parse command.
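
To make the expected values concrete, here is a minimal sketch outside Spark that applies the same pattern to the sample rows using Python's `re` module. It assumes the named group in the reproduction command is `email` and that the pattern must match the whole field value. The helper `parse_override` and its no-match behavior are hypothetical, for illustration only, and are not part of PPL or Spark.

```python
import re

# Assumed pattern from the reproduction step. Python spells named groups
# (?P<name>...) where Java regex uses (?<name>...).
PATTERN = re.compile(r".+@(?P<email>.+)")

# Only the columns relevant to the report are kept here.
rows = [
    {"name": "Alice", "email": "alice@example.com"},
    {"name": "Bob", "email": "bob@test.org"},
    {"name": "Charlie", "email": "charlie@domain.net"},
    {"name": "David", "email": "david@anotherdomain.com"},
]


def parse_override(row, field, pattern):
    """Return a copy of row in which each named group overwrites the same-named key."""
    match = pattern.fullmatch(row[field])
    if match is None:
        # Hypothetical choice for this sketch: leave the row unchanged on no match.
        return dict(row)
    return {**row, **match.groupdict()}


overridden = [parse_override(r, "email", PATTERN)["email"] for r in rows]
print(overridden)
# ['example.com', 'test.org', 'domain.net', 'anotherdomain.com']
```

Under these assumptions, `fields email` would be expected to return the four domain values rather than raise an ambiguity error.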

What is your host/environment? OS: Linux Mint

salyh commented 1 month ago

Is it even possible in Spark to replace an existing column with a new computed column of the same name? I'm not sure about this.

dblock commented 1 month ago

[Catch All Triage - 1, 2, 3, 4]