googleapis / java-bigquery

Apache License 2.0
108 stars 119 forks source link

Schema not propagated when querying for array of structs #3389

Open FredrikMeyer opened 1 month ago

FredrikMeyer commented 1 month ago

Environment details

  1. Specify the API at the beginning of the title. For example, "BigQuery: ...").
       implementation("com.google.cloud:google-cloud-bigquery:2.41.0")
  2. OS type and version: MacOS
  3. Java version: 21

Steps to reproduce

  1. Create a table with a repeated struct schema.
  2. Query the same table, try to access the nested record fields by name. This fails because no schema is attached.

Code example

Reproduced here: https://github.com/FredrikMeyer/bigquery-bug/blob/main/src/test/java/no/fredrikmeyer/MainTest.java

        var query = "select * from my_dataset.my_table";

        QueryJobConfiguration queryJobConfiguration = QueryJobConfiguration.newBuilder(query).setUseLegacySql(false).build();

        var res2 = bigQuery.query(queryJobConfiguration);

        res2.iterateAll().forEach(r -> {
            // Prints true
            System.out.println(r.hasSchema());
            // Prints false, even though the schema is defined above
            System.out.println(r.get("arr").getRepeatedValue().getFirst().getRecordValue().hasSchema());
        });

Any additional information below

When having an array of structs, the schema does not seem to be propagated to the fields. Thanks!

PhongChuong commented 1 month ago

Hi @FredrikMeyer ,

Schemas are stored in TableResult and not necessarily FieldValueList. FieldValueList is used for table rows and the schema is not assessable. The schema in FieldValueList is for indexing by field name.

To get the schema for your table, you can simply:

var res2 = bigQuery.query(queryJobConfiguration);

// Read query table schema.
System.out.println(r.hasSchema());
// Get the schema for the field "arr"
System.out.println(r.getSchema().getFields().get("arr"));

res2.iterateAll().forEach(r -> {
    // Read table row results.
});

I hope this helps,

FredrikMeyer commented 1 month ago

Thanks for the answer @PhongChuong

My main problem is that since the schema is not in the FieldValueList, I cannot get the fields by name. So now my code looks like this:

    private fun periodeFromFieldValue(periodeRecord: FieldValue) = Periode(
        // cannot do recordValue.get("fraDato") because schema is not propagated
        fraDato = LocalDate.parse(periodeRecord.recordValue[0].stringValue),
        tilDato = LocalDate.parse(periodeRecord.recordValue[1].stringValue),
    )

Which seems to rely on the order the fields appear - which seem to coincide with the order they were defined in the table schema (by accident or by construction?).

PhongChuong commented 1 month ago

The field will appear as they are defined in the table schema by construction.

If you want, it is possible to provide the schema manually providing the schema by creating FieldValueList. However, it's not pretty and probably not worth the effort.

FredrikMeyer commented 11 hours ago

// Read query table schema. System.out.println(r.hasSchema()); // Get the schema for the field "arr" System.out.println(r.getSchema().getFields().get("arr"));

res2.iterateAll().forEach(r -> { // Read table row results. });

Hi, sorry for the late reply, but is there really a method getSchema here? Here r is of type FieldValueList.