opensource-observer / oso

Measuring the impact of open source software
https://opensource.observer
Apache License 2.0

`base_traces` source data has some values too large for the `NUMERIC` bigquery type #2269

Open ravenac95 opened 2 weeks ago

ravenac95 commented 2 weeks ago

Which area(s) are affected? (leave empty if unsure)

No response

To Reproduce

See: https://admin-dagster.opensource.observer/runs/00c8982e-1968-406a-94fe-17c808b3566d

Describe the Bug

See: https://admin-dagster.opensource.observer/runs/00c8982e-1968-406a-94fe-17c808b3566d

Expected Behavior

We need to manually migrate the table's schema to use BYTES or STRING for this column, as we have for arbitrum, and then convert the value into a double later on in the pipeline.
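As a rough illustration of why the column overflows and what the later conversion might look like, here is a minimal Python sketch. It assumes the column holds uint256-sized integers serialized as decimal strings; `fits_numeric` and `to_double` are hypothetical helper names, not part of the OSO pipeline. BigQuery's `NUMERIC` type has precision 38 and scale 9, so it can hold at most 29 integer digits.

```python
from decimal import Decimal

# BigQuery NUMERIC: precision 38, scale 9 -> 29 integer digits, 9 fractional digits.
NUMERIC_MAX = Decimal("9" * 29 + "." + "9" * 9)

# Assumption: the source column can hold any uint256 value.
UINT256_MAX = str(2**256 - 1)


def fits_numeric(raw: str) -> bool:
    """Return True if the decimal string is within the NUMERIC range."""
    value = Decimal(raw)  # exact construction, no rounding
    return -NUMERIC_MAX <= value <= NUMERIC_MAX


def to_double(raw: str) -> float:
    """Convert the string column to a double, accepting loss of precision."""
    return float(raw)


if __name__ == "__main__":
    print(fits_numeric("1000000000000000000"))  # 1e18 fits
    print(fits_numeric(UINT256_MAX))            # a 78-digit value does not
    print(to_double(UINT256_MAX))               # approximate double
```

Storing the raw value as a string keeps the ingested data lossless; the conversion to a double only happens downstream, where approximate magnitudes are acceptable.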

ravenac95 commented 2 weeks ago

I should probably fix this and record a Loom so that anyone can do these things, but we likely need to transition all data to this in BigQuery due to limitations with the ingestion processes. When loading the parquet files, if you coerce the numeric value into BYTES, it doesn't convert the value into its uint256 byte representation but instead into an ASCII-encoded byte array, so converting this to STRING is probably best.
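To make the difference between the two byte representations concrete, here is a small Python sketch. It is only an illustration of the encodings described above, not the loader's actual code; the sample value and function names are made up.

```python
# Hypothetical sample value that is far too large for NUMERIC.
value = 2**255 + 1

# What the BYTES coercion reportedly produces: the decimal digits as ASCII.
ascii_bytes = str(value).encode("ascii")

# What one might have expected instead: a 32-byte big-endian uint256.
uint256_bytes = value.to_bytes(32, byteorder="big")


def from_ascii_bytes(raw: bytes) -> int:
    """Decode an ASCII-encoded decimal string back into an integer."""
    return int(raw.decode("ascii"))


def from_uint256_bytes(raw: bytes) -> int:
    """Decode a big-endian uint256 byte array back into an integer."""
    return int.from_bytes(raw, byteorder="big")


if __name__ == "__main__":
    print(len(ascii_bytes), len(uint256_bytes))
    print(from_ascii_bytes(ascii_bytes) == value)
    print(from_uint256_bytes(uint256_bytes) == value)
```

Since the ASCII form is just the decimal string stored as bytes, a STRING column carries the same information and is easier to read and cast downstream.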

It would be good for both @ryscheng and @Jabolol to know how to do this very manual ops-y process.

ravenac95 commented 2 weeks ago

We will need to reset the checkpoints table to this:

https://admin-dagster.opensource.observer/runs/b65f9e8a-ca33-46c6-ad47-3d12991c9428?logFileKey=zlzntsmq&selection=%22base__traces%22&logs=query%3A%22base__traces%22

9:10:13.442 AM base__traces INFO Worker[8]: Last checkpoint @ TS:1724719300 JOB:e83fc112-42e2-4d3a-a295-d0779e6a954e CHK:10003
9:10:13.509 AM base__traces INFO Worker[4]: Last checkpoint @ TS:1724719300 JOB:e83fc112-42e2-4d3a-a295-d0779e6a954e CHK:10003
9:10:13.569 AM base__traces INFO Worker[6]: Last checkpoint @ TS:1724719300 JOB:e83fc112-42e2-4d3a-a295-d0779e6a954e CHK:10003
9:10:13.639 AM base__traces INFO Worker[5]: Last checkpoint @ TS:1724719300 JOB:e83fc112-42e2-4d3a-a295-d0779e6a954e CHK:10003
9:10:13.698 AM base__traces INFO Worker[9]: Last checkpoint @ TS:1724719300 JOB:e83fc112-42e2-4d3a-a295-d0779e6a954e CHK:10003
9:10:13.763 AM base__traces INFO Worker[7]: Last checkpoint @ TS:1724719300 JOB:e83fc112-42e2-4d3a-a295-d0779e6a954e CHK:10003
9:10:13.824 AM base__traces INFO Worker[0]: Last checkpoint @ TS:1726173327 JOB:33c1ca9f-c0d4-47d1-b7f0-3c30399dabad CHK:39169
9:10:13.894 AM base__traces INFO Worker[3]: Last checkpoint @ TS:1726173327 JOB:33c1ca9f-c0d4-47d1-b7f0-3c30399dabad CHK:39169
9:10:13.958 AM base__traces INFO Worker[1]: Last checkpoint @ TS:1726173327 JOB:33c1ca9f-c0d4-47d1-b7f0-3c30399dabad CHK:39169
9:10:14.020 AM base__traces INFO Worker[2]: Last checkpoint @ TS:1726173327 JOB:33c1ca9f-c0d4-47d1-b7f0-3c30399dabad CHK:39169

This was the last point at which the run ended up succeeding.
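If it helps when doing the reset by hand, the per-worker values can be pulled out of the log text with a short script. This is only a sketch: the regular expression is written against the log format shown above, the `Checkpoint` type and `parse_checkpoints` function are hypothetical, and the schema of the checkpoints table itself is not covered here.

```python
import re
from typing import NamedTuple


class Checkpoint(NamedTuple):
    worker: int
    timestamp: int
    job_id: str
    checkpoint: int


# Matches e.g. "Worker[8]: Last checkpoint @ TS:1724719300 JOB:<uuid> CHK:10003"
PATTERN = re.compile(
    r"Worker\[(\d+)\]: Last checkpoint @ TS:(\d+) JOB:([0-9a-f-]+) CHK:(\d+)"
)


def parse_checkpoints(log_text: str) -> dict[int, Checkpoint]:
    """Map each worker index to the last checkpoint reported in the log."""
    result = {}
    for match in PATTERN.finditer(log_text):
        worker, ts, job_id, chk = match.groups()
        result[int(worker)] = Checkpoint(int(worker), int(ts), job_id, int(chk))
    return result


SAMPLE = """
Worker[8]: Last checkpoint @ TS:1724719300 JOB:e83fc112-42e2-4d3a-a295-d0779e6a954e CHK:10003
Worker[0]: Last checkpoint @ TS:1726173327 JOB:33c1ca9f-c0d4-47d1-b7f0-3c30399dabad CHK:39169
"""

if __name__ == "__main__":
    for worker, cp in sorted(parse_checkpoints(SAMPLE).items()):
        print(worker, cp.timestamp, cp.job_id, cp.checkpoint)
```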