pathwaycom / pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
https://pathway.com

[WEBSITE] Bug in 'Pathway in Minutes: Quick ETL Examples' #63

Open raphaelreimann opened 2 days ago

raphaelreimann commented 2 days ago

What is your question or problem? Please describe.

I was going through the User Guide (https://pathway.com/developers/user-guide/introduction/first_realtime_app_with_pathway) on pathway.com and noticed that the source code for 'First Realtime App with Pathway' has a bug.

alerts_table = joined_values.filter(  # joined_values is not defined
    pw.this.value > pw.this.threshold
).select(pw.this.name, pw.this.value) 

Describe what you would like to happen

Between the table join and the filter step, you might want to define joined_values from the joined table.

Cheers!

dxtrous commented 2 days ago

True - thanks a lot for the catch!

Did fixing the typo (changing joined_values.filter to joined_table.filter) resolve the problem for you?

We will publish a documentation fix directly.

izulin commented 2 days ago

The snippet in the guide seems to have accidentally skipped the join().select() steps.

One solution is to write:

# Joining tables on the column name
joined_table = measurements_table.join( # The left table is measurements_table (pw.left)
    thresholds_table,                   # The right table is thresholds_table (pw.right)
    pw.left.name==pw.right.name,        # The join is done on the column name of each table 
).select(
    *pw.left,                           # All the columns of measurements are kept
    pw.right.threshold                  # The threshold column of the threshold table is kept
)

# Filtering value strictly higher than the threshold.
alerts_table = joined_table.filter(
    pw.this.value > pw.this.threshold
).select(pw.this.name, pw.this.value) # Only name and value fields are kept

Another is to apply the filter directly to join_result:

# Joining tables on the column name
join_result = measurements_table.join( # The left table is measurements_table (pw.left)
    thresholds_table,                   # The right table is thresholds_table (pw.right)
    pw.left.name==pw.right.name,        # The join is done on the column name of each table 
)

# Filtering value strictly higher than the threshold.
alerts_table = join_result.filter(
    pw.this.value > pw.this.threshold
).select(pw.this.name, pw.this.value) # Only name and value fields are kept
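
For anyone who wants to check what either variant should produce, here is a minimal plain-Python sketch of the same join-then-filter logic. It is not Pathway code: the sample rows and the helper functions join_on_name and alerts are hypothetical, and it assumes each name appears at most once in the thresholds.

```python
# Plain-Python illustration of the join + filter logic (not Pathway code).
# The sample rows below are hypothetical.
measurements = [
    {"name": "A", "value": 8},
    {"name": "B", "value": 3},
    {"name": "C", "value": 12},  # no matching threshold, dropped by the join
]
thresholds = [
    {"name": "A", "threshold": 5},
    {"name": "B", "threshold": 5},
]


def join_on_name(left, right):
    """Inner join on name: keep all left columns plus the matching threshold."""
    # Assumes names are unique in the right-hand rows.
    threshold_by_name = {row["name"]: row["threshold"] for row in right}
    return [
        {**row, "threshold": threshold_by_name[row["name"]]}
        for row in left
        if row["name"] in threshold_by_name
    ]


def alerts(joined):
    """Keep rows with value strictly above threshold; keep only name and value."""
    return [
        {"name": row["name"], "value": row["value"]}
        for row in joined
        if row["value"] > row["threshold"]
    ]


joined_table = join_on_name(measurements, thresholds)
alerts_table = alerts(joined_table)
print(alerts_table)  # [{'name': 'A', 'value': 8}]
```

Only A ends up in the alerts: B is below its threshold, and C has no threshold row, so the inner join drops it before the filter runs.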