pathwaycom / pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
https://pathway.com

[QUESTION] Why does `asof_join` line take long to spin up? #11

Open dxtrous opened 3 months ago

dxtrous commented 3 months ago

I noticed that, when running in Colab, building the compute graph for an `asof_join` takes several seconds, regardless of table size.

In the example below, taken from the API documentation, the line takes 4s; if you duplicate the line in the same cell, it takes 8s, and so on.

Why is this the case? Does this only happen in interactive mode?


```python
# -*- coding: utf-8 -*-
"""Colab_test_asof_join.ipynb

Automatically generated by Colaboratory.

Original file is located at
    https://colab.research.google.com/drive/1pWox7vvDoSuohRZ1EpWR2QDGSRSOmgsV
"""

# Colab shell command; run it in a notebook cell (or `pip install pathway` in a terminal).
# !pip install pathway

import pathway as pw

# Left table; `__time__` simulates the time at which each row arrives.
t1 = pw.debug.table_from_markdown(
    '''
    value | event_time | __time__
      2   |      2     |     4
      3   |      5     |     6
      4   |      1     |     8
      5   |      7     |    14
'''
)
# Right table, matched against t1 on event_time.
t2 = pw.debug.table_from_markdown(
    '''
    value | event_time | __time__
      42  |      1     |     2
       8  |      4     |    10
'''
)

# Plain left join on exact event_time equality, for comparison.
result_join = t1.join(
    t2, t1.event_time == t2.event_time, how=pw.JoinMode.LEFT
).select(event_time=t1.event_time)

# The asof_join line in question: this is the call that takes several seconds.
result_asof_join = t1.asof_join(
    t2, t1.event_time, t2.event_time, how=pw.JoinMode.LEFT
).select(
    left_value=t1.value,
    right_value=t2.value,
    left_time=t1.event_time,
    right_time=t2.event_time,
)
```
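
To check whether the cost really grows linearly with each repeated call, the call can be timed in isolation. Below is a minimal sketch using only the standard library; `time_repeated` is a hypothetical helper name (not part of Pathway), and the stand-in workload exists only so the snippet runs without Pathway installed.

```python
import time
from typing import Callable, List


def time_repeated(build: Callable[[], object], repeats: int = 3) -> List[float]:
    """Call `build` `repeats` times; return the wall-clock seconds of each call.

    Hypothetical helper for illustration, not a Pathway API.
    """
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        build()
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Stand-in workload so the sketch runs anywhere.
    # In the notebook above, one would instead pass something like:
    #   lambda: t1.asof_join(t2, t1.event_time, t2.event_time, how=pw.JoinMode.LEFT)
    for i, seconds in enumerate(time_repeated(lambda: sum(range(100_000)))):
        print(f"call {i}: {seconds:.4f}s")
```

If each call reports roughly the same duration, the time is spent per `asof_join` invocation rather than in a one-off warm-up, which would match the 4s and 8s observation.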