pathwaycom / pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
https://pathway.com

[Bug]: "AttributeError: 'DataFrame' object has no attribute 'map'" when running Live Data Jupyter notebook. #42

Closed (olruas closed this issue 2 months ago)

olruas commented 2 months ago

Steps to reproduce

Run the Live Data Jupyter notebook. The graph updates for a while, but at some point `pw.run()` crashes with the error below.

Relevant log output

---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)

[<ipython-input-9-09a87c3beb30>](https://localhost:8080/#) in <cell line: 1>()
----> 1 pw.run()

7 frames

[/usr/local/lib/python3.10/dist-packages/pathway/internals/runtime_type_check.py](https://localhost:8080/#) in with_type_validation(*args, **kwargs)
     17         """
     18         try:
---> 19             return beartype.beartype(f)(*args, **kwargs)
     20         except beartype.roar.BeartypeCallHintParamViolation as e:
     21             raise TypeError(e) from None

<@beartype(pathway.internals.run.run) at 0x79e30a1a7880> in run(__beartype_func, __beartype_conf, __beartype_get_violation, __beartype_object_96379127938720, __beartype_object_134015854386560, __beartype_object_134015838820288, __beartype_object_134015997294208, __beartype_object_96379063192384, *args, **kwargs)

[/usr/local/lib/python3.10/dist-packages/pathway/internals/run.py](https://localhost:8080/#) in run(debug, monitoring_level, with_http_server, default_logging, persistence_config, runtime_typechecking, license_key, terminate_on_error)
     49         terminate_on_error=terminate_on_error,
     50         _stacklevel=4,
---> 51     ).run_outputs()
     52 
     53 

[/usr/local/lib/python3.10/dist-packages/pathway/internals/graph_runner/__init__.py](https://localhost:8080/#) in run_outputs(self, after_build)
    116         after_build: Callable[[ScopeState, OperatorStorageGraph], None] | None = None,
    117     ) -> None:
--> 118         self.run_nodes(self._graph.global_scope.output_nodes, after_build=after_build)
    119 
    120     def has_bounded_input(self, table: table.Table) -> bool:

[/usr/local/lib/python3.10/dist-packages/pathway/internals/graph_runner/__init__.py](https://localhost:8080/#) in run_nodes(self, nodes, after_build)
     92     ):
     93         all_nodes = self._tree_shake(self._graph.global_scope, nodes)
---> 94         self._run(all_nodes, after_build=after_build)
     95 
     96     def run_tables(

[/usr/local/lib/python3.10/dist-packages/pathway/internals/graph_runner/__init__.py](https://localhost:8080/#) in _run(self, nodes, output_tables, after_build, run_all)
    196             ):
    197                 try:
--> 198                     return api.run_with_new_graph(
    199                         logic,
    200                         event_loop=event_loop,

[/usr/local/lib/python3.10/dist-packages/pathway/internals/table_subscription.py](https://localhost:8080/#) in on_change_wrapper(key, values, time, diff)
    132             row[field_name] = field_value
    133 
--> 134         return on_change(key=key, row=row, time=time, is_addition=(diff == 1))
    135 
    136     table_to_datasink(

[/usr/local/lib/python3.10/dist-packages/pathway/stdlib/viz/table_viz.py](https://localhost:8080/#) in update(key, row, time, is_addition)
    137                 df = df[col_names]
    138 
--> 139                 df = df.map(_format_types)
    140 
    141                 dynamic_table.value = df

[/usr/local/lib/python3.10/dist-packages/pandas/core/generic.py](https://localhost:8080/#) in __getattr__(self, name)
   5987         ):
   5988             return self[name]
-> 5989         return object.__getattribute__(self, name)
   5990 
   5991     def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'map'

What did you expect to happen?

Running until the end without error.

Version

latest with pip install

Docker Versions (if used)

No response

OS

Linux

On which CPU architecture did you run Pathway?

None

embe-pw commented 2 months ago

It looks like we get pandas 2.0 on Google Colab, but according to the pandas docs, `DataFrame.map` is new in 2.1.
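
To illustrate the version gap, here is a minimal sketch of a version-tolerant element-wise map. It assumes that `DataFrame.applymap` is still available as the fallback on pandas older than 2.1. The helper name `elementwise_map` and the sample data are hypothetical, and this is not necessarily how the fix in the Pathway repo is written.

```python
import pandas as pd


def elementwise_map(df: pd.DataFrame, func):
    """Apply func to every cell, whichever pandas version is installed.

    Hypothetical helper for illustration only, not Pathway's code.
    """
    if hasattr(pd.DataFrame, "map"):
        # pandas >= 2.1 provides DataFrame.map
        return df.map(func)
    # older pandas (e.g. the 2.0.3 pinned on Colab) only has applymap
    return df.applymap(func)


# Sample data made up for the example
df = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.0]})
formatted = elementwise_map(df, str)
```

The `hasattr` check avoids parsing version strings and keeps the call working on both sides of the 2.1 boundary.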

embe-pw commented 2 months ago

Manually installing a newer pandas seems to work, but pip prints this warning:

google-colab 1.0.0 requires pandas==2.0.3, but you have pandas 2.2.2 which is incompatible.

embe-pw commented 2 months ago

It seems that this is already fixed in the repo, waiting for a release. cc @szymondudycz

olruas commented 2 months ago

OK, the issue is fixed in the repo and waiting for a release. In the meantime, we need to force pandas>=2.1 on Colab. Thank you!
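
Until the release lands, a notebook could check the installed pandas version before calling `pw.run()`. The sketch below is only an assumption about how such a guard might look. The function name `supports_dataframe_map` is hypothetical, and the 2.1 threshold comes from the discussion above.

```python
from importlib.metadata import PackageNotFoundError, version


def supports_dataframe_map(pandas_version: str) -> bool:
    """Return True if this pandas version is 2.1 or newer (hypothetical helper)."""
    major, minor = (int(part) for part in pandas_version.split(".")[:2])
    return (major, minor) >= (2, 1)


try:
    installed = version("pandas")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("pandas is not installed")
elif not supports_dataframe_map(installed):
    # On Colab, upgrading triggers the google-colab pin warning quoted above
    print(f'pandas {installed} lacks DataFrame.map; install "pandas>=2.1"')
```

Running this in the first cell makes the version mismatch visible up front instead of partway through the live update.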