pathwaycom / pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
https://pathway.com

[QUESTION] Unicode outputs being treated inconsistently in Python notebook prints? #62

Open dxtrous opened 1 week ago

dxtrous commented 1 week ago

What is your question or problem? Please describe.

When you look at the output of Unicode strings in notebooks, e.g. at https://pathway.com/developers/user-guide/connect/json_type, you will see that some Unicode characters are printed directly if they go through an explicit .str conversion (like "Ł", U+0141, which is printed as a character), while others are escaped (like "ł", U+0142, which is printed as "\u0142").

Describe what you would like to happen

I'm curious whether this is documented somewhere, and what the intended behavior is.

embe-pw commented 1 week ago

The behavior is internally consistent, but possibly confusing. The example encodes its strings in two different ways: some are plain strings, and some are JSON values that happen to be strings. str() on a plain string returns it as-is, while str() on any JSON value returns its JSON representation, which currently tries to be ASCII-only and therefore escapes non-ASCII characters (we could change this).
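
For illustration only, the same distinction can be reproduced with Python's standard json module. This is an analogy, not Pathway's actual implementation: the assumption is that the JSON representation behaves like json.dumps with its default ensure_ascii=True, and the variable names below are made up for the sketch.

```python
import json

# A plain Python string: str() returns it unchanged.
plain = "Ł"
plain_text = str(plain)

# A string serialized as a JSON value. With the default ensure_ascii=True,
# non-ASCII characters are written as \uXXXX escapes and the result is quoted.
json_escaped = json.dumps("ł")

# Hypothetical alternative: keep non-ASCII characters as-is in the JSON text.
json_unescaped = json.dumps("ł", ensure_ascii=False)

print(plain_text)      # Ł
print(json_escaped)    # "\u0142"
print(json_unescaped)  # "ł"
```

Both JSON forms decode back to the same string with json.loads, so the difference is only in how the value is displayed, not in the data itself.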