pathwaycom / pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
https://pathway.com
Other
2.84k stars 98 forks

I am facing an issue while getting column names #25

Closed abdul756 closed 3 months ago

abdul756 commented 3 months ago

What is your question or problem? Please describe. I am facing a problem when trying to get column names using pw.debug.compute_and_print(T1.column_names()).

Describe what you would like to happen. I want to check whether a specific column exists in a Table and, if it does, carry out some logic.

szymondudycz commented 3 months ago

You should just write print(T1.column_names()). pw.debug.compute_and_print is used to evaluate the contents of a Table in static mode; you would call it as pw.debug.compute_and_print(T1).

abdul756 commented 3 months ago

How do I print the column names in streaming mode?

szymondudycz commented 3 months ago

print(T1.column_names()) works in both modes.

abdul756 commented 3 months ago

Thank you so much. I have one more question. This is my input schema:

class QueryInputSchema(pw.Schema):
    query: str
    user: str
    # Assuming Pathway schema doesn't support direct default value setting in this manner
    mode: str

I am using the REST connector to get values from a UI:

query, response_writer = pw.io.http.rest_connector(
    host=ApplicationConfig.PATHWAY_REST_CONNECTOR_HOST,
    port=ApplicationConfig.PATHWAY_REST_CONNECTOR_PORT,
    schema=QueryInputSchema,
    autocommit_duration_ms=50,
    delete_completed_queries=False,
)

I want to check whether the value of mode, let's say "a", is equal to a column name "a", and if so execute some logic. How can I achieve this, or is there a better way to do it?

szymondudycz commented 3 months ago

You can set default values in the schema using pw.column_definition, check this link: https://pathway.com/developers/user-guide/types-in-pathway/schema#defining-default-values

As to your main question, that is hard to answer, as it depends on what you want to do. Some things that may help you are:

Let me know if you have a specific logic in mind.

abdul756 commented 3 months ago

Yes, for example:

table_1 = pw.debug.table_from_markdown(
    '''
age | ON | OFF
 10 | 2  | 3
'''
)

table_2 = pw.debug.table_from_markdown(
    '''
age | owner | pet
 10 | Alice | dog
  9 | Bob   | dog
    | Alice | cat
  7 | Bob   | dog
'''
)

table_3 = pw.debug.table_from_markdown(
    '''
age | owner | pet
 10 | E     | dog
  9 | F     | dog
    | G     | cat
  7 | H     | dog
'''
)


There are two modes, ON and OFF, and the user selects one from the front end. If table_1 contains a column named OFF, I want to, for example, lowercase the strings in the corresponding column of table_2. If it contains a column named ON, I want to merge or concatenate along the owner column.

szymondudycz commented 3 months ago

Oh, so you want logic that depends on whether a given column is present in the table? If so, that can't be done: each table's set of columns is fixed when the pipeline is created and doesn't depend on the input data. What you could do instead is use a column with an Optional type: https://pathway.com/developers/user-guide/types-in-pathway/datatypes/#optional-data-types, with None meaning that there is no value.

Also, your example has only two modes, and it seems you don't care about the values in that column, so maybe a single boolean column is what you want instead?

dxtrous commented 3 months ago

@abdul756 Would you perhaps be able to express your need / logic in any other framework or programming language?

Alternatively, have you considered using logic like this?

table_1 = pw.debug.table_from_markdown(
'''
age | flag
10 | on
 4 | off
'''
)

As @szymondudycz writes, it wouldn't be usual for frontend logic to change schema, only row contents.

abdul756 commented 3 months ago

Let me explain using pandas. Before that: I am using Pathway in the backend and Streamlit for the frontend. I have used

query, response_writer = pw.io.http.rest_connector(
    host=ApplicationConfig.PATHWAY_REST_CONNECTOR_HOST,
    port=ApplicationConfig.PATHWAY_REST_CONNECTOR_PORT,
    schema=QueryInputSchema,
    autocommit_duration_ms=50,
    delete_completed_queries=False,
)

to get data from Streamlit, and the output I got is

query = Column A, Column B
               ON    2

Assume query is a pandas DataFrame:

if query['Column A'].eq('ON').any():
    print("ON")
else:
    print("OFF")

Please help me implement this logic using a Pathway table.

abdul756 commented 3 months ago

When I tried

  if query.mode == "Research":
      pw.io.csv.write(research, "output_stream_23.csv")

I got this error

    raise RuntimeError("Cannot use expression as boolean.")
RuntimeError: Cannot use expression as boolean.

dxtrous commented 3 months ago

RuntimeError: Cannot use expression as boolean

The source of the error is that the first argument to pw.io.csv.write must be of pw.Table type.

In case you are looking for a good way to start, running one of the existing examples in the guide (starting with https://pathway.com/developers/user-guide/introduction/installation) and exploring ways to modify it might be helpful.

I'll close this issue for now, as we seem to have drifted away from your original question. Please don't hesitate to reach out for programming help at discord.gg/pathway in the #get-help channel.