move-coop / parsons

A python library of connectors for the progressive community.

[Pipelines] Design - Pipelines language should be accessible to new users #1004

Open Jason94 opened 4 months ago

Jason94 commented 4 months ago

The Pipelines project targets two user groups. One of them is people who are new to Python, who might be familiar with basic SQL but don't know programming. We'll call them new users. It's critical to the mission of Pipelines that it's accessible to new users.

New users should be able to easily:

New users do not need to be able to perform more advanced tasks like:

To focus the conversation, below is a copy of the current example script in the pipelines branch (as of 2/27/2024). I believe it captures the surface area of functionality new users will be expected to engage with.

    clean_year = CompoundPipe(
        filter_rows("{Year} is not None"),
        convert("Year", int)
    )

    load_after_1975 = Pipeline(
        "Load after 1975",
        load_from_csv("deniro.csv"),
        clean_year(),
        filter_rows("{Year} > 1975"),
        write_csv("after_1975.csv")
    )

    split_on_1980 = Pipeline(
        "Split on 1980",
        load_from_csv("deniro.csv"),
        clean_year(),
        split_data("'gte_1980' if {Year} >= 1980 else 'lt_1980'"),
        for_streams({
            "lt_1980": write_csv("before_1980.csv"),
            "gte_1980": write_csv("after_1979.csv")
        })
    )

    save_lotr_books = Pipeline(
        "Save LOTR Books",
        load_lotr_books_from_api(),
        write_csv("lotr_books.csv")
    )

    after_1990_and_all_time = Pipeline(
        "Copy into streams test",
        load_from_csv("deniro.csv"),
        clean_year(),
        copy_data_into_streams("0", "1"),
        for_streams({
            "0": CompoundPipe(
                filter_rows("{Year} > 1990"),
                write_csv("after_1990.csv")
            )(),
            "1": write_csv("all_years.csv")
        })
    )

    dashboard = Dashboard(
        load_after_1975,
        split_on_1980,
        save_lotr_books,
        after_1990_and_all_time,
    )
    dashboard.run()
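
To make the discussion of the `{Column}` expression strings more concrete, here is a minimal sketch of one way they could be interpreted. This is not the implementation in the pipelines branch: `compile_row_expression`, `filter_rows_sketch`, and `split_rows_sketch` are hypothetical names, rows are assumed to be plain dicts, and the sample rows are made up for illustration. It uses `eval` for brevity, which would not be safe for untrusted input.

```python
import re

# Matches placeholders such as {Year} in an expression string.
_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def compile_row_expression(expression):
    """Turn an expression like "{Year} > 1975" into a function of a row dict.

    Hypothetical helper, not part of Parsons or the pipelines branch.
    """
    # Rewrite each {Column} placeholder as a lookup into the row dict.
    source = _PLACEHOLDER.sub(lambda m: f"row[{m.group(1)!r}]", expression)
    code = compile(source, "<row expression>", "eval")

    def evaluate(row):
        # Assumption: expressions come from the script author, so eval is
        # acceptable here. Builtins are withheld to keep expressions simple.
        return eval(code, {"__builtins__": {}}, {"row": row})

    return evaluate


def filter_rows_sketch(rows, expression):
    """Keep only the rows for which the expression is truthy."""
    predicate = compile_row_expression(expression)
    return [row for row in rows if predicate(row)]


def split_rows_sketch(rows, expression):
    """Group rows into named streams, keyed by the expression's value."""
    key = compile_row_expression(expression)
    streams = {}
    for row in rows:
        streams.setdefault(key(row), []).append(row)
    return streams


# Made-up sample rows standing in for the contents of a CSV file.
rows = [
    {"Year": 1973, "Title": "Film A"},
    {"Year": 1976, "Title": "Film B"},
    {"Year": 1980, "Title": "Film C"},
]

after_1975 = filter_rows_sketch(rows, "{Year} > 1975")
streams = split_rows_sketch(rows, "'gte_1980' if {Year} >= 1980 else 'lt_1980'")
```

Under this reading, a new user only needs to know column names and basic comparison operators to write a filter, while the `split_data` expression already requires Python's conditional-expression syntax, which may be worth weighing when judging accessibility.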