windmill-labs / windmill

Open-source developer platform to power your entire infra and turn scripts into webhooks, workflows and UIs. Fastest workflow engine (13x vs Airflow). Open-source alternative to Retool and Temporal.
https://windmill.dev

Flow editor becomes almost completely unresponsive after testing action to get webpage HTML bug: #3374

Open nroth-dealnews opened 8 months ago

nroth-dealnews commented 8 months ago

Describe the bug

Requesting moderately large text (only 1-2 MB) seems to make the editor impossible to use while testing a flow. I was working on a web scraping workflow that requests an Amazon product page, then went on to develop an action that takes a list of selectors and outputs their matches. However, the sample data is difficult to work with because the whole application stops responding.

To reproduce

  1. Add the flow I created (shown below)
  2. Test up to the first action by inputting any Amazon product page URL
  3. Try doing anything in the browser window
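
To reproduce without depending on a live Amazon page, a step that returns synthetic text of a similar size may be enough. This is a minimal sketch that assumes the slowdown depends only on the size of the step result, which has not been confirmed in this thread. The `size_mb` parameter and the filler markup are illustrative and not part of the original flow.

```python
def main(size_mb: float = 2.0) -> str:
    """Return filler HTML-like text of roughly size_mb mebibytes (ASCII only)."""
    # Illustrative filler; any repeated ASCII markup would do.
    chunk = "<div class='item'>lorem ipsum</div>\n"
    target = int(size_mb * 1024 * 1024)
    # Repeat the chunk past the target length, then cut to the exact size.
    repeats = target // len(chunk) + 1
    return (chunk * repeats)[:target]
```

Running this as the first step in place of the fetch step, then editing the next step, would show whether result size alone triggers the unresponsiveness.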

Expected behavior

Screenshots

image

Browser information

No response

Application version

EE v1.285.2

Additional Context

Here is a Chrome DevTools performance recording that I started just before entering a single character into the Summary field of the next action I was working on: https://drive.google.com/file/d/1WrZL2z-S52lMcNKX3Pb3Jhyreux6NYd6/view?usp=sharing

Here is my flow:

summary: Scrape URL
description: Given a url, we should try each method available to use to retrieve
  the data we want from the url
value:
  modules:
    - id: a
      summary: Get HTML with Fetch
      value:
        type: rawscript
        content: >-
          // Fetch-only script, no imports allowed but benefits from a dedicated
          highly efficient runtime

          export async function main(url: string, user_agent: string) {
            // "3" is the default value of example_input, it can be overridden with code or using the UI
            const res = await fetch(`${url}`, {
              headers: { "User-Agent": user_agent },
            });
            return res.text();
          }
        language: nativets
        input_transforms:
          url:
            type: javascript
            expr: flow_input.url
          user_agent:
            type: static
            value: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36
              (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36
        tag: null
    - id: b
      summary: Get Selectors
      value:
        type: rawscript
        content: |-
          # import wmill
          from bs4 import BeautifulSoup

          def main(html: str, css_selector: str = ""):
              if not css_selector:
                  return {"text": html}
              else:
                  soup = BeautifulSoup(html, "html.parser")
                  matches = [el.get_text() for el in soup.select(css_selector)]
                  return {"text": "\n\n".join(matches), "matches": len(matches)}
        language: python3
        input_transforms:
          css_selector:
            type: static
          html:
            type: static
        tag: null
schema:
  $schema: https://json-schema.org/draft/2020-12/schema
  properties:
    url:
      type: string
      description: ""
      default: ""
      format: uri
  required: []
  type: object
  order:
    - url

rubenfiszel commented 8 months ago

Thanks, we will take a look during our next cycle of flow editor performance improvements.
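
A possible stop-gap while testing, assuming the unresponsiveness scales with the size of the result a step returns, is to cap what the step hands back to the editor. The helper below is a hypothetical sketch, not a Windmill API: `preview` and `max_chars` are names introduced here for illustration, and the 50,000-character default is an arbitrary guess rather than a measured threshold.

```python
def preview(text: str, max_chars: int = 50_000) -> dict:
    """Return at most max_chars characters of text, plus size metadata."""
    if max_chars < 0:
        raise ValueError("max_chars must be non-negative")
    return {
        "text": text[:max_chars],           # capped payload for the editor
        "truncated": len(text) > max_chars,  # True if anything was cut
        "original_chars": len(text),         # size of the full input
    }
```

In step b of the flow above, for example, the no-selector branch could return `preview(html)` instead of `{"text": html}` during development, and the cap could be removed once the flow is deployed.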