jayvynl / django-clickhouse-backend

Django clickhouse database backend.
MIT License
130 stars 21 forks source link

[BUG] - Package taking time to render response. #70

Open vigneshshettyin opened 11 months ago

vigneshshettyin commented 11 months ago

Describe the bug Observed a case where when the query string is long, Package is taking more time to render data. Also ran a profiler and checked that database is returning result instantly (less than 500ms) but django is taking more 30-35 seconds.

Query string length is 575316 characters, this is raw SQL length

With lower query string it works fine.

To Reproduce Trigger API with higher query string (approx more than 575316)

Expected behavior Django should return response within 1-2 seconds

Versions

image (1) image

jayvynl commented 11 months ago

Is it caused by a SELECT query filtering JSON field?

vigneshshettyin commented 11 months ago

Is it caused by a SELECT query filtering JSON field?

No, we don't have JSON field @jayvynl

vigneshshettyin commented 11 months ago

Let us know if you need more info, we can share. @jayvynl

jayvynl commented 11 months ago

@vigneshshettyin What is the query like

vigneshshettyin commented 11 months ago
WITH "cte" AS (
        SELECT "test_data"."CMO_ID",
               uniq("test_data"."CLP_ID") AS "no_clp",
               uniq("test_data"."PATIENT_ID") AS "no_clp"
          FROM "test_data"
         WHERE "test_data"."CODE" IN (..............)
         GROUP BY "test_data"."CMO_ID"
       ) SELECT "entity_ui_ch_entityagg"."id",
       "entity_ui_ch_entityagg"."address",
       "entity_ui_ch_entityagg"."city",
       "entity_ui_ch_entityagg"."dimid_c",
       "entity_ui_ch_entityagg"."dx_list",
       "entity_ui_ch_entityagg"."list_filter",
       "entity_ui_ch_entityagg"."full_name",
       "entity_ui_ch_entityagg"."cmo_id",
       "entity_ui_ch_entityagg"."dimid_m",
       "entity_ui_ch_entityagg"."state",
       "entity_ui_ch_entityagg"."data_tax",
       COALESCE("cte"."no_clp", 0) AS "no_clp",
       COALESCE("cte"."no_clp", 0) AS "no_clp"
  FROM "entity_ui_ch_entityagg"
  LEFT OUTER JOIN "cte"
    ON "entity_ui_ch_entityagg"."cmo_id" = ("cte"."CMO_ID")
 ORDER BY 12 DESC
 LIMIT 10

Pls, find the above query, where CODE is the dynamic filter (CODE eg are like PLO92, PLO393, HJO93 ...). When the number of codes are like 10,000 django app performs well. But when the number of codes are more than 40,000 we are facing this issue.

image (2)

@jayvynl

jayvynl commented 11 months ago

As shown in your profiling, clickhouse_driver.varint.read_varint take most of the execution time. The problem is not caused by this project.

  1. Check if Django queryset generate expected SQL query, check if Django gives the correct result.
  2. Test the same query with another clickhouse driver, for example: clickhouse http endpoint/clickhouse go driver. If they are significantly faster than clickhouse_driver, then your can open an issue to clickhouse_driver.
  3. I noticed that clickhouse_driver.varint.read_varint have been called by 17098 times, it is unusual because the query result is limited to 10 rows. I have tested a query of 10 rows result, the fuction is only called by 106 times.
  4. I also noticed that clickhouse_driver.varint.read_varint function spent 4ms per call in your environment. In my local test environment it only spend 0.04ms per call. This difference may be caused by network delay. Your should also take network delay into account.
vigneshshettyin commented 11 months ago

Ok, let me raise a issue ticket for https://github.com/mymarilyn/clickhouse-driver