rji-futures-lab / django-fcc-opif

For extracting and analyzing data from the FCC Public Inspection files
0 stars 0 forks source link

Slow queries on Cable File Admin #40

Open gordonje opened 4 years ago

gordonje commented 4 years ago

Here's what Django Debug Toolbar is showing us:

Screen Shot 2020-03-31 at 2 55 37 PM

gordonje commented 4 years ago

Here's the SQL for the slowest one:

SELECT "fcc_opif_cablefile"."file_id",
       "fcc_opif_cablefile"."file_name",
       "fcc_opif_cablefile"."file_extension",
       "fcc_opif_cablefile"."file_size",
       "fcc_opif_cablefile"."file_status",
       "fcc_opif_cablefile"."create_ts",
       "fcc_opif_cablefile"."last_update_ts",
       "fcc_opif_cablefile"."file_manager_id",
       "fcc_opif_cablefile"."moved_from",
       "fcc_opif_cablefile"."moved_ts",
       "fcc_opif_cablefile"."documentcloud_id",
       "fcc_opif_cablefile"."stored_file",
       "fcc_opif_cablefile"."folder_id",
       "fcc_opif_cablefolder"."entity_folder_id",
       "fcc_opif_cablefolder"."folder_name",
       "fcc_opif_cablefolder"."folder_path",
       "fcc_opif_cablefolder"."allow_rename_ind",
       "fcc_opif_cablefolder"."allow_subfolder_ind",
       "fcc_opif_cablefolder"."allow_upload_ind",
       "fcc_opif_cablefolder"."allow_delete_ind",
       "fcc_opif_cablefolder"."more_public_files_ind",
       "fcc_opif_cablefolder"."parent_folder_id",
       "fcc_opif_cablefolder"."file_count",
       "fcc_opif_cablefolder"."create_ts",
       "fcc_opif_cablefolder"."last_update_ts",
       "fcc_opif_cablefolder"."entity_id",
       "fcc_opif_cablesystem"."id",
       "fcc_opif_cablesystem"."legal_name",
       "fcc_opif_cablesystem"."service_type",
       "fcc_opif_cablesystem"."operator_address_line1",
       "fcc_opif_cablesystem"."operator_name",
       "fcc_opif_cablesystem"."operator_address_line2",
       "fcc_opif_cablesystem"."operator_po_box",
       "fcc_opif_cablesystem"."operator_city",
       "fcc_opif_cablesystem"."operator_zip_code",
       "fcc_opif_cablesystem"."operator_zip_code_suffix",
       "fcc_opif_cablesystem"."operator_state",
       "fcc_opif_cablesystem"."operator_email",
       "fcc_opif_cablesystem"."operator_website",
       "fcc_opif_cablesystem"."operator_phone",
       "fcc_opif_cablesystem"."operator_fax",
       "fcc_opif_cablesystem"."cores_user",
       "fcc_opif_cablesystem"."principal_headend_name",
       "fcc_opif_cablesystem"."principal_address_line1",
       "fcc_opif_cablesystem"."principal_address_line2",
       "fcc_opif_cablesystem"."principal_po_box",
       "fcc_opif_cablesystem"."principal_city",
       "fcc_opif_cablesystem"."principal_state",
       "fcc_opif_cablesystem"."principal_zip_code",
       "fcc_opif_cablesystem"."principal_zip_code_suffix",
       "fcc_opif_cablesystem"."principal_fax",
       "fcc_opif_cablesystem"."principal_phone",
       "fcc_opif_cablesystem"."principal_email",
       "fcc_opif_cablesystem"."local_file_contact_name",
       "fcc_opif_cablesystem"."local_file_address_line1",
       "fcc_opif_cablesystem"."local_file_address_line2",
       "fcc_opif_cablesystem"."local_file_po_box",
       "fcc_opif_cablesystem"."local_file_city",
       "fcc_opif_cablesystem"."local_file_state",
       "fcc_opif_cablesystem"."local_file_zip_code",
       "fcc_opif_cablesystem"."local_file_zip_code_suffix",
       "fcc_opif_cablesystem"."local_file_contact_fax",
       "fcc_opif_cablesystem"."local_file_contact_phone",
       "fcc_opif_cablesystem"."local_file_contact_email",
       "fcc_opif_cablesystem"."active_ind",
       "fcc_opif_cablesystem"."principal_address_in_local_files",
       "fcc_opif_cablesystem"."cable_service_zip_codes",
       "fcc_opif_cablesystem"."cable_service_emp_units",
       "fcc_opif_cablesystem"."cable_communities",
       "fcc_opif_cablesystem"."last_refreshed_ts"
  FROM "fcc_opif_cablefile"
 INNER JOIN "fcc_opif_cablefolder"
    ON ("fcc_opif_cablefile"."folder_id" = "fcc_opif_cablefolder"."entity_folder_id")
 INNER JOIN "fcc_opif_cablesystem"
    ON ("fcc_opif_cablefolder"."entity_id" = "fcc_opif_cablesystem"."id")
 ORDER BY "fcc_opif_cablefolder"."entity_id" ASC, "fcc_opif_cablefile"."folder_id" ASC, "fcc_opif_cablefile"."file_name" ASC, "fcc_opif_cablefile"."file_id" DESC
 LIMIT 100;

And the query plan:

Limit  (cost=915379.15..915379.40 rows=100 width=1603) (actual time=28654.485..28654.640 rows=100 loops=1)
  ->  Sort  (cost=915379.15..923316.31 rows=3174865 width=1603) (actual time=28654.482..28654.502 rows=100 loops=1)
        Sort Key: fcc_opif_cablefolder.entity_id, fcc_opif_cablefile.folder_id, fcc_opif_cablefile.file_name, fcc_opif_cablefile.file_id DESC
        Sort Method: top-N heapsort  Memory: 259kB
        ->  Gather  (cost=32383.34..794038.09 rows=3174865 width=1603) (actual time=946.838..23436.426 rows=3170718 loops=1)
              Workers Planned: 2
              Workers Launched: 2
              ->  Hash Join  (cost=31383.34..475551.59 rows=1322860 width=1603) (actual time=828.616..18241.230 rows=1056906 loops=3)
                    Hash Cond: ((fcc_opif_cablefolder.entity_id)::text = (fcc_opif_cablesystem.id)::text)
                    ->  Hash Join  (cost=30800.51..471492.66 rows=1322860 width=916) (actual time=824.034..16578.083 rows=1056906 loops=3)
                          Hash Cond: (fcc_opif_cablefile.folder_id = fcc_opif_cablefolder.entity_folder_id)
                          ->  Parallel Seq Scan on fcc_opif_cablefile  (cost=0.00..164510.60 rows=1322860 width=785) (actual time=0.016..612.738 rows=1056906 loops=3)
                          ->  Hash  (cost=15763.45..15763.45 rows=469445 width=131) (actual time=823.100..823.101 rows=469449 loops=3)
                                Buckets: 32768  Batches: 32  Memory Usage: 2750kB
                                ->  Seq Scan on fcc_opif_cablefolder  (cost=0.00..15763.45 rows=469445 width=131) (actual time=0.009..312.741 rows=469449 loops=3)
                    ->  Hash  (cost=531.81..531.81 rows=4081 width=687) (actual time=4.494..4.494 rows=4081 loops=3)
                          Buckets: 4096  Batches: 1  Memory Usage: 2972kB
                          ->  Seq Scan on fcc_opif_cablesystem  (cost=0.00..531.81 rows=4081 width=687) (actual time=0.006..1.324 rows=4081 loops=3)
Planning time: 1.550 ms
Execution time: 28654.901 ms
gordonje commented 4 years ago

Might have something to do with the number of fields we are selecting. I tried the same query with only five fields from each of the three tables:

SELECT "fcc_opif_cablefile"."file_id",
       "fcc_opif_cablefile"."file_name",
       "fcc_opif_cablefile"."file_extension",
       "fcc_opif_cablefile"."file_size",
       "fcc_opif_cablefile"."file_status",
       "fcc_opif_cablefolder"."entity_folder_id",
       "fcc_opif_cablefolder"."folder_name",
       "fcc_opif_cablefolder"."folder_path",
       "fcc_opif_cablefolder"."allow_rename_ind",
       "fcc_opif_cablefolder"."allow_subfolder_ind",
       "fcc_opif_cablesystem"."id",
       "fcc_opif_cablesystem"."legal_name",
       "fcc_opif_cablesystem"."service_type",
       "fcc_opif_cablesystem"."operator_address_line1",
       "fcc_opif_cablesystem"."operator_name"
  FROM "fcc_opif_cablefile"
 INNER JOIN "fcc_opif_cablefolder"
    ON ("fcc_opif_cablefile"."folder_id" = "fcc_opif_cablefolder"."entity_folder_id")
 INNER JOIN "fcc_opif_cablesystem"
    ON ("fcc_opif_cablefolder"."entity_id" = "fcc_opif_cablesystem"."id")
 ORDER BY "fcc_opif_cablefolder"."entity_id" ASC, "fcc_opif_cablefile"."folder_id" ASC, "fcc_opif_cablefile"."file_name" ASC, "fcc_opif_cablefile"."file_id" DESC
 LIMIT 100;

This query 👆 took five seconds, as opposed to 26.