iangow / se_features

Linguistic features derived from StreetEvents

Make sure indexes are in NER code #28

Closed: iangow closed this issue 4 years ago

iangow commented 4 years ago

https://github.com/iangow/se_features/blob/cb85cabf454d06528ba51d572fea03e595722693/ner/ner_indexes.sql#L3

The right time to create the index is when the table is created. Are there indexes on the current tables? Indexes dramatically improve the performance of queries that filter or join on the indexed columns.

Yvonne-Han commented 4 years ago

> The right time to create the index is when the table is created.

I think we did this when creating ner_class_alt_4 and ner_class_alt_7: https://github.com/iangow/se_features/blob/90973c4c2b69ec3f66fdf1079bcb156da41ab8c4/ner/create_table.py#L24-L25
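
As a rough illustration of creating the indexes in the same step as the table, here is a minimal sketch. It is not the contents of `create_table.py`: the helper name `table_ddl` is hypothetical, and the column types and index column lists are copied from the `\d+` output further down this thread. The index names shown there look like PostgreSQL's auto-generated ones, so the sketch assumes unnamed `CREATE INDEX ON ...` statements.

```python
# Column definitions, copied from the \d+ output in this thread.
TABLE_COLUMNS = [
    ("file_name", "text"),
    ("last_update", "timestamp with time zone"),
    ("speaker_number", "integer"),
    ("section", "integer"),
    ("context", "text"),
    ("ner_tags", "jsonb"),
]

# Column lists of the two btree indexes, copied from the same output.
INDEX_COLUMN_SETS = [
    ("file_name", "last_update", "speaker_number"),
    ("file_name", "last_update", "speaker_number", "section", "context"),
]


def table_ddl(schema, table):
    """Return CREATE TABLE followed by its CREATE INDEX statements.

    Hypothetical helper: schema and table names are assumed to be trusted
    identifiers, since they are interpolated directly into the SQL text.
    """
    qualified = f"{schema}.{table}"
    cols = ",\n    ".join(f"{name} {sqltype}" for name, sqltype in TABLE_COLUMNS)
    statements = [f"CREATE TABLE {qualified} (\n    {cols}\n)"]
    # Unnamed indexes: PostgreSQL picks a name based on the table and columns.
    for columns in INDEX_COLUMN_SETS:
        statements.append(f"CREATE INDEX ON {qualified} ({', '.join(columns)})")
    return statements


if __name__ == "__main__":
    for stmt in table_ddl("se_features", "ner_class_alt_4"):
        print(stmt + ";")
```

Running all of the returned statements in one transaction would mean the table never exists without its indexes.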

Yvonne-Han commented 4 years ago

Let me double-check whether indexes exist for our current tables:

crsp=> \d+ se_features.ner_class_alt_4;
                                        Table "se_features.ner_class_alt_4"
     Column     |           Type           | Collation | Nullable | Default | Storage  | Stats target | Description
----------------+--------------------------+-----------+----------+---------+----------+--------------+-------------
 file_name      | text                     |           |          |         | extended |              |
 last_update    | timestamp with time zone |           |          |         | plain    |              |
 speaker_number | integer                  |           |          |         | plain    |              |
 section        | integer                  |           |          |         | plain    |              |
 context        | text                     |           |          |         | extended |              |
 ner_tags       | jsonb                    |           |          |         | extended |              |
Indexes:
    "ner_class_alt_4_file_name_last_update_speaker_number_idx" btree (file_name, last_update, speaker_number)
    "ner_class_alt_4_file_name_last_update_speaker_number_sectio_idx" btree (file_name, last_update, speaker_number, section, context)

crsp=> \d+ se_features.ner_class_alt_7;
                                        Table "se_features.ner_class_alt_7"
     Column     |           Type           | Collation | Nullable | Default | Storage  | Stats target | Description
----------------+--------------------------+-----------+----------+---------+----------+--------------+-------------
 file_name      | text                     |           |          |         | extended |              |
 last_update    | timestamp with time zone |           |          |         | plain    |              |
 speaker_number | integer                  |           |          |         | plain    |              |
 section        | integer                  |           |          |         | plain    |              |
 context        | text                     |           |          |         | extended |              |
 ner_tags       | jsonb                    |           |          |         | extended |              |
Indexes:
    "ner_class_alt_7_file_name_last_update_speaker_number_idx1" btree (file_name, last_update, speaker_number)
    "ner_class_alt_7_file_name_last_update_speaker_number_secti_idx1" btree (file_name, last_update, speaker_number, section, context)

@iangow Does this mean it's all good?
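
For a scripted version of the same check, something like the sketch below might work. It assumes the index definitions are read from PostgreSQL's `pg_indexes` view; the helper names `indexed_columns` and `missing_indexes` are hypothetical, and the parsing only handles plain column indexes like the ones listed above (not expression or partial indexes).

```python
import re

# Catalog query; the %s placeholders follow the DB-API "format" paramstyle
# (as used by psycopg2), so the values are passed separately from the SQL.
INDEX_QUERY = """
SELECT indexname, indexdef
FROM pg_indexes
WHERE schemaname = %s AND tablename = %s
"""


def indexed_columns(indexdef):
    """Extract the column tuple from a simple indexdef string.

    Assumes the definition ends with a parenthesised column list, e.g.
    '... USING btree (file_name, last_update, speaker_number)'.
    """
    match = re.search(r"\(([^()]*)\)\s*$", indexdef)
    if match is None:
        return ()
    return tuple(col.strip() for col in match.group(1).split(","))


def missing_indexes(expected, indexdefs):
    """Return the expected column tuples that no existing index covers exactly."""
    existing = {indexed_columns(d) for d in indexdefs}
    return [tuple(cols) for cols in expected if tuple(cols) not in existing]
```

With a psycopg2 cursor, `cur.execute(INDEX_QUERY, ("se_features", "ner_class_alt_4"))` would fetch the rows; passing the `indexdef` values to `missing_indexes` then gives an empty list when every expected index is present.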

iangow commented 4 years ago

> @iangow Does this mean it's all good?

Yes. Thanks.