karenjexphd closed this 11 months ago
This appears to be resolved for the tables in Example 2, but not for those in Example 1. For C10020 there is only a single true positive, yet the contents of output_label_set and gt_label_set below suggest there should be 4 matches:
table_model=# select * from gt_label_set where table_name='C10020_0_0' order by top_row, left_col;
table_name | cell_id | left_col | top_row | category_name | label
------------+---------+----------+---------+---------------+--------------------------------
C10020_0_0 | 2117 | 2 | 1 | ColumnHeading | european parliament elec. 2009
C10020_0_0 | 2116 | 3 | 1 | ColumnHeading | european parliament elec. 2004
C10020_0_0 | 2115 | 4 | 1 | ColumnHeading | european parliament elec. 1999
C10020_0_0 | 2114 | 5 | 1 | ColumnHeading | european parliament elec. 1996
C10020_0_0 | 2125 | 1 | 4 | RowHeading1 | national coalition pty
C10020_0_0 | 2124 | 1 | 5 | RowHeading1 | centre pty of finland
C10020_0_0 | 2123 | 1 | 6 | RowHeading1 | social democr. pty
C10020_0_0 | 2122 | 1 | 7 | RowHeading1 | greens
C10020_0_0 | 2121 | 1 | 8 | RowHeading1 | true finns
C10020_0_0 | 2120 | 1 | 9 | RowHeading1 | swedish people's pty
C10020_0_0 | 2119 | 1 | 10 | RowHeading1 | left
C10020_0_0 | 2118 | 1 | 11 | RowHeading1 | christian democrats
(12 rows)
table_model=# select * from output_label_set where table_method='hypoparsr' and table_name='C10020_0_0' order by top_row, left_col;
table_name | table_method | cell_id | left_col | top_row | category_name | label
------------+--------------+---------+----------+---------+---------------+----------------------------------
C10020_0_0 | hypoparsr | 47382 | 1 | 1 | ColumnHeading | Party
C10020_0_0 | hypoparsr | 47383 | 2 | 1 | ColumnHeading | European Parliament elec. 2009
C10020_0_0 | hypoparsr | 47384 | 3 | 1 | ColumnHeading | European Parliament elec. 2004
C10020_0_0 | hypoparsr | 47385 | 4 | 1 | ColumnHeading | European Parliament elec. 1999
C10020_0_0 | hypoparsr | 47386 | 5 | 1 | ColumnHeading | European Parliament elec. 1996
(5 rows)
table_model=# select * from label_true_positives where table_method='hypoparsr' and table_name='C10020_0_0';
 table_name | table_method | label_true_pos
------------+--------------+----------------
 C10020_0_0 | hypoparsr | 1
(1 row)
Closer inspection shows that the four election column headings identified by hypoparsr have trailing spaces (each value below is wrapped in xx to make the whitespace visible):
xxPartyxx
xxEuropean Parliament elec. 2009 xx
xxEuropean Parliament elec. 2004 xx
xxEuropean Parliament elec. 1999 xx
xxEuropean Parliament elec. 1996 xx
The is_reconcilable() function needs to be updated to ignore trailing whitespace when comparing labels.
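A minimal sketch of a whitespace-tolerant comparison, written in Python for illustration. It assumes is_reconcilable() receives the two label strings and that case is already meant to be ignored (the GT labels are lowercase while the hypoparsr labels are not); the real function's signature and implementation language may differ, and normalise_label() is a hypothetical helper. Collapsing repeated internal whitespace is an extra assumption beyond the trailing-space fix.

```python
import re


def normalise_label(label: str) -> str:
    """Hypothetical helper: collapse whitespace runs, trim both ends, casefold."""
    return re.sub(r"\s+", " ", label).strip().casefold()


def is_reconcilable(output_label: str, gt_label: str) -> bool:
    """Sketch: two labels reconcile if they are equal after normalisation."""
    return normalise_label(output_label) == normalise_label(gt_label)


# The trailing space in the hypoparsr label no longer blocks the match.
print(is_reconcilable("European Parliament elec. 2009 ", "european parliament elec. 2009"))  # True
print(is_reconcilable("Party", "european parliament elec. 2009"))  # False
```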
Example 1
Tables: C10020, C10039
Method: Hypoparsr
Description 1: There are 4 rows where the output label value is reconcilable with the GT label value and the row and column IDs of the output and GT rows are identical, yet the label is not flagged as a true positive.
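The matching rule in Description 1 (same cell position plus a reconcilable label) can be sketched as a set intersection. This is an illustrative Python sketch, not the project's label_true_positives logic: the Label tuple, normalise_label() and count_true_positives() are hypothetical names, and the reconciliation rule (ignore case and surrounding whitespace) is assumed. The sample rows are the C10020 values shown above.

```python
import re
from typing import NamedTuple


class Label(NamedTuple):
    left_col: int
    top_row: int
    label: str


def normalise_label(label: str) -> str:
    # Assumed reconciliation rule: ignore case and surrounding/repeated whitespace.
    return re.sub(r"\s+", " ", label).strip().casefold()


def count_true_positives(output: list[Label], gt: list[Label]) -> int:
    """Count cells whose position and normalised label match between output and GT."""
    gt_keys = {(g.left_col, g.top_row, normalise_label(g.label)) for g in gt}
    out_keys = {(o.left_col, o.top_row, normalise_label(o.label)) for o in output}
    return len(gt_keys & out_keys)


gt = [
    Label(2, 1, "european parliament elec. 2009"),
    Label(3, 1, "european parliament elec. 2004"),
    Label(4, 1, "european parliament elec. 1999"),
    Label(5, 1, "european parliament elec. 1996"),
    Label(1, 4, "national coalition pty"),
]
output = [
    Label(1, 1, "Party"),
    Label(2, 1, "European Parliament elec. 2009 "),
    Label(3, 1, "European Parliament elec. 2004 "),
    Label(4, 1, "European Parliament elec. 1999 "),
    Label(5, 1, "European Parliament elec. 1996 "),
]
print(count_true_positives(output, gt))  # 4
```

Using sets means a duplicated output label at the same cell is counted once rather than inflating the total.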
Example 2
Tables: C10026, C10051, C10053, C10078, C10157, C10158, C10187
Description 2: In each table there are 4 to 6 rows where the output label value is reconcilable with the GT label value, the column IDs of the output and GT rows are identical, and the offset between the GT and output row IDs is consistent, yet the label is not flagged as a true positive.
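One way to handle the consistent row offset in Description 2 is to take the most common row difference among same-column, reconcilable label pairs and then match cells after applying that shift. This is a hedged Python sketch of that idea, not the project's method: all names are hypothetical, the reconciliation rule is assumed, ties between equally common offsets are broken by first occurrence, and the sample rows are toy data rather than values from the tables listed above.

```python
import re
from collections import Counter
from typing import NamedTuple, Optional


class Label(NamedTuple):
    left_col: int
    top_row: int
    label: str


def normalise_label(label: str) -> str:
    # Assumed reconciliation rule: ignore case and surrounding/repeated whitespace.
    return re.sub(r"\s+", " ", label).strip().casefold()


def dominant_row_offset(output: list[Label], gt: list[Label]) -> Optional[int]:
    """Most common (output.top_row - gt.top_row) over same-column reconcilable pairs."""
    offsets = Counter(
        o.top_row - g.top_row
        for o in output
        for g in gt
        if o.left_col == g.left_col
        and normalise_label(o.label) == normalise_label(g.label)
    )
    return offsets.most_common(1)[0][0] if offsets else None


def count_offset_true_positives(output: list[Label], gt: list[Label]) -> int:
    """Count matches after shifting GT rows by the dominant offset."""
    offset = dominant_row_offset(output, gt)
    if offset is None:
        return 0
    gt_keys = {(g.left_col, g.top_row + offset, normalise_label(g.label)) for g in gt}
    out_keys = {(o.left_col, o.top_row, normalise_label(o.label)) for o in output}
    return len(gt_keys & out_keys)


# Toy data: every output row sits one row below its GT counterpart.
gt = [Label(1, 3, "row a"), Label(1, 4, "row b"), Label(1, 5, "row c")]
output = [Label(1, 4, "Row A"), Label(1, 5, "Row B"), Label(1, 6, "Row C ")]
print(dominant_row_offset(output, gt))  # 1
print(count_offset_true_positives(output, gt))  # 3
```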