Closed amutu closed 10 years ago
Thanks for your fix of imcs_dup_hash_count. However, when I try a larger data set, e.g. more than 5 million rows, the segmentation fault happens again. After debugging, I found that this time the segmentation fault happens in imcs_dup_hash_initialize, similar to the last one. So I fixed it and sent a pull request with the commit.
This is the gdb info:
0x00007f4472db49bd in imcs_dup_hash_initialize (iterator=0x7f282c0008f0) at func.c:5833
5833 elem->count += 1;
(gdb) bt
(gdb) print elem
$1 = (imcs_hash_elem_t *) 0x0
(gdb) l
5828 agg_elem->count = 0;
5829 agg_elem->collision = agg_hash->table[agg_index];
5830 agg_hash->table[agg_index] = agg_elem;
5831 }
5832 if (++agg_elem->count == ctx->min_occurrences) {
5833 elem->count += 1;
5834 }
5835 }
5836 }
5837 agg_hash->table_used = agg_distinct_count;
(gdb) quit
A debugging session is active.
Thank you very much for pointing out this issue. Shame on me: I really did not notice the same problem in mco_seq_dup_hash_initialize. I have also merged most of your other fixes. But I do not think it is a good idea to implicitly replace min_occ <= 0 with 1; I prefer to report an error in this case.
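To make the two options concrete, here is a minimal sketch of both policies for a non-positive min_occ. It is not taken from the IMCS sources: the names `min_occ_status_t`, `check_min_occ` and `clamp_min_occ` are hypothetical, and how an error would actually be raised inside the extension is left open.

```c
#include <stddef.h>

/* Hypothetical status codes for this sketch; not part of IMCS. */
typedef enum { MIN_OCC_OK = 0, MIN_OCC_INVALID = -1 } min_occ_status_t;

/* Policy A (report an error): reject min_occ <= 0 and leave *out untouched,
 * so the caller can turn the status into a user-visible error. */
min_occ_status_t check_min_occ(long min_occ, size_t *out)
{
    if (min_occ <= 0) {
        return MIN_OCC_INVALID;
    }
    *out = (size_t)min_occ;
    return MIN_OCC_OK;
}

/* Policy B (implicit replacement): silently treat min_occ <= 0 as 1. */
size_t clamp_min_occ(long min_occ)
{
    return min_occ <= 0 ? (size_t)1 : (size_t)min_occ;
}
```

With policy A the caller decides what to do with an invalid argument; with policy B an invalid argument is indistinguishable from min_occ = 1.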
Thanks. I think reporting an error for min_occ <= 0 is OK. I did in fact implement the error at first, but after testing the behaviour of PostgreSQL's substring(s string, start int) function, I found that substring does not report an error and simply treats start as 1 when start <= 0:
postgres=# select substring('abcdef',1);
abcdef (1 row)
postgres=# select substring('abcdef',2);
bcdef (1 row)
postgres=# select substring('abcdef',3);
cdef (1 row)
postgres=# select substring('abcdef',7);
(1 row)
postgres=# select substring('abcdef',0);
abcdef (1 row)
postgres=# select substring('abcdef',-1);
abcdef (1 row)
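The behaviour seen in the session above can be modelled in a few lines of C. This is only an illustrative sketch of the two-argument form as observed here, not PostgreSQL's implementation; the function name `suffix_from` is hypothetical, and it counts bytes rather than characters.

```c
#include <stddef.h>
#include <string.h>

/* Return the suffix of s that begins at 1-based position start.
 * A start <= 0 is treated as 1 (no error), and a start past the end
 * yields the empty string, matching the outputs shown above. */
const char *suffix_from(const char *s, long start)
{
    size_t len = strlen(s);
    if (start <= 0) {
        start = 1;              /* clamp instead of reporting an error */
    }
    if ((size_t)(start - 1) >= len) {
        return s + len;         /* points at the terminating '\0' */
    }
    return s + (start - 1);
}
```

For example, `suffix_from("abcdef", 3)` points at "cdef", while both `suffix_from("abcdef", 0)` and `suffix_from("abcdef", -1)` return the whole string.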
imcs_dup_hash_initialize should also be fixed, as in the last commit