amutu commented 10 years ago

imcs_dup_hash_initialize should be fixed also as last commit

amutu commented 10 years ago

thanks for your fix of the imcs_dup_hash_count,but when I try a larger data set,eg more than 5 million rows,the segment fault happens again.After debug,I find this time,the semgment fault happened at the the imcs_dup_hash_initialize,similar with the last.So I fixed it and pull the commit.

this is gdb info:

0x00007f4472db49bd in imcs_dup_hash_initialize (iterator=0x7f282c0008f0) at func.c:5833 5833 elem->count += 1; (gdb) bt

0 0x00007f4472db49bd in imcs_dup_hash_initialize (iterator=0x7f282c0008f0) at func.c:5833

1 0x00007f4472cda11e in imcs_parallel_job (worker_id=16, n_workers=24, arg=0x228fc58) at imcs.c:1725

2 0x00007f4472dc45ca in imcs_thread_pool_worker (pool=0x224c650) at threadpool.c:33

3 0x00007f4472aa87f1 in start_thread () from /lib64/libpthread.so.0

4 0x00007f4478f62ccd in clone () from /lib64/libc.so.6

(gdb) print elem $1 = (imcs_hash_elem_t *) 0x0 (gdb) l 5828 agg_elem->count = 0; 5829 agg_elem->collision = agg_hash->table[agg_index];
5830 agg_hash->table[agg_index] = agg_elem;
5831 } 5832 if (++agg_elem->count == ctx->min_occurrences) { 5833 elem->count += 1; 5834 }
5835 }
5836 }
5837 agg_hash->table_used = agg_distinct_count; (gdb) quit A debugging session is active.

knizhnik commented 10 years ago

Thank you very much for pointing this issues. Shame on me: I really do not not notice the same problem in mco_seq_dup_hash_initialize. I also merge most of your other fixes. But I do not think that it is good idea to implicitly replace min_occ <= 0 with 1. I prefer to report error in this case.

amutu commented 10 years ago

thanks,I think min_occ <= 0 report error is OK. I indeed implement the error at first,but after I test the behaviour of substring(s string,start int) function of the PostgreSQL,I find substring do not report error and just set start to 1 when start <= 0: postgres=# select substring('abcdef',1);

substring

abcdef (1 row)

postgres=# select substring('abcdef',2);

substring

bcdef (1 row)

postgres=# select substring('abcdef',3);

substring

cdef (1 row)

postgres=# select substring('abcdef',7);

substring

(1 row)

postgres=# select substring('abcdef',0);

substring

abcdef (1 row)

postgres=# select substring('abcdef',-1);

substring

abcdef (1 row)

knizhnik / imcs

fix another cs_hash_dup_count #12

0 0x00007f4472db49bd in imcs_dup_hash_initialize (iterator=0x7f282c0008f0) at func.c:5833

1 0x00007f4472cda11e in imcs_parallel_job (worker_id=16, n_workers=24, arg=0x228fc58) at imcs.c:1725

2 0x00007f4472dc45ca in imcs_thread_pool_worker (pool=0x224c650) at threadpool.c:33

3 0x00007f4472aa87f1 in start_thread () from /lib64/libpthread.so.0

4 0x00007f4478f62ccd in clone () from /lib64/libc.so.6

substring

substring

substring

substring

substring

substring