Open gfarm-admin opened 12 years ago
Original comment by: n-soda
Originally posted by: in reply to: #2; n-soda
Replying to n-soda:
on the 2.5 release branch,
and a workaround for PROBLEM-4 was added in r6498
r6498 is not enough, because if it is part of a transaction, the other operations in the transaction fail as follows:
<warning> [1003506] pgsql_deadfilecopy_add: INSERT INTO DeadFileCopy (inumber, igen, hostname) VALUES ($1, $2, $3): ERROR: duplicate key value violates unique constraint "deadfilecopy_pkey"#012DETAIL: Key (inumber, igen, hostname)=(283010, 1731, rgfsd003) already exists.
<err> [1000426] pgsql_filecopy_remove: DELETE FROM FileCopy WHERE inumber = $1 AND hostname = $2: ERROR: current transaction is aborted, commands ignored until end of transaction block
<err> [1003180] db_journal_store_thread : seqnum=261574806 ope=FILECOPY_REMOVE : unknown error
<err> [1003188] failed to store to db : unknown error
<err> [1003397] gfmd is shutting down for unrecoverable error
<info> [1003405] backtrace symbols [1/6]: /opt/gfarm-2.5.6/lib/libgfarm.so.1(gfarm_log_backtrace_symbols+0x1f) [0x7f78c682ce0f]
<info> [1003405] backtrace symbols [2/6]: /opt/gfarm-2.5.6/lib/libgfarm.so.1(gfarm_log_fatal_action+0x35) [0x7f78c682f925]
<info> [1003405] backtrace symbols [3/6]: /opt/gfarm-2.5.6/lib/libgfarm.so.1(gflog_fatal_message+0x83) [0x7f78c682fbd3]
<info> [1003405] backtrace symbols [4/6]: /opt/gfarm-2.5.6/sbin/gfmd(db_journal_store_thread+0x25d) [0x4599ed]
<info> [1003405] backtrace symbols [5/6]: /lib64/libpthread.so.0() [0x3ef4e077f1]
<info> [1003405] backtrace symbols [6/6]: /lib64/libc.so.6(clone+0x6d) [0x3ef46e570d]
Original comment by: *anonymous at SourceForge
Originally posted by: in reply to: #4; n-soda
Replying to n-soda:
r6498 is not enough, because if it is part of a transaction, the other operations in the transaction fail as follows:
r6498 was backed out in r6598,
and an alternative workaround written in SQL was added in r6599.
the SQL workaround should be far safer, because it makes the transaction succeed.
one caveat is that this workaround completely hides the problem.
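As an illustration only, here is a minimal sketch of what a duplicate-tolerant insert can look like at the SQL level. It is not the actual r6599 change, which is not shown in this ticket. The sketch uses Python's built-in sqlite3 module instead of PostgreSQL so that it is self-contained; the table and column names are taken from the log above, while the column types and the helper name `deadfilecopy_add` are assumptions. Note that SQLite does not abort the enclosing transaction on a constraint error the way PostgreSQL does, so this only shows the shape of the statement, not the original failure.

```python
import sqlite3

# Stand-in for the DeadFileCopy table. Column names come from the log
# message in this ticket; the column types are assumptions.
SCHEMA = """
CREATE TABLE DeadFileCopy (
    inumber  INTEGER NOT NULL,
    igen     INTEGER NOT NULL,
    hostname TEXT    NOT NULL,
    PRIMARY KEY (inumber, igen, hostname)
)
"""

# Insert the row only when no row with the same key exists yet, so a
# duplicate never raises a unique-constraint error inside the transaction.
INSERT_IF_ABSENT = """
INSERT INTO DeadFileCopy (inumber, igen, hostname)
SELECT :inumber, :igen, :hostname
WHERE NOT EXISTS (
    SELECT 1 FROM DeadFileCopy
    WHERE inumber = :inumber AND igen = :igen AND hostname = :hostname
)
"""


def deadfilecopy_add(conn, inumber, igen, hostname):
    """Hypothetical helper: return True if a row was inserted,
    False if the same key was already present."""
    cur = conn.execute(
        INSERT_IF_ABSENT,
        {"inumber": inumber, "igen": igen, "hostname": hostname},
    )
    return cur.rowcount == 1


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    with conn:  # both inserts run in one transaction
        print(deadfilecopy_add(conn, 283010, 1731, "rgfsd003"))
        print(deadfilecopy_add(conn, 283010, 1731, "rgfsd003"))
```

Because the helper reports whether a row was actually inserted, a caller could log the skipped duplicates, which would soften the caveat that such a workaround hides the underlying problem.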
Original comment by: *anonymous at SourceForge
Original comment by: otatebe
Diff:
Original comment by: n-soda
added "[trunk]" to the summary line, because the other branches have already been fixed. the reason it is "[trunk]" rather than "[pullup-trunk]" is that a better fix for PROBLEM-4 still needs to be investigated.
Original comment by: n-soda
(i.e. GFM_PROTO_CLOSE_WRITE will be issued 200,000 times)
this makes gfmd crash as follows:
configuration:
from reading the source code, four problematic cases were found:
-> PROBLEM-1: if the replica is currently being created, duplicate dead_file_copy may happen later
-> PROBLEM-2: if the replica is currently being removed, duplicate dead_file_copy may happen later
-> PROBLEM-3: if the replica is currently being created, duplicate dead_file_copy may happen later
-> PROBLEM-4: if the replica is invalid, duplicate dead_file_copy may happen later
-> PROBLEM-4 (should wait until the invalid state is cleared)
Reported by: n-soda
Original Ticket: gfarm/tickets/407