newest replica may be lost at file closing, if gfmd load average is too high.

oss-tsukuba / gfarm

Open gfarm-admin opened 12 years ago

gfarm-admin commented 12 years ago

This happened when the gfmd load average was extremely high
and a gfsd closed its network connection to gfmd
because network_receive_timeout expired
(i.e. gfmd had nearly hung under the high load average),
while #420 (a site-wide network failure) happened at the same time.

If this problem occurs, one of the following message numbers is logged on the filesystem node together with "error occurred during close operation for writing":

[1003507]
[1003508]
[1003509]
[1003510]

In that case, the following actions are strongly recommended:

recover the file replica from the filesystem node;
increase the "network_receive_timeout" parameter.
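The ticket does not say which configuration file carries network_receive_timeout or what value is appropriate. As a minimal sketch, assuming a gfarm2.conf-style file made of whitespace-separated "keyword value" lines with "#" comments, the hypothetical helper below rewrites an existing network_receive_timeout directive in place, or appends one if none is present. The function name and any value you pass are illustrative only.

```python
import re
from typing import List

# Hypothetical helper; assumes "keyword value" lines where '#' starts a comment.
_DIRECTIVE = re.compile(r"^(\s*)network_receive_timeout\s+\S+(.*)$")


def set_network_receive_timeout(conf_text: str, seconds: int) -> str:
    """Return conf_text with network_receive_timeout set to `seconds`.

    Active directives are rewritten in place (keeping indentation and any
    trailing comment); commented-out lines are left untouched. If no active
    directive exists, a new one is appended at the end.
    """
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    out: List[str] = []
    replaced = False
    for line in conf_text.splitlines():
        m = _DIRECTIVE.match(line)
        if m:
            out.append(f"{m.group(1)}network_receive_timeout {seconds}{m.group(2)}")
            replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append(f"network_receive_timeout {seconds}")
    return "\n".join(out) + "\n"
```

For example, `set_network_receive_timeout("network_receive_timeout 60 # old\n", 120)` yields `"network_receive_timeout 120 # old\n"`. Rewriting in place rather than appending a second directive avoids depending on whether the parser honors the first or the last occurrence.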

For all problems that may cause "lost all replicas", see meta ticket #474.

Reported by: n-soda

Original Ticket: gfarm/tickets/419

gfarm-admin commented 12 years ago

description modified (diff)

Original comment by: n-soda

gfarm-admin commented 12 years ago

priority changed from critical to major

A workaround was added in r6469 and r6470 on the main trunk,
and in r6471 on the 2.5 release branch, as follows:

gfsd and gfmd now log this condition (although gfmd's log is not guaranteed),
so a system administrator can rescue the replica,
which should remain with its old generation number.
It is recommended that a monitoring system watch for these error logs.
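Because gfmd's log is not guaranteed, the filesystem node's log is the more reliable place to watch. A minimal polling sketch follows, assuming the four message numbers from this ticket appear verbatim in brackets (e.g. "[1003508]") in a plain-text log file whose path you supply; the function name, the log path, and the line format are assumptions, not part of gfarm. It reads only what was appended since the last call, consumes complete lines only, and restarts from the beginning if the file shrank (for example after rotation).

```python
import os
from typing import List, Tuple

# Message numbers listed in ticket #419, assumed to appear in brackets.
WATCHED_IDS = ("[1003507]", "[1003508]", "[1003509]", "[1003510]")


def poll_log(path: str, offset: int) -> Tuple[List[str], int]:
    """Return (matching lines appended since `offset`, new offset).

    A trailing partial line is left unread so it is seen whole next time.
    If the file is now smaller than `offset`, it is assumed to have been
    rotated or truncated and is re-read from the start.
    """
    if os.path.getsize(path) < offset:
        offset = 0
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1  # 0 when no complete line is available yet
    hits: List[str] = []
    for raw in data[:end].splitlines():
        line = raw.decode("utf-8", errors="replace")
        if any(tag in line for tag in WATCHED_IDS):
            hits.append(line)
    return hits, offset + end
```

A monitoring loop would call `poll_log` periodically, store the returned offset, and raise an alert for every returned line, which tells the administrator which replica to rescue on that node.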

Original comment by: n-soda

gfarm-admin commented 12 years ago

Replying to n-soda:

> and in r6471 on the 2.5 release branch as follows:

This was released as gfarm-2.5.6

Original comment by: n-soda

gfarm-admin commented 10 years ago

Milestone: gfarm-2.5.8.4 --> gfarm-2.5.8.5

Original comment by: otatebe

gfarm-admin commented 10 years ago

Milestone: gfarm-2.5.8.5 --> gfarm-2.5.8.6

Original comment by: otatebe

gfarm-admin commented 10 years ago

Milestone: gfarm-2.5.8.6 --> gfarm-2.5.8.7

Original comment by: otatebe

gfarm-admin commented 10 years ago

Milestone: gfarm-2.5.8.7 --> gfarm-2.5.8.8

Original comment by: otatebe

gfarm-admin commented 10 years ago

Milestone: gfarm-2.5.8.8 --> gfarm-2.5.8.9

Original comment by: otatebe

gfarm-admin commented 10 years ago

Milestone: gfarm-2.5.8.9 --> gfarm-2.5.8.10

Original comment by: otatebe

gfarm-admin commented 10 years ago

Milestone: gfarm-2.5.8.10 --> gfarm-2.5.8.11

Original comment by: otatebe

gfarm-admin commented 10 years ago

Milestone: gfarm-2.5.8.11 --> gfarm-2.5.8.12

Original comment by: otatebe

gfarm-admin commented 9 years ago

Milestone: gfarm-2.5.8.12 --> gfarm-2.5.8.13

Original comment by: otatebe

gfarm-admin commented 9 years ago

Milestone: gfarm-2.5.8.13 --> gfarm-2.5.8.14

Original comment by: otatebe