This issue describes problems encountered while running a Linux server under a Windows
Hypervisor at kwic.com, and how we worked around them. Any changes to how the items below are
implemented should take this information into account.
Windows host doesn't keep accurate time
When the host was used to set the Linux VM's clock, the latter was found to be as much as several
minutes off NTP time.
Workaround: ask the hosting centre to disable Hypervisor time update, and run chrony on the Linux VM
to sync its clock against NTP servers.
Daily clock jumps from VM pause/restart break NFS
Each day around 6:30 GMT, the Linux VM is apparently paused, and when it resumes, its clock jumps. This used to cause various NFS timeouts, which invalidated the MariaDB InnoDB storage engine's file descriptors for its backing store on the NFS server and in turn forced MariaDB to restart the InnoDB engine. It's not clear whether this would eventually lead to DB corruption or silent
failure of transactions, but it seemed prudent to work around it by mounting the NFS storage using nfs3 (not nfs4) semantics. nfs3 is stateless and doesn't use sessions, so these timeouts don't occur.
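To confirm that the workaround is still in place after a remount or reboot, the mount table can be checked for NFS mounts that are not using version 3. The following is a minimal sketch, not the configuration actually used on this server: it assumes the usual /proc/mounts layout (device, mount point, fstype, options), and the server name and export path in the comment are hypothetical placeholders.

```shell
# Hypothetical example of mounting with NFSv3 semantics (names are placeholders):
#   mount -t nfs -o vers=3 nfs.example.org:/export/sgm /sgm

# Print every NFS mount that is NOT using NFSv3.
# Reads lines in /proc/mounts format from stdin.
non_v3_nfs_mounts() {
    awk '
        $3 == "nfs" || $3 == "nfs4" {
            ver = ""
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] ~ /^vers=/)
                    ver = substr(opts[i], 6)      # text after "vers="
            if (ver == "")
                ver = "unknown"
            if ($3 == "nfs4" || ver != "3")
                print $2 " (" $3 ", vers=" ver ")"
        }
    '
}
```

Running `non_v3_nfs_mounts < /proc/mounts` should print nothing when every NFS mount uses nfs3; any line it does print names a mount point worth investigating.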
NFS server doesn't keep accurate time
e.g.:
$ date; touch /sgm/fake_file.txt; stat -c %y /sgm/fake_file.txt; date; rm -f /sgm/fake_file.txt
Mon Feb 12 13:43:20 UTC 2018
2018-02-12 13:43:22.088258902 +0000
Mon Feb 12 13:43:20 UTC 2018
Although the example above shows only a 2-second disparity between the sgdata.motus.org clock
and the NFS server's clock (middle line), this disparity has been observed to be as high as 35 seconds or more on occasion. This could in principle break make-based builds if targets and sources are not on
the same server, so we ensure that any such builds take place entirely on one server.
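The manual check above can be wrapped in a small function so the disparity can be logged over time. This is only a sketch: it assumes GNU `stat`, `date`, and `mktemp`, and that (as the example suggests) the modification time of a freshly touched file on the NFS mount is stamped by the NFS server. The function name and probe file name are made up for illustration.

```shell
# Print (file mtime - local clock) in whole seconds for a probe file
# created in the directory given as $1. On an NFS mount this approximates
# how far the NFS server's clock is ahead of (+) or behind (-) the local clock.
nfs_clock_skew() {
    local dir="$1" probe before mtime
    probe=$(mktemp "$dir/.skew_probe.XXXXXX") || return 1
    before=$(date +%s)              # local clock, seconds since the epoch
    touch "$probe"                  # refresh the timestamp on the file server
    mtime=$(stat -c %Y "$probe")    # mtime, seconds since the epoch
    rm -f "$probe"
    echo $(( mtime - before ))
}
```

For example, `nfs_clock_skew /sgm` would have reported roughly 2 for the run shown above. The result is only accurate to about a second, which is enough to spot the multi-second disparities described here.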
chronyc reports the clock on sgdata.motus.org as being in good sync with NTP time:
$ chronyc tracking
Reference ID : 192.95.27.155 (192.95.27.155)
Stratum : 3
Ref time (UTC) : Mon Feb 12 13:41:17 2018
System time : 0.000074011 seconds slow of NTP time
Last offset : -0.000034850 seconds
RMS offset : 0.005153073 seconds
Frequency : 209.282 ppm fast
Residual freq : -0.001 ppm
Skew : 0.012 ppm
Root delay : 0.051235 seconds
Root dispersion : 0.021167 seconds
Update interval : 1036.8 seconds
Leap status : Normal
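If this check is to be automated, the "System time" line of the output above can be parsed and compared against a tolerance. The sketch below assumes the `chronyc tracking` line format shown above; the function names and the 0.1 second limit in the usage note are illustrative choices, not something taken from this server's set-up.

```shell
# Read `chronyc tracking` output on stdin and print the system clock offset
# in seconds: negative if the clock is slow of NTP time, positive if fast.
chrony_offset() {
    awk '$1 == "System" && $2 == "time" {
        sign = ($6 == "slow") ? "-" : ""
        print sign $4
    }'
}

# Succeed (exit status 0) if |OFFSET| <= LIMIT, both given in seconds.
# Usage: offset_within OFFSET LIMIT
offset_within() {
    awk -v o="$1" -v l="$2" 'BEGIN { if (o < 0) o = -o; exit !(o <= l) }'
}
```

A cron job could then run something like `offset_within "$(chronyc tracking | chrony_offset)" 0.1 || echo "clock drift"` and alert when the local clock wanders.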
.sqlite files on NFS
It's nothing specific to this hosting set-up, but SQLite apparently doesn't work well with NFS's somewhat
broken file-locking semantics. In any case, performance is poor for many small queries when .sqlite files are stored on NFS, so all high-use .sqlite files are instead stored on the local HD and backed up daily.
However, the (large) .sqlite receiver DBs are kept on NFS because of their size, and because most accesses to them are for long periods by a single process (i.e. the tag-finder and the handleExportData() function), with locks handled via the symLocks table in the server.sqlite DB, which is on the local HD.