microsoft / hcsshim

Windows - Host Compute Service Shim
MIT License
576 stars 259 forks source link

Fix no such device error when getting device filesystem #2248

Closed katiewasnothere closed 2 months ago

katiewasnothere commented 2 months ago
apurv15 commented 2 months ago

With the proposed change, do we need this code anymore? Mount( ... // TODO (ambarve): add better retry logic, SCSI mounts sometimes return ENONENT or // ENXIO error if we try to open those devices immediately after mount. retry after a // few milliseconds. log.G(ctx).WithError(err).Trace("get device filesystem failed, retrying in 500ms") time.Sleep(500 * time.Millisecond) if deviceFS, err = _getDeviceFsType(source); err != nil {

katiewasnothere commented 2 months ago

With the proposed change, do we need this code anymore? Mount( ... // TODO (ambarve): add better retry logic, SCSI mounts sometimes return ENONENT or // ENXIO error if we try to open those devices immediately after mount. retry after a // few milliseconds. log.G(ctx).WithError(err).Trace("get device filesystem failed, retrying in 500ms") time.Sleep(500 * time.Millisecond) if deviceFS, err = _getDeviceFsType(source); err != nil {

@apurv15 In theory yes that additional sleep could be removed. However, I'm hesitant to remove it in case there is something else that the call Open does not accommodate and there is some other reason the device is not yet ready to be read. Does it seem reasonable to keep that addition sleep?

apurv15 commented 2 months ago

Since getDevicePath() unconditionally will open & close the device before a successful return, wouldn't we be sure that the device open succeeded? If we don't remove the sleep code now, i am not sure if we will ever revisit it since the open api replacing stat api is expected to never require us to sleep wait. But we will never know if device mount succeeded because of open wait in this code change or existing sleep wait. If timelines are not very tight for delivery of this fix, we could remove the sleep code block and hope more tests actually substantiate the need for sleep wait code block.