ubarsc / rios

A raster processing layer on top of GDAL
https://www.rioshome.org
GNU General Public License v3.0

cleanup in TempfileManager can fail on cluster filesystem #100

Closed rdenham closed 4 months ago

rdenham commented 5 months ago

A follow-up to #93.

In rios > 2, there is a cleanup step which removes the temporary files and the temporary directory they are stored in. On our Panasas file system, this can fail because a .panfs* file is still present in the temporary directory when rios attempts to remove it.
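To illustrate the failure mode, here is a minimal sketch of a cleanup that tolerates a lingering entry by retrying the removal. It is not the rios TempfileManager code. The function name `remove_tempdir_tolerant` and its `retries` and `delay` parameters are hypothetical, and it assumes the leftover `.panfs*` file is transient and disappears shortly after the last open handle is closed, which has not been verified here.

```python
import os
import shutil
import tempfile
import time


def remove_tempdir_tolerant(path, retries=5, delay=0.5):
    """Remove a directory tree, retrying if something is still left behind.

    Hypothetical helper, not part of rios. Returns True if the directory
    no longer exists afterwards, False if it could not be removed.
    """
    for attempt in range(retries):
        # ignore_errors=True stops rmtree raising if an entry
        # (for example a transient .panfs* file) cannot be removed yet
        shutil.rmtree(path, ignore_errors=True)
        if not os.path.exists(path):
            return True
        if attempt < retries - 1:
            # Give the filesystem a moment before trying again
            time.sleep(delay)
    return False


# Small demonstration on an ordinary local temporary directory
tmpdir = tempfile.mkdtemp(prefix="rios_demo_")
with open(os.path.join(tmpdir, "block.tmp"), "w") as f:
    f.write("scratch data")
removed = remove_tempdir_tolerant(tmpdir)
```

On a local file system the first attempt should succeed. The retry loop only matters if the file system briefly keeps an entry alive after the files have been closed.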

Some testing shows that in most situations there is no issue with creating and removing temporary directories in Python on this file system. The error only occurs with the default concurrency setting, i.e. using:

from rios import applier

controls = applier.ApplierControls()

# These are the default concurrency settings (no read or compute workers)
conc = applier.ConcurrencyStyle(
    numReadWorkers=0,
    numComputeWorkers=0,
    computeWorkerKind='CW_NONE',
    computeWorkersRead=False,
    singleBlockComputeWorkers=False,
    haveSharedTemp=True,
    readBufferInsertTimeout=10,
    readBufferPopTimeout=10,
    computeBufferInsertTimeout=10,
    computeBufferPopTimeout=20,
    computeBarrierTimeout=600,
)

controls.setConcurrencyStyle(conc)

Changing numReadWorkers to 1 (or any positive integer) seems to prevent this from occurring. There is also no problem if we use controls.setTempdir to point to a non-cluster file system.
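The two workarounds described above could be combined roughly as follows. This is only a sketch: the helper name `make_workaround_controls` and its arguments are hypothetical, and the example path in the usage note is an assumption. Either change on its own was reported to avoid the error, so applying both is not required.

```python
def make_workaround_controls(tempdir=None, num_read_workers=1):
    """Build ApplierControls using the workarounds reported in this issue.

    Hypothetical helper for illustration only.
    """
    # Imported lazily so the sketch can be loaded without rios installed
    from rios import applier

    controls = applier.ApplierControls()

    # Workaround 1: use at least one read worker instead of the default 0
    conc = applier.ConcurrencyStyle(numReadWorkers=num_read_workers)
    controls.setConcurrencyStyle(conc)

    # Workaround 2 (optional): keep temporary files off the cluster file system
    if tempdir is not None:
        controls.setTempdir(tempdir)

    return controls
```

For example, `controls = make_workaround_controls(tempdir="/tmp")` would then be passed to `applier.apply` as usual, assuming `/tmp` is on a local disk.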

I'm not familiar enough with the workings of the concurrency model to debug this further, but I'm happy to help where I can.

neilflood commented 5 months ago

As mentioned in email, I think this problem is already solved by the changes in #98 and #99. Let me know if either you or @badmatitude were able to confirm this with real tests (I am just working on theoretical knowledge and speculation about the Panasas).