nystudio107 / craft-retour

Retour allows you to intelligently redirect legacy URLs, so that you don't lose SEO value when rebuilding & restructuring a website
https://nystudio107.com/plugins/retour
Other
39 stars 26 forks source link

Unable to import CSV file in load balanced environment (Servd hosting) #284

Closed BenParizek closed 8 months ago

BenParizek commented 8 months ago

Describe the bug

When trying to import a CSV file on a website using Servd, the page just refreshes and nothing happens.

Servd support believes this is because Retour does not support Load Balanced environments. It seems the file gets uploaded for the mapping, but in the follow up request to import the redirects, the file is no longer available for import. Servd suggested three potential approaches to work around this limitation in load balanced environments with the premise of "don't trust the filesystem":

  1. Stick it in Craft's data cache. On any load balanced setup this will likely be a Redis data store and it's fine with you just pushing binary data into a cache key
  2. Store it in Craft's temporary uploads directory. Again on a load balanced env this should be set to an off-server Volume. It requires you to do a little bit of work with FlySystem to read/write the files instead of PHP's direct filesystem functions but it's pretty straight forward.
  3. Ask the user to specify a Volume for uploads in the plugin settings and then use that.

I've tried to update Craft's Temporary Uploads directory to use an S3 directory (which might solve the upload, but might not be the best decision if other things are using that temp folder for cache-related things). However, it doesn't fix the upload as I don't think Retour is using Craft's getStoragePath method for scenario 2 to work.

I've currently resorted to just importing the redirects in locally and rewriting the retour_static_redirects database table into the production environment. Not ideal and content editors are still blocked from importing redirects if they need to.

Servd shared this doc page with me which could be a useful reference: https://servd.host/docs/handling-runtime-created-files

To reproduce

  1. Establish a website on Servd
  2. Install Retour
  3. Create a CSV file for import
  4. Select the Retour->Redirects tab
  5. Select Import CSV File
  6. Map your fields
  7. Watch the page refresh and no redirects get imported

Expected behaviour

A CSV file of Redirects can be uploaded in a load balanced environment.

Versions

khalwat commented 8 months ago

So Retour already does try to do what you're suggesting in 1):

https://github.com/nystudio107/craft-retour/blob/develop/src/controllers/FileController.php#L116

The question would be why isn't this succeeding in this case? Is it that the exception is not being thrown, or is it that something is wrong in the logic?

Because this fallback was specifically added for the scenario you're discussing:

https://github.com/nystudio107/craft-retour/issues/204

https://github.com/nystudio107/craft-retour/pull/205

BenParizek commented 8 months ago

I'll reach out to support and see if we can determine anything else about what's happening. The only thing I'm seeing in the logs is [retour-csv-import-errors.INFO] [nystudio107\retour\Retour::init] Retour plugin loaded {"memory":3976704} when trying to import a 3 line test file.

BenParizek commented 8 months ago

I've dug into this a bit more. Several moving parts here are a bit outside my comfort zone, so here's what I can say:

In a load balanced environment, I SSH'd in and added several log statements to this block of code:

https://github.com/nystudio107/craft-retour/blob/58f611d47ae1861212397044a1166774cc3a3e39/src/controllers/FileController.php#L115-L132

The first catch statement is hit on line 115 and outputs this error:

[error] RETOUR_DEBUG: `/var/www/html/storage/runtime/temp/test.csv652fc1a98ebab5.21117489`: failed to open stream: No such file or directory.

And then proceeds to run line 123 fine, without triggering the catch on 125.

If I Craft::dd() the value of $cachedFile right after line 119, it only outputs the header of the file I'm uploading:

"Legacy URL Pattern,Redirect To,Match Type,HTTP Status,Legacy URL Match Type"

The code I've added/modified looks like this (removing the dd statement while just logging):

try {
    Craft::error("Trying create path", 'RETOUR_DEBUG');

    $csv = Reader::createFromPath($filename);
    $csv->setDelimiter(Retour::$settings->csvColumnDelimiter ?? ',');
    $headers = array_flip($csv->fetchOne(0));
} catch (\Exception $e) {
    Craft::error("Catching error", 'RETOUR_DEBUG');
    Craft::error($e->getMessage(), 'RETOUR_DEBUG');

    // If this throws an exception, try to read the CSV file from the data cache
    // This can happen on load balancer setups where the Craft temp directory isn't shared
    $cache = Craft::$app->getCache();
    $cachedFile = $cache->get($filename);

    Craft::error('Cached File String:', 'RETOUR_DEBUG');
    Craft::dd($cachedFile);

    if ($cachedFile !== false) {
        $csv = Reader::createFromString($cachedFile);
        try {
            Craft::error("Trying delimiter set", 'RETOUR_DEBUG');

            $csv->setDelimiter(Retour::$settings->csvColumnDelimiter ?? ',');
        } catch (Exception $e) {
            Craft::error("Catching delimiter error", 'RETOUR_DEBUG');
Craft::error($e->getMessage(), 'RETOUR_DEBUG');

            Craft::error($e, __METHOD__);
        }
        $headers = array_flip($csv->fetchOne(0));
        $cache->delete($filename);
    } else {
        Craft::error("Could not import ${$filename} from the file system, or the cache.", __METHOD__);
    }
}

Is it possible the cache is only being set with the header row of the imported CSV file? Or am I potentially misunderstanding what is set in cache there?