joomla / joomla-cms

Home of the Joomla! Content Management System
https://www.joomla.org
GNU General Public License v2.0
4.74k stars 3.64k forks source link

Improve Zip::extractNative method to overcome server max open files restriction #40105

Open jreviews opened 1 year ago

jreviews commented 1 year ago

Steps to reproduce the issue

Install a package or extension with thousands of files in a server with a ulimit (max number of open file descriptors) setting that's lower than the number of files.

Expected result

Successful installation

Actual result

Failed installation

System information (as much as possible)

Run in shell:

ulimit -Hn

Result (example): 4000 (typical of shared hosting in Siteground and they want to charge users over $100/month for cloud server to change it)

Actual number of files in package is higher than 4000

Additional comments

The following method is a replacement for Joomla's unzip method to overcome a server's ulimit setting that prevents large packages from unzipping.

The original method can be found at in the file below and you need to open that file and replace the extractNative function with the one below.

The new function is better to overcome the server restrictions on the number of simultaneous open files because it doesn't try to extract based on the number of files in the zip archive that could cause issues with ulimit; instead, it extracts everything to a temporary folder using the extractTo method, which reduces the number of simultaneous open files.

Once extraction is completed, it loops over each file in the zip archive and moves files one by one from the temporary directory to the final destination. This approach has good performance because it solves the issue by reducing the maximum number of open files simultaneously, irrespective of the number of files in the zip archive.

This function also ensures that the parent directory exists before trying to create a new file, which could throw an error if the directory does not exist already, and then it moves each file from the temporary location to the destination directory.

Lastly, after all files have been extracted, this function cleans up any files remaining in the temporary directory, that won't be needed anymore after the extraction process.

protected function extractNative($archive, $destination)
{
  $zip = new \ZipArchive;

  if ($zip->open($archive) !== true)
  {
    throw new \RuntimeException('Unable to open archive');
  }

  // Make sure the destination folder exists
  if (!Folder::create($destination))
  {
    throw new \RuntimeException('Unable to create destination folder ' . \dirname($destination));
  }

  $tempDir = sys_get_temp_dir().'/'.uniqid('extract_', true);

  // don't try to extract based on number of files since this causes ulimit issues 
  // instead extract everything to a temporary folder then move to the final location
  $result = $zip->extractTo($tempDir);

  if (!$result)
  {
     throw new \RuntimeException('Unable to extract ZIP contents.');
  }

  for ($i=0; $i < $zip->numFiles; $i++)
  {
    $file = $zip->getNameIndex($i);

    if (substr($file, -1) === '/')
    {
      continue;
    }

    $source = $tempDir.'/'.$file;
    $target = $destination.'/'.$file;

    // Ensure parent dir exists
    if (!Folder::create(dirname($target)))
    {
      throw new \RuntimeException('Unable to create directory ' . dirname($target));
    }

    // Move file from temporary location to destination
    if (rename($source, $target) === false)
    {
      throw new \RuntimeException('Unable to move temporary file to destination ' . $target);
    }
  }

  // Clean up temp folder
  Folder::delete($tempDir);

  return true;
}
brianteeman commented 1 year ago

seems to me that you just described a reason to not have cheap hosting (my host $ ulimit -Hn 262144

jreviews commented 1 year ago

I agree 100% with you and I also don't personally have this problem.

Having said that, if there's a way to overcome the limitation which can help those that simply cannot afford to pay more and currently use a host like SiteGround, is a change like the proposed one something that would be considered?