sigoden / dufs

A file server that supports static serving, uploading, searching, access control, webdav...

PROPFIND method support infinity depth #387

Closed xiaozhuai closed 2 months ago

xiaozhuai commented 2 months ago

Specific Demand

The PROPFIND method can take a Depth header, which accepts 0, 1, or infinity:

Please support infinity depth.
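For context, a minimal sketch of what such a request looks like, using only Python's standard library. The host, path, and credentials are placeholders, not anything from this thread:

```python
# Sketch of a WebDAV PROPFIND request with an explicit Depth header.
# Host and path below are placeholders for illustration only.
import http.client

PROPFIND_BODY = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<D:propfind xmlns:D="DAV:"><D:allprop/></D:propfind>'
)

def propfind_headers(depth):
    """Build headers for a PROPFIND request; depth is "0", "1", or "infinity"."""
    if depth not in ("0", "1", "infinity"):
        raise ValueError('Depth must be "0", "1", or "infinity"')
    return {
        "Depth": depth,
        "Content-Type": 'application/xml; charset="utf-8"',
    }

def propfind(host, path, depth="1"):
    """Send a PROPFIND; a compliant server answers 207 Multi-Status."""
    conn = http.client.HTTPConnection(host)
    conn.request("PROPFIND", path, body=PROPFIND_BODY,
                 headers=propfind_headers(depth))
    return conn.getresponse()
```

With `depth="infinity"` the server is asked to include every descendant in a single multistatus response, which is exactly what this issue requests.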

sigoden commented 2 months ago

Are there any common usage scenarios for this feature? dufs is a lightweight web server and does not promise to support the full WebDAV specification (for example, it does not support LOCK/UNLOCK). This feature can have a significant impact on server performance. We are not prepared to support it if it is not necessary. I hope you can understand.

xiaozhuai commented 2 months ago

> Are there any common usage scenarios for this feature?

I use dufs as a cache server for vcpkg. But as time goes by, more and more files are cached, taking up more and more space. So I need to scan all cached files regularly and remove the expired ones, and that requires being able to list all files in a directory recursively.

Of course, I could PROPFIND one directory at a time and recurse until I have all the files. The problem is that this produces a large number of requests between the client and the server. If infinity depth were supported, the result could be obtained in a single request.
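The request-count argument can be made concrete with a toy model: with Depth: 1 a recursive listing needs one PROPFIND per directory, while Depth: infinity needs exactly one. The tree below is made up for illustration:

```python
# Toy model: count how many depth-1 PROPFIND requests a recursive listing
# needs. Directories are dicts, files are None leaves; the tree shape and
# names are illustrative, not real vcpkg cache contents.

def count_depth1_requests(tree):
    """One PROPFIND for this directory, plus one per subdirectory, recursively."""
    requests = 1
    for child in tree.values():
        if isinstance(child, dict):  # subdirectory -> recurse
            requests += count_depth1_requests(child)
    return requests

cache = {
    "x64-linux": {"a.zip": None, "b.zip": None},
    "x64-windows": {"c.zip": None, "nested": {"d.zip": None}},
    "readme.txt": None,
}
print(count_depth1_requests(cache))  # 4 directories -> 4 requests, vs 1 with Depth: infinity
```

For a cache with thousands of directories, the gap between one round trip and one per directory dominates the total listing time.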

> This feature can have a significant impact on server performance. We are not prepared to support it if it is not necessary.

I understand. I wonder if we could make this an optional allow-propfind-infinity feature? It is actually very important for anyone who needs to list files recursively.

sigoden commented 2 months ago

Why I am not supporting this feature:

  1. PROPFIND does not support streamed output. All results are buffered in memory before anything is sent. The memory usage could explode.
  2. dufs is stateless, unlike services backed by a database. To generate a list of all files, dufs has to traverse the whole tree recursively. What about the performance?
  3. A configuration item has a learning cost and increases the burden on users who do not need it at all.
  4. The usage scenario is niche.
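Point 1 can be put in rough numbers. The per-entry size below is an assumption for illustration, not a measured dufs figure:

```python
# Back-of-the-envelope estimate of the fully buffered multistatus response
# for an infinity-depth PROPFIND. The ~500 bytes per <D:response> entry is
# an assumed figure, not measured from dufs.

BYTES_PER_ENTRY = 500  # assumed XML overhead per file entry

def estimated_response_bytes(file_count, bytes_per_entry=BYTES_PER_ENTRY):
    """Response size grows linearly with the number of files listed."""
    return file_count * bytes_per_entry

# A million cached files would mean roughly half a gigabyte held in memory
# per request before a single byte reaches the client.
print(estimated_response_bytes(1_000_000) / 1024 ** 2)  # ~476 MiB
```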

Suggestion

You can write a Python script that generates a list of all files under a directory, recursively, and exports it as manifest.json. Set up an appropriate crontab task according to your needs, have dufs serve the manifest.json from a dedicated directory, and all problems will be solved.

I asked GPT to write the script:

import os
import json

def generate_file_list(directory):
  """
  Generates a list of files in all subdirectories of a given directory
  and exports it in JSON format.

  Args:
    directory: The path to the directory to list files in.

  Returns:
    A JSON string containing the file list.
  """
  file_list = {}
  for root, dirs, files in os.walk(directory):
    for file in files:
      file_path = os.path.join(root, file)
      relative_path = os.path.relpath(file_path, directory)
      file_list[relative_path] = {
        "size": os.path.getsize(file_path),
        "modified": os.path.getmtime(file_path)
      }
  return json.dumps(file_list, indent=2)

if __name__ == "__main__":
  directory_path = "/path/to/your/directory"  # Replace this with your directory path
  json_output = generate_file_list(directory_path)
  print(json_output)
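To complete the suggested setup, the generated JSON can be written into a directory that dufs serves, so clients fetch the whole file list with a single GET. The output path below is a placeholder, and the atomic-rename step is my addition to avoid clients seeing a half-written file:

```python
# Write the generated manifest into a dufs-served directory. The target
# path is a placeholder; adjust it to wherever dufs serves files from.
import os

def write_manifest(json_text, out_path):
    os.makedirs(os.path.dirname(out_path), exist_ok=True)
    # Write to a temp file and rename: a client fetching manifest.json
    # mid-write would otherwise see a truncated document.
    tmp_path = out_path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(json_text)
    os.replace(tmp_path, out_path)

# Example (hypothetical path):
# write_manifest(json_output, "/srv/dufs/manifest/manifest.json")
```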

@xiaozhuai

xiaozhuai commented 2 months ago

I switched to nginx, which also doesn't support PROPFIND with infinity depth. And I've updated my script; it now lists files in parallel. The total file count is 1435. Here are benchmark results with a parallelism of 1, 4, 8, and 32.

With dufs, increasing the parallelism does not give good gains, but with nginx it does.

And when the parallelism reaches a certain value (>= 8), dufs's performance drops sharply.

For nginx, although there is no significant benefit once the parallelism exceeds 8, there is also no performance degradation.

nginx: (benchmark screenshot)

dufs: (benchmark screenshot)

xiaozhuai commented 2 months ago

Here is the code snippet I use to list files recursively:

    async readdir(dir) {
        const files = await this.client.getDirectoryContents(dir); // Perform PROPFIND request with depth = 1
        const subFiles = await Throttle.all(files
            .filter(file => file.type === 'directory')
            .map(file => async () => {
                return await this.readdir(file.filename);
            }), {maxInProgress: 8}); // parallel num
        return files.concat(...subFiles);
    }
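For comparison, the same bounded-parallelism pattern can be sketched in Python with asyncio. The `client.get_directory_contents` method is a stand-in for a real WebDAV client call like the one in the JavaScript snippet above; the semaphore plays the role of `maxInProgress`:

```python
# Recursively list a directory with depth-1 requests, keeping at most
# `max_in_progress` requests in flight at once. `client` is any object
# with an async get_directory_contents(path) method (a stand-in for a
# real WebDAV client); entries are dicts with "filename" and "type".
import asyncio

async def _readdir(client, path, sem):
    async with sem:  # hold the semaphore only for the request itself,
        entries = await client.get_directory_contents(path)  # depth = 1
    # ...not during recursion, so children can't deadlock on it.
    subdirs = [e for e in entries if e["type"] == "directory"]
    sub_lists = await asyncio.gather(
        *(_readdir(client, d["filename"], sem) for d in subdirs)
    )
    results = list(entries)
    for sub in sub_lists:
        results.extend(sub)
    return results

async def list_all(client, root, max_in_progress=8):
    sem = asyncio.Semaphore(max_in_progress)
    return await _readdir(client, root, sem)
```

The semaphore bounds concurrent PROPFINDs rather than concurrent tasks, which mirrors what `Throttle.all(..., {maxInProgress: 8})` does in the JavaScript version.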
sigoden commented 2 months ago

I will research the performance issues later.

We do not currently plan to support infinite depth, so we are closing this issue.