thephpleague / flysystem

Abstraction for local and remote filesystems
https://flysystem.thephpleague.com
MIT License
13.31k stars 826 forks source link

Flysystem V2 Metadata Calls Hammer Remote APIs #1287

Open BusterNeece opened 3 years ago

BusterNeece commented 3 years ago

Summary: In Flysystem v1, there was a getMetadata API call that could be made if you needed multiple aspects of a file/directory's metadata.

This was removed in Flysystem v2 and instead replaced by individual calls for metadata like fileSize and lastModified.

This has two significant impacts:

Currently, the only workaround for this is to completely reimplement the missing getMetadata function in application code. However, because of the newly privatized methods and properties in Flysystem 2.0, this means copying and pasting a significant amount of logic from this library into your own application code just to get access to very basic functionality like isFile, isDir, or to pull multiple pieces of metadata about a given file/directory with a single outbound API call. Please do not try to minimize the impact or scale of this as has happened in previous threads; because there are zero hooks into the existing code for the adapters (short of overriding their constructors and doing it yourself), you really do need to copy/paste a ton of existing logic just to accomplish what that logic already accomplishes, but you can't access. It is seriously frustrating and basically yields nothing but new tech debt.

I understand that in some cases, each of the metadata pieces really is pulled one-by-one and not via a unified API (i.e. the local adapter), and so a single getMetadata call would be more computationally expensive than any one of those metadata pieces by itself. However, if your application already needs all of those pieces of metadata, then you're going to be hitting all of that same computational load either way, but with this new per-method paradigm, this results in a ton of stacked-up redundant API calls that slow down the app and make poor use of system resources.

To the project maintainer(s), I would implore you to please not take the same route as was taken with previous issues of mine, summarily shooting the issue down, then tweeting about how my concenrs were invalid and how it was foolish to have brought up the issue in the first place. I am far from the only person to be voicing these concerns, and that isn't because we all collectively just "don't get it", it's because the decisions made in V2 of this project that have yielded a less flexible, more frustrating and difficult to use library, contrary to the library's own web site stating "Developer experience was top of mind when creating V2 of Flysystem".

PandCar commented 3 years ago

Also faced this problem!

It is probably possible to cache all the metadata immediately after the S3 GetObject call and implement the getLastMetadata () method that would work separately from the main functionality.

This, in turn, would reduce the retrieval of file and file data to a single request.

In the case of S3, this would lead to a minimum of overhead.

frankdejonge commented 3 years ago

To the project maintainer(s), I would implore you to please not take the same route as was taken with previous issues of mine, summarily shooting the issue down, then tweeting about how my concenrs were invalid and how it was foolish to have brought up the issue in the first place. I am far from the only person to be voicing these concerns, and that isn't because we all collectively just "don't get it", it's because the decisions made in V2 of this project that have yielded a less flexible, more frustrating and difficult to use library, contrary to the library's own web site stating "Developer experience was top of mind when creating V2 of Flysystem".

^ regarding this. From what I see you've opened 3 issues. I've supplied detailed reasons, and even concrete alternatives, to which you haven't replied. So perhaps get off your high horse. The "I am not alone argument" is nice, but for me it's not about "many" it's about what percentage of users faces what, and so far for v2 this is the first issue.

As you may or may not be aware, I maintain this library in my free time. When you open issues with this tone of voice, what do you expect to happen? I'm not interested in responding to it, so this is where I draw the line. Either you stay on topic and we can have a normal discussion, or I'm forced to shield myself from this by blocking you from interacting with my repositories.

BusterNeece commented 3 years ago

The vast majority of my original post here outlines a very specific problem that is introduced by V2 of how Flysystem internally handles metadata, and documents methods to fix it, and was summarily ignored in the response. My tone is one of concern from previous experience, but not one of vulgarity, nor of inexperience as a developer, nor of any intention beyond improving the experience for all users of this library.

In any case, while my first instinct was indeed to fork the entire repository and build and maintain a separate version on my own, I've decided that it would be more valuable to build out the extensions that are necessary for my own project to have a seamless transition from Flysystem V1 to V2 into a separate library of extensions that can be used by anyone interested in similarly expanding their own Flysystem installation: https://github.com/AzuraCast/flysystem-v2-extensions

The particular solution I took to the potential computational intensity of pulling all metadata at once was to simply make the metadata values lazy-loadable, so in my extended StorageAttributes instances, you can either provide the value for each piece of metadata directly, or provide a callable that will be resolved when that particular metadata is referenced. This strikes an excellent balance between flexibility and performance, avoiding any unnecessary performance overhead.

My extended library also adds getMetadata, isFile and isDir methods to the filesystem itself (all pulled from a single StorageAttributes-implementing getMetadata function on the adapter) and exposes those externally, which is greatly helpful to anyone looking for that information currently.

Unfortunately, due to the design decisions made in Flysystem V2, implementing these changes does involve copy/pasting a lot of existing logic into my own classes, and because it's not part of the main interface, only the adapters I've done that particular legwork for are currently supported, but it serves as an excellent jumping-off point nonetheless.

I would hope in the future to see some of these incorporated into Flysystem itself's API.

BusterNeece commented 5 months ago

Coming back to this as I find myself implementing similar functionality into a new application and realizing that I had created this issue many years ago about the same problem I find myself facing now.

Looking back, the original text of this issue is actually 100% accurate. It explains the problem very well (on remote filesystems with a single getMetadata API call, that same API call gets needlessly, and expensively, hammered if you're pulling multiple bits of data about files) and proposes a solution, which I had even implemented in my own code before suggesting it here, and found that it almost perfectly solves this problem for my use case, but involves a bit of unnecessary copy/pasting code across and is made more difficult by other factors (like FileAttributes/StorageAttributes not being interfaces but rather fixed classes).

It seems the only sticking point was my last paragraph, where I asked that this issue not be buried like other ones I had raised. Even on a later re-read, I still stand by the sentiment, and in fact I think it's a little ironic that this issue was never addressed on its merits, but was also buried just for that last line asking for it not to be.

At no point in my interactions have I been intent on being disrespectful or being on a "high horse". I'm passionate in the pursuit of better code for better developer experiences. I know a ton of people out there use this library, and if even a small fraction do the types of operations I'm doing, then an optimization like this could not just cut down significantly on network traffic and lag but might even save some folks money. I think it merits a real look.