photostructure / exiftool-vendored.js

Fast, cross-platform Node.js access to ExifTool
https://photostructure.github.io/exiftool-vendored.js/
MIT License

Support for Non-stay_open and Non-singleton Mode in Cloud Function Environments? #190

Closed gowy222 closed 3 months ago

gowy222 commented 4 months ago

Hi,

I'm using exiftool-vendored in a cloud function environment, and I'm facing some challenges with the current implementation. (There's no need to worry about the underlying Perl dependencies, as many serverless platforms can be configured to install them.)

I'd like to request a feature or configuration option that allows for a simpler, stateless execution mode. Here's the context and rationale:

  1. Cloud Function Environment Characteristics:

    • Unpredictable lifecycle
    • Cold/hot start mechanisms
    • Function instance reuse varies between cloud providers
    • High concurrency scenarios
  2. Current Concerns:

    • The end() call can be problematic in this environment
    • Potential conflicts at lifecycle critical points in high concurrency scenarios
    • Difficulty in managing stay_open and singleton modes effectively
  3. Proposed Solution:

    • A configuration option for new ExifTool() that enables a simple, process-per-invocation mode
    • Each cloud function invocation would spawn a new, independent ExifTool process
    • No need for explicit end() calls or process/thread pool management
    • The process terminates naturally with the function invocation
  4. Rationale:

    • Safer and more predictable behavior in cloud environments
    • Eliminates concerns about proper resource cleanup
    • Process overhead is negligible for cloud functions running on hundreds or thousands of CPU clusters
    • Simplifies usage in serverless and highly concurrent scenarios

Would it be possible to add an option to new ExifTool() that enables this kind of straightforward, use-and-discard process mode? This would greatly simplify usage in cloud function environments and potentially other scenarios where managing long-lived processes is challenging.

If possible, could you provide an initialization code sample? Thanks!

mceachen commented 4 months ago

Process forking in “serverless” setups can be problematic, which in turn makes this library’s architecture problematic.

If I were going to implement this, I would skip batch-cluster completely and use the already-set-up process factory (without stay_open, as you mentioned).

I would expect that per-call latency could jump to 200ms-1.5s.

I’m not familiar with a way to simulate a serverless system within GHA, so testing it and reproducing any issues that arise would be problematic. If there is indeed a way to move forward with testing, I’d be more positive towards this feature.

mceachen commented 3 months ago

I've updated the constructors for the ReadTask and WriteTask to be public, and the .parse() methods to be public as well.

You can access the path to exiftool via await require("exiftool-vendored").exiftool.exiftoolPath(), fork the process yourself with whatever arguments you need, and pass stdout to ReadTask, and get what you're wanting.
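As a rough illustration of the "fork the process yourself" approach, here is a minimal TypeScript sketch of a process-per-invocation read. It is not part of the library: `readTagsOnce` and `parseExiftoolJson` are hypothetical helper names, and the sketch reads ExifTool's own `-json` output directly instead of going through ReadTask, so it returns raw tag values without any of the library's parsing. The path argument is assumed to come from the `exiftoolPath()` call mentioned above.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Hypothetical helper: ExifTool's -json output is an array with one
// object per input file. Return the object for the single file we passed.
export function parseExiftoolJson(stdout: string): Record<string, unknown> {
  const parsed: unknown = JSON.parse(stdout);
  if (!Array.isArray(parsed) || parsed.length === 0) {
    throw new Error("Unexpected ExifTool output: expected a non-empty JSON array");
  }
  const first: unknown = parsed[0];
  if (typeof first !== "object" || first === null) {
    throw new Error("Unexpected ExifTool output: first element is not an object");
  }
  return first as Record<string, unknown>;
}

// Hypothetical helper: spawn one short-lived ExifTool process per call.
// No stay_open, no process pool, and nothing to end() afterwards; the
// child process exits on its own once it has printed its output.
export async function readTagsOnce(
  exiftoolPath: string, // assumed to come from exiftool.exiftoolPath()
  file: string
): Promise<Record<string, unknown>> {
  const { stdout } = await execFileAsync(exiftoolPath, ["-json", file], {
    maxBuffer: 16 * 1024 * 1024, // allow large metadata payloads
  });
  return parseExiftoolJson(stdout);
}
```

Inside a cloud function handler this might be called as `readTagsOnce(await exiftool.exiftoolPath(), "/tmp/photo.jpg")`, where the file path is only an example. Each call pays the full ExifTool startup cost, which is where the 200ms-1.5s per-call latency estimate above would apply.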

I'm not going to add "official" support for serverless, though, given that I don't know how to test it rigorously with GHA.