it-at-m / digiwf-project

Project repo for the DigiWF project automation platform.
MIT License

File Handling in Integration Artifacts #18

Closed. xdoo closed this issue 2 years ago.

xdoo commented 2 years ago

We don't want to handle any kind of files inside the process engine. If we want to send or receive files over integration artifacts, we have to handle them inside the integration artifact.

Diagram: Integrationskonzept für Services _ Feature Requests _ Ablaufdatum S3 (2)

Diagram: Integrationskonzept für Services _ Feature Requests _ Ablaufdatum S3 (1)

We need this kind of file handling (incoming & outgoing) in many integration artifacts (Mail, JMS, DMS, PDF generation, ...). Does it make sense to create a Spring Boot starter for each of the two directions?

@dominikhorn93 @boal

File Handling in generated Forms

Save File(s)


So it is currently not possible to use a file upload at process startup via a generated web form. We need a persistent process start context for this (per user). Of course, files can be available if a process is started by an integration artifact such as Mail or JMS.

Diagram: Integrationskonzept für Services _ Feature Requests _ Ablaufdatum S3 (5).png

One challenge is that we have to know in the task service which S3 service we want to use for a user task.

Here is the same picture in an asynchronous context:

Diagram: Integrationskonzept für Services _ Feature Requests _ Ablaufdatum S3 (6).png

Load File(s)


boal commented 2 years ago

Based on the discussion with @xdoo, we suggest proceeding as follows:

  1. Create a Spring Boot starter artifact for digiwf-s3-integration -> already exists -> https://github.com/it-at-m/digiwf-s3-integration

  2. Create a Spring Boot starter artifact for digiwf-s3-integration-client. This starter provides client classes for using the above-mentioned starter to save, get, delete and update files (see the sketch below this list).

  3. Create a Spring Boot starter artifact for digiwf-mail-integration. This starter provides the functionality to send and receive mails. Additionally, this starter uses the digiwf-s3-integration-client starter to save files to and get files from S3 storage.
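For item 2, a minimal sketch of what such a client starter's auto-configuration could look like. The class, bean and property names (S3FileTransferRepository, digiwf.s3.client.url) are assumptions for illustration, not the actual digiwf-s3-integration API:

```java
package example.s3client; // placeholder package

import org.springframework.boot.autoconfigure.condition.ConditionalOnMissingBean;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.boot.context.properties.EnableConfigurationProperties;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * Hypothetical auto-configuration for a digiwf-s3-integration-client starter.
 * All class, bean and property names are illustrative placeholders.
 */
@Configuration
@EnableConfigurationProperties(S3IntegrationClientAutoConfiguration.S3ClientProperties.class)
public class S3IntegrationClientAutoConfiguration {

    @Bean
    @ConditionalOnMissingBean
    public S3FileTransferRepository s3FileTransferRepository(final S3ClientProperties properties) {
        // the client wraps the REST calls (save, get, update, delete) against
        // the digiwf-s3-integration service configured via application properties
        return new S3FileTransferRepository(properties.getUrl());
    }

    /** Placeholder client class; the real starter would ship the actual implementation. */
    public static class S3FileTransferRepository {
        private final String baseUrl;

        public S3FileTransferRepository(final String baseUrl) {
            this.baseUrl = baseUrl;
        }

        public String getBaseUrl() {
            return baseUrl;
        }
    }

    /** Placeholder configuration properties. */
    @ConfigurationProperties(prefix = "digiwf.s3.client")
    public static class S3ClientProperties {
        /** Base URL of the digiwf-s3-integration service. */
        private String url;

        public String getUrl() {
            return url;
        }

        public void setUrl(final String url) {
            this.url = url;
        }
    }
}
```

A consuming integration artifact would then only add the starter dependency and set the base URL property; the starter registers its auto-configuration via the usual Spring Boot mechanism (spring.factories or AutoConfiguration.imports, depending on the Boot version).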
xdoo commented 2 years ago

I've added a new issue for a Spring starter concept for creating new integration artifacts & services (see #34).

boal commented 2 years ago

Actually, we do not need a separate repo for the "digiwf-s3-integration-client". We can also include the "digiwf-s3-integration-client" and "digiwf-s3-integration-client-starter" as additional submodules within the "digiwf-s3-integration" repo.

As a result, the "digiwf-s3-integration" repo includes two starters: a "digiwf-s3-integration-client-starter" and a "digiwf-s3-integration-starter". The "digiwf-s3-integration-client-starter" depends on the "digiwf-s3-integration-starter" anyway.
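Assuming the module names mentioned above, the repo layout could then look roughly like this (a sketch, not necessarily the actual tree):

```
digiwf-s3-integration/                     (one repo for all S3 artifacts)
├── digiwf-s3-integration/                 (the integration service itself)
├── digiwf-s3-integration-starter/         (Spring Boot starter for the service)
├── digiwf-s3-integration-client/          (client classes: save, get, update, delete)
└── digiwf-s3-integration-client-starter/  (Spring Boot starter for the client)
```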

xdoo commented 2 years ago

Yes. It makes sense to put all the S3 stuff in one repo.

boal commented 2 years ago

Issue for implementation of digiwf-s3-integration-client-starter: https://github.com/it-at-m/digiwf-s3-integration/issues/28

xdoo commented 2 years ago

@a-m-zill

martind260 commented 2 years ago

This was my idea for handling the upload from the frontend control down to the DigiWF backend.

Diagram 2022-03-10 12-13-16.png

xdoo commented 2 years ago

@martind260 How do we get the reference (RefID) back into the process? Do we need a reference? Or how would you make the file reachable from your process context?

boal commented 2 years ago

Within the DigiWF process context, a file can be referenced e.g. by folder name (aka refId) and filename. The file is reachable with the information given in the last call, "complete(folder, filename)".
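For illustration only, and assuming the Camunda Java API underneath (the variable names "folder" and "filename" are placeholders, not a fixed DigiWF convention), the final complete call could look roughly like this:

```java
import org.camunda.bpm.engine.TaskService;
import org.camunda.bpm.engine.variable.Variables;

/**
 * Sketch of the final "complete(folder, filename)" step.
 * Variable names are illustrative, not the actual DigiWF convention.
 */
public class FileUploadCompleter {

    private final TaskService taskService;

    public FileUploadCompleter(final TaskService taskService) {
        this.taskService = taskService;
    }

    public void complete(final String taskId, final String folder, final String filename) {
        // the file itself stays in the S3 storage; only its reference (folder + filename)
        // becomes part of the process context when the user task is completed
        taskService.complete(taskId, Variables.createVariables()
                .putValue("folder", folder)       // aka refId
                .putValue("filename", filename)); // file within that folder
    }
}
```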

boal commented 2 years ago

The picture below shows the current happy path process for saving a file.

Diagram 2022-03-10 13-07-14

Disadvantages:

Possible solution:

ghost commented 2 years ago

> @martind260 How do we get the reference (RefID) back into the process? Do we need a reference? Or how would you make the file reachable from your process context?

@xdoo There is no RefID needed anymore. The process instance can hold filename + folder (= pathToFile).

xdoo commented 2 years ago

@martind260 @boal thanks for the explanation

Only a flat folder hierarchy is possible: one folder contains N files but no subfolders.

I think this is not a problem at all. We would have this in a "via backend" solution as well.

One further question: how would you handle security? If we had a solution where the frontend control can directly access an S3 service, and you imagine that we have a lot of different S3 services, how can we ensure that the requester gets the correct service?

ghost commented 2 years ago

> One further question: how would you handle security? If we had a solution where the frontend control can directly access an S3 service, and you imagine that we have a lot of different S3 services, how can we ensure that the requester gets the correct service?

To authorize a user, you could put the TaskService in front of the S3 service. There, the user token can be checked against an additionally transmitted taskId before the request is forwarded to the S3 service.
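A hedged sketch of that gating idea; the endpoint path, parameters and helper interfaces below are made up for illustration and are not the actual DigiWF or digiwf-s3-integration API:

```java
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

/**
 * Illustrative sketch only: a task-service endpoint in front of the S3 integration.
 * Paths, parameters and the helper interfaces are assumptions.
 */
@RestController
public class TaskFileController {

    private final TaskAuthorizationService taskAuthorizationService;
    private final S3ServiceGateway s3ServiceGateway;

    public TaskFileController(final TaskAuthorizationService taskAuthorizationService,
                              final S3ServiceGateway s3ServiceGateway) {
        this.taskAuthorizationService = taskAuthorizationService;
        this.s3ServiceGateway = s3ServiceGateway;
    }

    @GetMapping("/tasks/{taskId}/files")
    public ResponseEntity<byte[]> getFile(@PathVariable final String taskId,
                                          @RequestParam final String pathToFile) {
        // 1. check the user token (taken from the security context) against the taskId
        if (!taskAuthorizationService.isAssignedToCurrentUser(taskId)) {
            return ResponseEntity.status(HttpStatus.FORBIDDEN).build();
        }
        // 2. look up which S3 service belongs to this task and forward the request to it
        final byte[] content = s3ServiceGateway.loadFile(taskId, pathToFile);
        return ResponseEntity.ok(content);
    }

    /** Hypothetical collaborators used in the sketch above. */
    public interface TaskAuthorizationService {
        boolean isAssignedToCurrentUser(String taskId);
    }

    public interface S3ServiceGateway {
        byte[] loadFile(String taskId, String pathToFile);
    }
}
```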

xdoo commented 2 years ago

@martind260 sounds good to me.

We need a solution :) @dominikhorn93 What's your opinion?

dominikhorn93 commented 2 years ago

I would really recommend that no individual file references are kept in the process, but only a pointer to a logical grouping, which can be a form field, for example. This can be a 1:1 or a 1:n relationship. If we don't do this, we will run into a lot of trouble when synchronizing files.
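To make the difference concrete, here is a small sketch with invented variable names (not the DigiWF data model): variant A keeps individual file references in the process, variant B keeps only the logical pointer and resolves the files when they are needed.

```java
import java.util.List;
import java.util.Map;

/** Illustration of the two variants discussed above; variable names are made up. */
public class FileReferenceExamples {

    /** Variant A: individual file references in the process, which have to be kept in sync manually. */
    static Map<String, Object> individualReferences() {
        return Map.<String, Object>of("attachments",
                List.of("FOLDER/report.pdf", "FOLDER/photo.jpg"));
    }

    /**
     * Variant B: only the logical pointer (e.g. the folder id behind a form field) is stored.
     * The files themselves are listed from the S3 integration when needed, so adding or
     * removing a file does not require touching any process variable.
     */
    static Map<String, Object> logicalPointer() {
        return Map.<String, Object>of("attachmentsFolder", "FOLDER");
    }
}
```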

dominikhorn93 commented 2 years ago

@boal is it possible to identify files by a logical folder id? Then we could just save the logical id in the form field.

dominikhorn93 commented 2 years ago

These are my concerns when we save file references individually in a process and have no logical pointer:

(three diagrams illustrating these concerns)

martind260 commented 2 years ago

This also seems possible, but for display purposes we would have to add the possibility to get a list of all files in a folder.

dominikhorn93 commented 2 years ago

@martind260 Then we save the "folder" in the process instance variables and not the single files, right?

boal commented 2 years ago

> @boal is it possible to identify files by a logical folder id? Then we could just save the logical id in the form field.

Yes, it's possible. But the feature to expose all files within a certain folder via a REST endpoint still has to be implemented. The logic to get all files from a certain S3 "folder" (aka refId) is already implemented and used for other purposes.

As a result of the feature described above, the process can save the refId (aka the folder within the integration service) and get all file paths within that folder if needed.

I would still suggest changing the REST endpoint parameters from "refId" (example: "FOLDER") and "filename" ("thefile.txt") to "pathToFile" (example: "FOLDER/thefile.txt"). With "pathToFile", even paths like "FOLDER/SUBFOLDER/thefile.txt" are possible.

As a consequence of this suggestion, it is no longer the folders with their corresponding endOfLife that are stored in the database, but the individual files with their endOfLife. However, I think this is not a problem, because a database can handle a lot of entries.

Additionally, with "pathToFile" the S3 integration service can be used more generically, without a forced flat folder structure.
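Expressed as a client interface sketch (method names are assumptions; the actual endpoints are tracked in the issue linked in the next comment), the suggestion would look like this:

```java
import java.util.List;

/**
 * Sketch of the suggested parameter change; interface and method names are
 * illustrative, not the actual digiwf-s3-integration API.
 */
public interface FileStorageClient {

    // today: a file is addressed by refId ("FOLDER") plus filename ("thefile.txt")
    byte[] getFile(String refId, String filename);

    // suggestion: a single pathToFile such as "FOLDER/thefile.txt" or
    // "FOLDER/SUBFOLDER/thefile.txt", which also allows sub-folders
    byte[] getFile(String pathToFile);

    // suggested new endpoint: list all pathToFile values within a folder, so the
    // process only needs to store the folder (refId) itself
    List<String> getFilePaths(String folder);
}
```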

boal commented 2 years ago

Issue for replacing refId and filename with pathToFile. The issue also contains the new REST endpoint that returns all pathToFile values for a given folder:

https://github.com/it-at-m/digiwf-s3-integration/issues/35