gristlabs / grist-core

Grist is the evolution of spreadsheets.
https://www.getgrist.com/
Apache License 2.0

[question/request] Load images from disk #99

Open asitemade4u opened 2 years ago

asitemade4u commented 2 years ago

From what I understand of Grist's philosophy, you wish to keep all data in a single SQLite file, so there is no risk of database discrepancy. However, some tools, such as PHP Generator, Obsidian, or any web server for that matter, can fetch and display an image from its URL or its path on disk (relative to the folder of the data file). I think this would be a valuable feature for Grist because:

  1. it would lighten the data file
  2. it would allow for seamless image updates upstream (carried out by a separate image management process)
  3. it would not be too harmful when a link breaks, as its cell would then display a path or URL, making the broken link immediately visible.

asitemade4u commented 2 years ago

Here is how PHP Generator (which is quite thorough, as per usual) handles this request: External Image

paulfitz commented 2 years ago

Thanks @asitemade4u. There's definitely a case for externalizing attachments. Do you have any thoughts on how backups and snapshots should work? Should they include external attachments, or keep those separate?

asitemade4u commented 2 years ago

Here are my ideas on the subject:

yohanboniface commented 1 year ago

Related need here: we'd like to have an attachment column for hundreds of rows, with documents sometimes larger than 500 MB (paper scans…), so in an ideal world we'd like to store them in a separate service (like Minio). @paulfitz This is something we could try to work on. Do you have any input on how this could be implemented?

paulfitz commented 1 year ago

Grist documents contain a _grist_Attachments table with metadata about individual attachments, but not their contents: https://github.com/gristlabs/grist-core/blob/7dc49f3c850ea6cf7f7832d069088c36a200b93b/app/common/schema.ts#L145-L154

The fileIdent column is a key to a separate _gristsys_Files table, which contains the attachment contents: https://github.com/gristlabs/grist-core/blob/94a7b750a8db2421174e671565cbe185e1067dbe/app/server/lib/DocStorage.ts#L73-L77

I'd suggest tweaking the handling of _grist_Attachments so that it can represent attachments that are stored externally. That could be by extending the meaning of fileIdent, or by adding an extra column or columns.
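To make the "extra column or columns" option concrete, here is a minimal sketch of what a row might look like. It is only an illustration: `AttachmentRecord`, `externalStore`, `externalKey`, and `isExternal` are hypothetical names, not part of the actual Grist schema; only `fileIdent` comes from the existing table, and the other existing columns are omitted.

```typescript
// Hypothetical row shape for _grist_Attachments.
// Only fileIdent is an existing column; the external* fields are assumptions.
interface AttachmentRecord {
  fileIdent: string;        // existing key into _gristsys_Files
  externalStore?: string;   // assumed: label of an external store, e.g. "minio"
  externalKey?: string;     // assumed: object key within that store
}

// An attachment counts as external only when both assumed fields are set;
// otherwise its contents are expected in _gristsys_Files, as today.
function isExternal(rec: AttachmentRecord): boolean {
  return rec.externalStore !== undefined && rec.externalKey !== undefined;
}
```

Leaving the new fields optional means existing in-document attachments would not need to change.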

As a practical matter, it would probably be necessary to continue to support in-document attachments, to avoid disruption to existing Grist installations and document backups.

There have been requests for an attachment-like UI that works with link-like attachments, e.g. to videos etc.

There are decisions to be made about the management of external storage. For example: what happens if attachments are deleted within a document? Should that delete the externally stored attachment? Likewise (and relatedly), what does copying a document mean now? Should external attachments also be duplicated? Life is easiest if (as @asitemade4u suggested) Grist just doesn't get involved in the lifecycle of external attachments at all, except (perhaps) in their initial creation.

For the UI: I suspect presigned URLs would be the way to go for uploading and viewing.

There's a lot more to say, there are a lot of options and it would not be a small project, but I think it could work out quite nicely.

tba-code commented 9 months ago

How about adding a filePath column to _grist_Attachments? A null path could be interpreted as meaning the attachment is stored in the database.

An easy way to handle the uploads may be /attachments/<DOC ID>/<fileIDENT>.<ext>
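As a rough sketch of how these two ideas could fit together, assuming a nullable filePath and the path layout above (`attachmentPath`, `storageLocation`, and `StorageLocation` are hypothetical names, not Grist functions):

```typescript
// Builds the proposed path: /attachments/<DOC ID>/<fileIdent>.<ext>
// No sanitization is done here; a real implementation would need to
// validate each segment before touching the file system.
function attachmentPath(docId: string, fileIdent: string, ext: string): string {
  return `/attachments/${docId}/${fileIdent}.${ext}`;
}

type StorageLocation =
  | { kind: "database" }             // contents stored in the document, as today
  | { kind: "file"; path: string };  // contents stored on disk

// A null filePath is read as "stored in the database".
function storageLocation(filePath: string | null): StorageLocation {
  return filePath === null ? { kind: "database" } : { kind: "file", path: filePath };
}
```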

While backwards compatibility is a perfectly good reason to keep the existing behavior, I also believe the new behavior should be made the default, with an environment variable or configuration setting available to revert to blobbing if desired. My reasoning is that keeping attachments as files would eliminate a lot of headaches when working with attachments in custom widgets and formulas.

If you are worried about files modified elsewhere being loaded in the doc, Grist could compute an MD5 of the upload, store it as fileHash, and compare it against the file on load to ensure the file is unchanged. In this case, disabling modified attachments in the Grist UI rather than removing them may be good, with a way for the user to override this for particular files. A boolean like allowExternallyModified could work. @paulfitz
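A minimal sketch of that check, assuming Node's built-in crypto module and the fileHash and allowExternallyModified fields proposed above (`md5Hex`, `checkAttachment`, and `Verdict` are hypothetical names):

```typescript
import { createHash } from "crypto";

// "ok": contents match the stored hash.
// "disabled": contents changed and no override is set.
// "overridden": contents changed but the user allowed it.
type Verdict = "ok" | "disabled" | "overridden";

// Hex-encoded MD5 digest of the given bytes.
function md5Hex(data: Uint8Array): string {
  return createHash("md5").update(data).digest("hex");
}

// Compares the file's current contents against the hash stored at upload time.
function checkAttachment(
  contents: Uint8Array,
  fileHash: string,
  allowExternallyModified: boolean,
): Verdict {
  if (md5Hex(contents) === fileHash) {
    return "ok";
  }
  return allowExternallyModified ? "overridden" : "disabled";
}
```

MD5 is enough here to notice accidental changes, but it would not protect against deliberate tampering; a stronger hash could be swapped in if that matters.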