Open glael opened 4 years ago
In my opinion Hydrus should allow all "supported" formats through with no problem, but give an explicit warning, at least on the first import of such a format, about how things can go pear-shaped with formats that are likely to be changed by being used. If things do go pear-shaped afterwards, then it's explicitly on the user's head. There should probably also be a warning for all formats it doesn't recognise or know how to group (see system:filetype and how it groups image formats together).
Noting some 3D formats as people have requested it in the past:
.fbx .obj .daz .stl .3ds .mdl .sdl
There are many more formats for 3D use... a 3D viewer is probably needed if one or more of these are added.
Maybe the MIME type could be inferred from file content/import details/etc, with users getting a warning/amend dialog if a MIME type cannot be asserted?
If the whole range (or at least a large subset) of standard MIME types can be recognized and imported right off the bat, it'd instantly support a wide range, and later on make it easy to add more types.
That's already the case, Hydrus uses magic numbers to determine file types. The limited support is more of an intended limitation atm and not really a technical one afaik.
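For anyone unfamiliar with the term, magic-number detection can be sketched roughly like this. This is only an illustration and not Hydrus's actual code: the `SIGNATURES` table and the `sniff_mime` function are hypothetical names, and the table covers only a handful of well-known signatures.

```python
from typing import Optional

# Hypothetical signature table: (offset, magic bytes, MIME type).
# Only a few well-known signatures are listed here for illustration.
SIGNATURES = [
    (0, b"\x89PNG\r\n\x1a\n", "image/png"),
    (0, b"\xff\xd8\xff", "image/jpeg"),
    (0, b"GIF87a", "image/gif"),
    (0, b"GIF89a", "image/gif"),
    (0, b"%PDF-", "application/pdf"),
    (0, b"PK\x03\x04", "application/zip"),
]


def sniff_mime(header: bytes) -> Optional[str]:
    """Return a MIME type if the leading bytes match a known signature."""
    for offset, magic, mime in SIGNATURES:
        if header[offset:offset + len(magic)] == magic:
            return mime
    # Unknown content: the caller decides whether to warn or reject.
    return None
```

Note that container formats such as epub or docx all start with the ZIP signature, so telling them apart needs a look inside the archive rather than just at the first bytes.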
This is potentially dangerous though, since the act of opening a file sometimes changes the hash of the file (for example, when opening an .epub file with calibre, it adds/changes the META-INF/calibre-bookmarks.txt file inside the epub/zip). This would break hydrus' hash-based system.
I'm not familiar with that software in particular, but couldn't a lot of mangling of this sort be prevented by having hydrus set all its media file permissions to read-only?
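A minimal sketch of what that could look like, assuming a hypothetical helper called `make_read_only` that would be run on each file after import (this is not an existing Hydrus function):

```python
import os
import stat


def make_read_only(path: str) -> None:
    """Clear the write bits for owner, group and others on one file."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
```

This would only be a partial safeguard. On POSIX systems a program that saves by writing a new file and renaming it over the old one needs write permission on the directory, not on the file, so it could still replace the content.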
My understanding of file identification is that it currently relies on some mix of magic number filetype detection and ffmpeg. https://pypi.org/project/filetype/
For weird filetypes it is usually the responsibility of HTTP to provide a Content-Type. I think marking files that could not be identified by either of these means as application/octet-stream is fine, and you can manually adjust on your end. MIME types basically exist because extensions are unreliable, but you could let the user whitelist the problematic extensions and assign them a MIME type, so they don't have to go in and manually reassign things.
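The fallback order I have in mind could be sketched like this. Everything here is an assumption for illustration: `resolve_mime`, its parameters, and the `user_overrides` mapping are hypothetical names, not part of Hydrus.

```python
import os
from typing import Mapping, Optional

DEFAULT_MIME = "application/octet-stream"


def resolve_mime(
    sniffed: Optional[str],
    content_type: Optional[str],
    filename: str,
    user_overrides: Mapping[str, str],
) -> str:
    """Pick a MIME type: content sniffing, then HTTP header, then user whitelist."""
    if sniffed:
        return sniffed
    if content_type:
        # Drop parameters such as "; charset=utf-8".
        mime = content_type.split(";", 1)[0].strip().lower()
        if mime and mime != DEFAULT_MIME:
            return mime
    # Last resort: the user's extension whitelist (keys like ".stl").
    ext = os.path.splitext(filename)[1].lower()
    return user_overrides.get(ext, DEFAULT_MIME)
```

For example, `resolve_mime(None, None, "part.STL", {".stl": "model/stl"})` would return `"model/stl"`, and anything with no match at all ends up as `application/octet-stream`.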
One of those Discord screenshots implies that filetypes without visual media viewer support can't have thumbnails. I don't see why that would need to be the case. Some applications generate their own thumbnails for their own filetypes. For example, LibreOffice has a Windows Explorer extension that generates thumbnails for documents - could Hydrus just grab those thumbnails for those files?
Audio files could have thumbnails generated based on their waveform. ffmpeg could be used for this.
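As a rough sketch of the waveform idea: ffmpeg ships a `showwavespic` filter that renders a whole audio file into a single image. The Python wrapper below is hypothetical (the names `waveform_command` and `make_waveform_thumbnail` are mine), and it assumes an `ffmpeg` binary is available on the PATH.

```python
import subprocess
from typing import List


def waveform_command(audio_path: str, thumb_path: str,
                     width: int = 200, height: int = 200) -> List[str]:
    """Build an ffmpeg command that draws the waveform as one image."""
    return [
        "ffmpeg", "-y",
        "-i", audio_path,
        "-filter_complex", f"showwavespic=s={width}x{height}",
        "-frames:v", "1",
        thumb_path,
    ]


def make_waveform_thumbnail(audio_path: str, thumb_path: str) -> None:
    """Run ffmpeg; raises CalledProcessError if ffmpeg reports a failure."""
    subprocess.run(waveform_command(audio_path, thumb_path),
                   check=True, capture_output=True)
```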
Hydrus does currently generate thumbnails for a file type it can't view, namely swf.
The issue of hashes changing is interesting. What is the behavior currently if I, say, edit an image from Hydrus in Paint or Photoshop and save it? Does Hydrus recompute the hash or just keep thinking the file has its original hash?
The latter at first, but once file maintenance runs and Hydrus checks the hashes, it will think the file is invalid, see screenshot.
That means editing files that have been added to Hydrus is currently not really a thing. Keeping track of potentially millions of files changing is also not that trivial afaik. While Hydrus is running, you could watch for filesystem events. But if changes are made to a file while it's closed, Hydrus would potentially need to recheck every file on startup (which is not feasible I think).
Of course, you could do things like calculating hashes for groups of files and then checking those first (and if a hash is wrong, then check each file of the group, basically a chunked approach). I'm not really an expert on this. The way Git does it might also be feasible, but I don't know how well that would perform with this many (and potentially huge) binary files.
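To make the chunked idea a bit more concrete, here is a rough sketch. One catch is that a group hash built from full file hashes would still require reading every file, so this version assumes a cheap per-file fingerprint (size plus modification time) for the group digest and only re-hashes files in groups whose digest changed. All function names are hypothetical, and an edit that preserves both size and mtime would go unnoticed.

```python
import hashlib
import os
from typing import Dict, List


def stat_fingerprint(path: str) -> str:
    """Cheap fingerprint from metadata only; the file content is not read."""
    st = os.stat(path)
    return f"{st.st_size}:{st.st_mtime_ns}"


def group_digest(paths: List[str]) -> str:
    """Combine the fingerprints of all files in a group into one digest."""
    h = hashlib.sha256()
    for p in sorted(paths):
        h.update(p.encode("utf-8") + b"\0")
        h.update(stat_fingerprint(p).encode("ascii") + b"\n")
    return h.hexdigest()


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Full content hash, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_changed_files(groups: Dict[str, List[str]],
                       stored_group_digests: Dict[str, str],
                       stored_file_hashes: Dict[str, str]) -> List[str]:
    """Re-hash only the files in groups whose cheap digest no longer matches."""
    changed = []
    for group_id, paths in groups.items():
        if group_digest(paths) == stored_group_digests.get(group_id):
            continue  # nothing in this group looks modified
        for p in paths:
            if file_sha256(p) != stored_file_hashes.get(p):
                changed.append(p)
    return changed
```

Git's index uses stat data in a similar spirit to avoid re-hashing unchanged files, though I can't say how well this would scale to millions of large binaries.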
I proposed a simpler workaround in Discord that could also work, but surely isn't as ideal or as desirable:
If the backing filesystem supports them, snapshots could be used, but that's a weird complexity. Implementation-wise, Windows's Volume Shadow Copy APIs can be not so fun too. (Directing VSS to only copy a specific directory isn't obvious. ZFS or btrfs should be far easier, though.) https://docs.microsoft.com/en-us/windows/win32/vss/using-the-volume-shadow-copy-service
Open Office XML document support as per #362 , see this comment in that issue for more information.
One of those Discord screenshots implies that filetypes without visual media viewer support can't have thumbnails. I don't see why that would need to be the case. Some applications generate their own thumbnails for their own filetypes. For example, LibreOffice has a Windows Explorer extension that generates thumbnails for documents - could Hydrus just grab those thumbnails for those files?
Audio files could have thumbnails generated based on their waveform. ffmpeg could be used for this.
That's not actually how thumbnailing works in operating systems. A program supplies a small subprogram to the OS for extracting a preview from a document, and either this subprogram is sufficient to perform playback (for example of certain video codecs) and/or produces a bitmap for the poster frame. This requires the installation of OS specific components, and is not portable across OSes, and is not known to other programs. At best there might be a way to extract some non preview, default icons for some files under some known common operating systems.
Also, if I'm not mistaken, a swf does not possess a concept of a poster frame and can have an arbitrary playback order, so generating a preview image instead of an icon is basically impossible, which is why hydrus just uses an icon embedded in hydrus.
SWFs do have thumbnails in Hydrus.
@bbappserver The fact that OSs handle thumbnailing differently doesn't mean you can't interact with them. See https://thumbsviewer.github.io/ & https://github.com/mdegrazia/OSX-QuickLook-Parser. Also, at least on Windows, https://github.com/QL-Win/QuickLook should be usable just fine as a library with a bit of patching.
Hydrus doesn't have to support a huge number of extremely different operating systems.
I would prefer not to rely on OS-specific implementations for thumbnails, especially not if it means having to use a different library for each of them; imo, that's not worth it (both in terms of initial implementation and having to maintain it in the future) just to gain the ability to have content-based thumbnails for file types most users will likely never put into Hydrus. It would potentially also complicate things like the ability for the user to force re-generate thumbnails.
Supporting different file types UI wise (browsing) should be trivial, however launching them and integrating features into the application is not. Not everything is supported by MPV or browsers. @ShadowJonathan https://github.com/openpreserve/fido and https://github.com/richardlehane/siegfried are the gold standards for file identification in the archive community, they seem to do everything, from magic numbers to "deeper checks". Others include:
Easier would be to add a check box, that would allow the import of binary data of any type. The whole processing, thumbnailing and what not can come at the later stages. Basic enforcement of any type of data is probably more important.
I have a collection of SVG icons that would feel at home together with their png brethren in hydrus. I'd assume support would be less work than any 3d, interactive or animated media.
Clip Studio Paint (.clip) files have also been requested. #810
I'm not familiar with that software in particular, but couldn't a lot of mangling of this sort be prevented by having hydrus set all its media file permissions to read-only?
Yes that sounds good
I proposed a simpler workaround in Discord that could also work, but surely isn't as ideal or as desirable
If you need to export the file wouldn't it be easier and safer to make a copy with the changes and remove the older asset/project?
With the recent addition of CBZ support, I'd like to throw in the suggestion of additional compressed filetypes. I would love to spin up a Hydrus database that could archive my Photoshop/Clip/Krita projects, but their file sizes can be pretty massive. I store my finished projects in ZIP, which often cuts the size in half (e.g. 1gb -> 500mb). If we could support something like this, it would open up Hydrus for completely new use-cases.
With the new "Force filetype" option, I can actually force the zip as a PSD, regenerate thumbnail (it will default to the PSD filetype icon), and then overwrite that manually with a thumbnail I made myself. Hydrus won't use the thumbnail file if it's just a ZIP though, and when "forced" as a PSD, "open in external program" will ask Photoshop to open the ZIP file.
I'm sure this is easier said than done. I'm not familiar with how the image project files are handled, but streaming them through a pkzip to an image library may pose a challenge.
A simpler approach that I think may be a good solution to supporting many different file formats:
Just my $0.02
I would love to spin up a Hydrus database that could archive my Photoshop/Clip/Krita projects, but their file sizes can be pretty massive. I store my finished projects in ZIP, which often cuts the size in half (e.g. 1gb -> 500mb).
Krita files are already ZIPs themselves but are uncompressed by default. You can just change a setting in Krita to use compression, though, instead of putting it in another zip. Photoshop also has the option for compression of the layers in PSD files, which can use either RLE or ZIP compression.
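If you want to check whether a given ZIP-based project file is actually compressed inside, something like this sketch would list the members that are stored without compression. The function name `stored_members` is hypothetical, and it works on any ZIP container rather than on Krita files specifically.

```python
import zipfile
from typing import BinaryIO, List, Union


def stored_members(zip_source: Union[str, BinaryIO]) -> List[str]:
    """List non-empty members that are stored without compression."""
    with zipfile.ZipFile(zip_source) as zf:
        return [
            info.filename
            for info in zf.infolist()
            if info.compress_type == zipfile.ZIP_STORED and info.file_size > 0
        ]
```

If this returns most of the large entries, re-saving with compression enabled in the application should shrink the file about as much as wrapping it in another zip would.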
Could JPEG XL support be added using pillow-jxl-plugin?
@Cguy7777 I have tried using that for JPEG XL, but it was too unstable, and when it crashed it caused the entire client to hang instead of throwing an exception.
Some screenshots from the discord
Since PSD files are already allowed, there is probably no reason not to allow these: Same, .wav files are not very different from mp3 files: You get the idea:
The actual issue
Since this gets requested a lot, it is probably worth discussing arbitrary file imports (i.e. allowing any file to be imported, without checking if it's "allowed" first). This is potentially dangerous though, since the act of opening a file sometimes changes the hash of the file (for example, when opening an .epub file with calibre, it adds/changes the META-INF/calibre-bookmarks.txt file inside the epub/zip). This would break hydrus' hash-based system.
So: please discuss: "should hydrus prevent users from importing such files?"