dla-kramski closed this 7 months ago.
This is a bigger change, but it makes sense and will be implemented. Added to the v1.0.0 milestone.
Regarding the filesystem: Ross added a +1 in #43 (noting it here so it isn't lost now that that issue is closed).
@dla-kramski I am not sure about directory extensions. There is no definition of them, and the "extension" is really just part of the name, so how would you work with it? How would you define the boundaries, e.g. would "." separate the dirname and the extension? How should we handle a dirname that contains more than one (arbitrarily chosen) separator? I understand that there are extensions like "SYSTEM" and such, but these carry meaning for the user and do not serve a technical purpose the way file extensions do when they map to MIME types.
So I'd rather not add dirname extensions.
Added, i.e. aligned with Wikidata.
To do / to be discussed:
Go's filepath.Ext() is not working as expected:
filename: .hiddenfile
filepath: ../../testdata/.hiddenfile
filenameextension: .hiddenfile
The documentation (https://pkg.go.dev/path/filepath#Ext) says: "Ext returns the file name extension used by path. The extension is the suffix beginning at the final dot in the final element of path; it is empty if there is no dot."
As the example shows, this approach is flawed: every hidden file on Linux/Unix without a dot extension gets its own name as its extension. There must also be the condition that the dot is not the first character of the file name.
Since Go will not change filepath.Ext(), I added a workaround.
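One way such a workaround could look, as a sketch and not necessarily the one that was committed: keep `filepath.Ext`, but return an empty string when the extension it finds is the entire file name, i.e. when the final dot is the first character. The helper name `fileExt` is hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// fileExt is a hypothetical helper, not necessarily ftrove's actual workaround.
// It behaves like filepath.Ext, except that a dot at the very start of the
// file name (as in Unix dotfiles) does not begin an extension.
func fileExt(path string) string {
	base := filepath.Base(path)
	ext := filepath.Ext(base)
	// If the final dot is the first character, Ext returns the whole name.
	if ext == base {
		return ""
	}
	return ext
}

func main() {
	for _, p := range []string{
		"../../testdata/.hiddenfile",
		"../../testdata/.config.yaml",
		"../../testdata/archive.tar.gz",
		"../../testdata/README",
	} {
		fmt.Printf("%s -> %q\n", p, fileExt(p))
	}
}
```

Comparing against the whole base name keeps the standard library's behaviour for every other case, so `.config.yaml` still yields `.yaml` and `archive.tar.gz` still yields `.gz`.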
Regarding the filesystem:
Determining the filesystem automatically would require significant changes. In addition, ftrove might have to run as root/SYSTEM, which is not desirable. I am considering introducing a flag that lets users specify the filesystem manually.
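A sketch of what such a flag could look like with Go's standard `flag` package. The flag name `-filesystem` and the function `parseFilesystem` are hypothetical, not existing ftrove options; an empty value would simply mean "not recorded" in the session table's filesystem column.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseFilesystem reads a hypothetical -filesystem flag. The flag name and
// its values are illustrative only; ftrove does not define this option.
func parseFilesystem(args []string) (string, error) {
	set := flag.NewFlagSet("ftrove", flag.ContinueOnError)
	filesystem := set.String("filesystem", "",
		"filesystem of the scanned volume as stated by the user, e.g. ext4 or NTFS (empty if unknown)")
	if err := set.Parse(args); err != nil {
		return "", err
	}
	return *filesystem, nil
}

func main() {
	fsName, err := parseFilesystem(os.Args[1:])
	if err != nil {
		// flag has already printed the error and usage to stderr.
		os.Exit(2)
	}
	if fsName == "" {
		fmt.Println("filesystem: not recorded")
	} else {
		fmt.Println("filesystem:", fsName)
	}
}
```

Because the value comes from the user rather than from querying the volume, no elevated privileges are needed; the trade-off is that ftrove cannot verify it.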
"So, I'd like to not add dirname extensions."
On second thought, I'm inclined to agree.
Wikidata is playing an increasingly important role in digital preservation.
FileTrove should align its data model with this de facto standard (see https://www.wikidata.org/wiki/Q37787110 and related pages).
This could be implemented with the following changes:
Session table:
filesystem
pathseparator
Files and directories tables:
filepath|dirpath (identical to the existing columns filename|dirname)
mountpoint
filename|dirname (the name without the path)
filenameextension|dirnameextension
This information could also be obtained afterwards by parsing the existing "filename" column, but ftrove has it all at hand at run time and can easily record it. The file name without the path may be particularly useful for tracking files with the same name across several sessions.
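A sketch of how the proposed per-file columns could be filled at run time from the path ftrove already has. The struct `fileRecord` and its field names are hypothetical, mountpoint is left out because determining it portably needs more work, and the extension keeps Go's convention of including the leading dot while treating dotfiles as having no extension. The path separator is taken from `os.PathSeparator`, i.e. the separator of the system ftrove runs on, which is an assumption about what the pathseparator column should hold.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// fileRecord mirrors the proposed columns; the field names are hypothetical.
type fileRecord struct {
	Filepath          string // full path, as in the existing "filename" column
	Filename          string // name without the path
	FilenameExtension string // suffix from the final dot; "" for dotfiles without one
}

func newFileRecord(path string) fileRecord {
	name := filepath.Base(path)
	ext := filepath.Ext(name)
	if ext == name { // only a leading dot, e.g. ".hiddenfile"
		ext = ""
	}
	return fileRecord{Filepath: path, Filename: name, FilenameExtension: ext}
}

func main() {
	// pathseparator belongs to the session table: it is the same for every file in a run.
	fmt.Printf("pathseparator: %q\n", string(os.PathSeparator))
	for _, p := range []string{"/data/reports/annual.report.pdf", "/data/reports/.hiddenfile"} {
		fmt.Printf("%+v\n", newFileRecord(p))
	}
}
```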