novoid / appendfilename

Intelligent appending text to file names, considering file extensions and file tags
GNU General Public License v3.0

feature suggestion: harmonization of three patterns by date2name #15

Open nbehrnd opened 2 years ago

nbehrnd commented 2 years ago

The option --smart-prepend aims to keep time stamps (added by date2name) in front of the file name. However, for stamps assigned with --compact, --month, or --short, the result differs: the prepended text ends up ahead of the time stamp.
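To make the requested behavior concrete, here is a minimal sketch of a "smart prepend" that keeps any of the date2name stamp layouts first. The names `smart_prepend` and `DATE2NAME_STAMP` and the default space separator are assumptions for illustration, not appendfilename's actual code; the stamp layouts follow the filter nbehrnd quotes further down in this thread.

```python
import re

# Hypothetical pattern for the date2name stamp layouts, each terminated by '_'.
# Alternatives are tried in order; the trailing '_' ensures only a complete
# stamp is accepted (the engine backtracks into the next branch otherwise).
DATE2NAME_STAMP = re.compile(
    r'^(?:'
    r'\d{4}-[01]\d-[0-3]\dT[012]\d\.[0-5]\d\.[0-5]\d'  # YYYY-MM-DDTHH.MM.SS
    r'|\d{4}-[01]\d-[0-3]\d'                           # YYYY-MM-DD
    r'|\d{4}[01]\d[0-3]\d'                             # YYYYMMDD (--compact)
    r'|\d{4}-[01]\d'                                   # YYYY-MM (--month)
    r'|\d{2}[01]\d[0-3]\d'                             # YYMMDD (--short)
    r')_'
)


def smart_prepend(filename: str, text: str, separator: str = ' ') -> str:
    """Prepend text to filename, keeping a leading date2name stamp in front."""
    match = DATE2NAME_STAMP.match(filename)
    if match is None:
        # No recognised stamp: plain prepend.
        return text + separator + filename
    # Re-emit the stamp (including its '_') before the new text.
    return match.group(0) + text + separator + filename[match.end():]


for name in ['2022-01-14_foo.txt', '20220114_foo.txt',
             '2022-01_foo.txt', '220114_foo.txt', 'foo.txt']:
    print(smart_prepend(name, 'bar'))
```

With these assumptions, `20220114_foo.txt` becomes `20220114_bar foo.txt`, so the stamp stays first for every layout; how appendfilename should join the stamp and the new text is a separate design decision.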

For example, in a live session of Xubuntu 20.04.2 LTS/Fossa and a pristine checkout of appendfilename/master, these are the observations:

novoid commented 2 years ago

Yes, this looks like a bug.

I'm not using compact/short/month patterns myself and therefore I was probably just sloppy when implementing it for the ISO pattern only. Which is definitely not smart. ;-)

Thanks for reporting.

novoid commented 2 years ago

@nbehrnd

I can confirm that the smart-prepend feature was only implemented for YYYY-MM-DD(THH.MM.SS) format and no other.

The fix might take longer than anticipated. I wrote appendfilename and date2name before I knew about named groups in Python regular expressions. Before I fiddle with the old, complex regexes, I want to introduce much more flexible ones that are easier to maintain in the future.
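For readers unfamiliar with them: a named group `(?P<name>...)` in Python's `re` module makes a captured part available by name rather than only by position, which is what keeps a long timestamp regex maintainable. A minimal illustration; the pattern `ISO_DATE` is just an example:

```python
import re

# Example pattern: a leading ISO date with three named groups.
ISO_DATE = re.compile(r'^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')

match = ISO_DATE.match('2022-01-14_foo.bar')
year = match.group('year')   # '2022', looked up by name
same_year = match.group(1)   # '2022', the same group by position
parts = match.groupdict()    # {'year': '2022', 'month': '01', 'day': '14'}
print(year, same_year, parts)
```

Adding or removing a group later shifts the positional numbers, but `match.group('year')` keeps working.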

Furthermore, I'm thinking of extracting some functionality from both tools into one library to deal with date- and time-stamps within strings in general.

As a brief sneak preview, here is a snippet from my current brainstorming:


import re

YMD_SEPARATORS = '[-_.]'         # potential separator character between the entities of year, month, day
DATETIME_SEPARATORS = '[T: _-]'  # potential separator character between the entities of datestamp and timestamp ('-' last, so it is literal and not a range)
HMS_SEPARATORS = '[:.-]'         # potential separator character between the entities of hour, minute, second
END_SEPARATORS = '[^a-zA-Z0-9]'  # potential separator character between the entities of datetimestamp and rest

TIMESTAMP_REGEX = re.compile('^' +
    '(?P<overall_datetimestamp>' +                      # BEGIN: overall_datetimestamp: datetimestamp with separator
    r'(?P<century>\d{2})?' +                            #   optional century: YY    e.g. 20 (from 2022)
    r'(?P<year>\d{2})' +                                #   YY    e.g. 22 (from 2022)
    '(?P<ym_sep>' + YMD_SEPARATORS + ')?' +             #   optional separator character
    r'(?P<month>[01]\d)' +                              #   MM    e.g. 12 (December)
    '(?P<md_sep>' + YMD_SEPARATORS + ')?' +             #   optional separator character
    r'(?P<day>[0123]\d)' +                              #   DD    e.g. 31
    '(' +                                               #   BEGIN: timestamp is optional
    '(?P<datetime_sep>' + DATETIME_SEPARATORS + ')?' +  #     optional separator character
    r'(?P<hour>[012]\d)' +                              #     HH    e.g. 23
    '(?P<hm_sep>' + HMS_SEPARATORS + ')?' +             #     optional separator character
    r'(?P<minute>[012345]\d)' +                         #     MM    e.g. 59
    '(' +                                               #     BEGIN: seconds are optional
    '(?P<ms_sep>' + HMS_SEPARATORS + ')?' +             #       optional separator character
    r'(?P<second>[012345]\d)' +                         #       SS    e.g. 59
    ')?' +                                              #     END: seconds are optional
    ')?' +                                              #   END: timestamp is optional
    '(?P<end_sep>' + END_SEPARATORS + ')' +             #   mandatory separator character
    ')(?P<rest>.*)'                                     # END: overall_datetimestamp: datetimestamp with separator
    )

# same regex but in one piece:
TIMESTAMP_REGEX = re.compile(r'^(?P<overall_datetimestamp>(?P<century>\d{2})?(?P<year>\d{2})(?P<ym_sep>[-_.])?(?P<month>[01]\d)(?P<md_sep>[-_.])?(?P<day>[0123]\d)((?P<datetime_sep>[T: _-])?(?P<hour>[012]\d)(?P<hm_sep>[:.-])?(?P<minute>[012345]\d)((?P<ms_sep>[:.-])?(?P<second>[012345]\d))?)?(?P<end_sep>[^a-zA-Z0-9]))(?P<rest>.*)')

# examples:

# re.match(TIMESTAMP_REGEX, '2022-01-14T17.53.16_foo.bar').groups()
#  -> ('2022-01-14T17.53.16_', '20', '22', '-', '01', '-', '14', 'T17.53.16', 'T', '17', '.', '53', '.16', '.', '16', '_', 'foo.bar')

# re.match(TIMESTAMP_REGEX, '2022-01-14T17.53.16_foo.bar').groupdict()
#  -> {'hm_sep': '.', 'hour': '17', 'overall_datetimestamp': '2022-01-14T17.53.16_', 'century': '20', 'year': '22', 'day': '14', 'rest': 'foo.bar', 'month': '01', 'end_sep': '_', 'second': '16', 'md_sep': '-', 'ms_sep': '.', 'datetime_sep': 'T', 'ym_sep': '-', 'minute': '53'}

# re.match(TIMESTAMP_REGEX, '2022-01-14T17.53.16_foo.bar').group('hour')
#  -> '17'

# re.match(TIMESTAMP_REGEX, '2022-01-14T17.53.16_foo.bar').group('second')
#  -> '16'

# re.match(TIMESTAMP_REGEX, '2022-01-14T17.53.16_foo.bar').group('datetime_sep')
#  -> 'T'

# The regular expression matches date- and time-stamps as long as the order is YMD, optionally followed by HM(S).
# For other orders like MM/DD/YYYY: please do re-think your life choices. ;-)  *SCNR*
# Unsupported:
#    - non-ISO-like orders of the entities
#    - time zones or time offsets
#    - weeks
#    - durations or intervals
#    - milliseconds
# Simplified:   (YY)?YY.MM.DD(.HH.MM(.SS)?)?
# The separation characters are limited to sets of potential characters (see regex for details).
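A quick way to check the one-piece regex against the layouts discussed in this issue is to run it over sample names. One detail worth knowing: inside a character class, ` -_` is read as a range from space to underscore, which also covers digits and uppercase letters, so the sketch below places the hyphen last (`[T: _-]`) to keep it literal. The sample names are made up; the layouts are the five that nbehrnd's filter further down checks for.

```python
import re

# One-piece timestamp regex, with the hyphen last in the date/time separator class.
TIMESTAMP_REGEX = re.compile(
    r'^(?P<overall_datetimestamp>(?P<century>\d{2})?(?P<year>\d{2})'
    r'(?P<ym_sep>[-_.])?(?P<month>[01]\d)(?P<md_sep>[-_.])?(?P<day>[0123]\d)'
    r'((?P<datetime_sep>[T: _-])?(?P<hour>[012]\d)(?P<hm_sep>[:.-])?'
    r'(?P<minute>[012345]\d)((?P<ms_sep>[:.-])?(?P<second>[012345]\d))?)?'
    r'(?P<end_sep>[^a-zA-Z0-9]))(?P<rest>.*)'
)

SAMPLES = [
    '2022-01-14_foo.bar',           # default
    '2022-01-14T17.53.16_foo.bar',  # default with time
    '20220114_foo.bar',             # --compact
    '2022-01_foo.bar',              # --month
    '220114_foo.bar',               # --short
]

for name in SAMPLES:
    match = TIMESTAMP_REGEX.match(name)
    if match is None:
        print(f'{name}: no match')
    else:
        print(f'{name}: stamp={match.group("overall_datetimestamp")} '
              f'rest={match.group("rest")}')

# The range form accepts a digit as a "separator"; the hyphen-last form does not.
print(re.fullmatch(r'[T: -_]', '5') is not None)  # True
print(re.fullmatch(r'[T: _-]', '5') is not None)  # False
```

Run as written, the --month sample prints `no match`, because the regex requires a day; supporting YYYY-MM would need the day (and `md_sep`) to become optional or a separate branch.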

Link for testing the regex online (test string: 2022-01-14T17.53.16_foo.bar)
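The regex checks only the shape of a stamp: `[01]\d` also accepts month 19, and `[0123]\d` accepts day 39. If a library built on it needs real dates, one option is a second step that feeds the named groups into `datetime`. A minimal sketch, assuming a reduced date-only pattern (`DATE_REGEX`), a hypothetical helper `parse_date`, and a hypothetical `default_century` of '20' for two-digit years:

```python
import re
from datetime import datetime

# Hypothetical reduced pattern: leading date only, same groups as above.
DATE_REGEX = re.compile(
    r'^(?P<century>\d{2})?(?P<year>\d{2})[-_.]?(?P<month>[01]\d)'
    r'[-_.]?(?P<day>[0123]\d)[^a-zA-Z0-9]'
)


def parse_date(filename, default_century='20'):
    """Return a datetime for a leading date stamp, or None (sketch)."""
    match = DATE_REGEX.match(filename)
    if match is None:
        return None
    # Assumption: YYMMDD stamps belong to the default century.
    century = match.group('century') or default_century
    try:
        return datetime(int(century + match.group('year')),
                        int(match.group('month')),
                        int(match.group('day')))
    except ValueError:
        # e.g. month 19 or February 30 passes the regex but is no real date
        return None


print(parse_date('2022-01-14_foo.bar'))
print(parse_date('2022-19-14_foo.bar'))
```

Keeping the regex permissive and validating afterwards avoids encoding month lengths and leap years into the pattern itself.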

nbehrnd commented 2 years ago

Compared to this rich variety, the filter in my testing script that asks «is this one of the patterns issued by date2name at all?»

if (re.search(r"^\d{4}-[012]\d-[0-3]\d_", old_filename) or
    re.search(r'^\d{4}-[012]\d-[0-3]\dT[012]\d\.[0-5]\d\.[0-5]\d_', old_filename) or
    re.search(r"^\d{4}[012]\d[0-3]\d_", old_filename) or
    re.search(r"^\d{4}-[012]\d_", old_filename) or
    re.search(r"^\d{2}[012]\d[0-3]\d_", old_filename)):

    # enter the inner loop
    pass

appears naïve, because it does not consider such a large set of separators between the digits. Rather, I suspect your variations anticipate (or are required for) making appendfilename, and perhaps already date2name, work on Linux, macOS, and Windows alike, for time stamps set by date2name as well as by other time-stamping programs.
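The five `re.search` calls in the filter above can also be folded into one anchored alternation that is compiled once. A sketch keeping the same layouts and the same `_` terminator; the names `DATE2NAME_PREFIX` and `has_date2name_prefix` are made up for illustration:

```python
import re

# The five date2name layouts from the filter, as one anchored alternation.
DATE2NAME_PREFIX = re.compile(
    r'^(?:'
    r'\d{4}-[012]\d-[0-3]\d'                             # YYYY-MM-DD
    r'|\d{4}-[012]\d-[0-3]\dT[012]\d\.[0-5]\d\.[0-5]\d'  # YYYY-MM-DDTHH.MM.SS
    r'|\d{4}[012]\d[0-3]\d'                              # YYYYMMDD (compact)
    r'|\d{4}-[012]\d'                                    # YYYY-MM (month)
    r'|\d{2}[012]\d[0-3]\d'                              # YYMMDD (short)
    r')_'
)


def has_date2name_prefix(old_filename: str) -> bool:
    """True if old_filename starts with one of the five stamps plus '_'."""
    return DATE2NAME_PREFIX.match(old_filename) is not None


print(has_date2name_prefix('2022-01-14T17.53.16_foo.bar'))
print(has_date2name_prefix('foo.bar'))
```

Because the trailing `_` sits outside the group, the engine backtracks into the next branch when one matches too little (e.g. `2022-01-14` before a `T`), so branch order does not change the result.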

(And because my current focus is on what was showcased at GLT18 after filetags was presented, I currently don't use date2name's compact, month, or short patterns myself.)