feature suggestion: harmonization of three patterns by date2name

nbehrnd commented 2 years ago

The option --smart-prepend aims to keep time stamps (added by date2name) at the front of the file name. However, for stamps assigned with --compact, --month, or --short, the resulting pattern differs: the prepended text ends up ahead of the time stamp.

For example, in a live session of Xubuntu 20.04.2 LTS (Focal Fossa) with a pristine checkout of appendfilename/master, I observe the following:

compact time stamp:

xubuntu@xubuntu:~/Desktop/appendfilename/appendfilename$ python3 __init__.py 20211231_test.txt --smart-prepend --verbose -t book
DEBUG    2022-01-09 13:01:24,885 text found: [book]
DEBUG    2022-01-09 13:01:24,885 extracting list of files ...
DEBUG    2022-01-09 13:01:24,885 len(args) [1]
DEBUG    2022-01-09 13:01:24,885 1 filenames found: [20211231_test.txt]
DEBUG    2022-01-09 13:01:24,885 iterate over files ...
DEBUG    2022-01-09 13:01:24,885 options.smartprepend is set with ||book| |20211231_test|.txt
DEBUG    2022-01-09 13:01:24,885 options.smartprepend is set with |<class 'str'>|<class 'str'>|<class 'str'>|<class 'str'>|<class 'str'>
DEBUG    2022-01-09 13:01:24,885 can't find a date/time-stamp, doing a simple prepend
DEBUG    2022-01-09 13:01:24,885  renaming "20211231_test.txt"
DEBUG    2022-01-09 13:01:24,886       ⤷   "book 20211231_test.txt"
DEBUG    2022-01-09 13:01:24,886 successfully finished.
Please press <Enter> for finishing...

month pattern:

xubuntu@xubuntu:~/Desktop/appendfilename/appendfilename$ python3 __init__.py 2021-12_test.txt --smart-prepend --verbose -t book
DEBUG    2022-01-09 13:04:26,884 text found: [book]
DEBUG    2022-01-09 13:04:26,884 extracting list of files ...
DEBUG    2022-01-09 13:04:26,884 len(args) [1]
DEBUG    2022-01-09 13:04:26,884 1 filenames found: [2021-12_test.txt]
DEBUG    2022-01-09 13:04:26,884 iterate over files ...
DEBUG    2022-01-09 13:04:26,884 options.smartprepend is set with ||book| |2021-12_test|.txt
DEBUG    2022-01-09 13:04:26,884 options.smartprepend is set with |<class 'str'>|<class 'str'>|<class 'str'>|<class 'str'>|<class 'str'>
DEBUG    2022-01-09 13:04:26,884 can't find a date/time-stamp, doing a simple prepend
DEBUG    2022-01-09 13:04:26,884  renaming "2021-12_test.txt"
DEBUG    2022-01-09 13:04:26,885       ⤷   "book 2021-12_test.txt"
DEBUG    2022-01-09 13:04:26,885 successfully finished.
Please press <Enter> for finishing...

short pattern:

xubuntu@xubuntu:~/Desktop/appendfilename/appendfilename$ python3 __init__.py 211231_test.txt --smart-prepend --verbose -t book
DEBUG    2022-01-09 13:05:26,694 text found: [book]
DEBUG    2022-01-09 13:05:26,694 extracting list of files ...
DEBUG    2022-01-09 13:05:26,695 len(args) [1]
DEBUG    2022-01-09 13:05:26,695 1 filenames found: [211231_test.txt]
DEBUG    2022-01-09 13:05:26,695 iterate over files ...
DEBUG    2022-01-09 13:05:26,695 options.smartprepend is set with ||book| |211231_test|.txt
DEBUG    2022-01-09 13:05:26,695 options.smartprepend is set with |<class 'str'>|<class 'str'>|<class 'str'>|<class 'str'>|<class 'str'>
DEBUG    2022-01-09 13:05:26,695 can't find a date/time-stamp, doing a simple prepend
DEBUG    2022-01-09 13:05:26,695  renaming "211231_test.txt"
DEBUG    2022-01-09 13:05:26,695       ⤷   "book 211231_test.txt"
DEBUG    2022-01-09 13:05:26,695 successfully finished.
Please press <Enter> for finishing...

These observations are consistent with the automated tests in the pytest script for Python 3 that I just extended, run e.g. by

pytest-3  test_appendfilename.py -m "smart" -v
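For illustration, a minimal sketch of the behavior this report asks for: recognize any of the four stamps date2name emits (ISO, --compact, --month, --short) at the start of the file name and insert the text right after it. The names `DATE2NAME_STAMP` and `smart_prepend` are hypothetical, and the output layout (stamp, its `_`, the text, a space, the rest) is an assumption; appendfilename's actual layout for ISO stamps may differ.

```python
import re

# Hypothetical pattern for the stamps date2name writes, each followed by "_".
DATE2NAME_STAMP = re.compile(
    r'^(?P<stamp>'
    r'\d{4}-[01]\d-[0-3]\d(?:T[012]\d\.[0-5]\d\.[0-5]\d)?'  # ISO date, optional THH.MM.SS
    r'|\d{4}[01]\d[0-3]\d'                                  # --compact: YYYYMMDD
    r'|\d{4}-[01]\d'                                        # --month:   YYYY-MM
    r'|\d{2}[01]\d[0-3]\d'                                  # --short:   YYMMDD
    r')(?P<sep>_)(?P<rest>.*)')


def smart_prepend(filename: str, text: str, separator: str = ' ') -> str:
    """Insert text after a leading date2name stamp; otherwise prepend it."""
    m = DATE2NAME_STAMP.match(filename)
    if m is None:
        # no stamp found: simple prepend, e.g. "book test.txt"
        return text + separator + filename
    # keep stamp and its "_" in front, then the new text, then the rest
    return m.group('stamp') + m.group('sep') + text + separator + m.group('rest')
```

Names without a recognized stamp fall back to the simple prepend seen in the logs above.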

novoid commented 2 years ago

Yes, this looks like a bug.

I'm not using compact/short/month patterns myself and therefore I was probably just sloppy when implementing it for the ISO pattern only. Which is definitely not smart. ;-)

Thanks for reporting.

novoid commented 2 years ago

@nbehrnd

I can confirm that the smart-prepend feature was implemented only for the YYYY-MM-DD(THH.MM.SS) format and no other.

The fix might take longer than anticipated. I wrote appendfilename and date2name before I was aware of named groups in Python regular expressions. Before I fiddle with the old, complex regexes, I want to introduce much more flexible ones that are much easier to maintain in the future.
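As background on why named groups help with maintenance, a small comparison using only Python's standard `re` module: with positional groups the caller has to know that group 2 is the month, whereas a named group is addressed by its label.

```python
import re

name = '2022-01-14_foo.bar'

# Positional groups: the caller has to know that group 2 is the month.
positional = re.match(r'^(\d{4})-(\d{2})-(\d{2})_', name)

# Named groups: the same pattern, but every part carries a label.
named = re.match(r'^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})_', name)

print(positional.group(2))   # 01
print(named.group('month'))  # 01
print(named.groupdict())     # {'year': '2022', 'month': '01', 'day': '14'}
```

If another optional group is later added before the month (say, a separate century), every positional index after it shifts, while `group('month')` keeps working.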

Furthermore, I'm thinking of extracting some functionality from both tools into one library to deal with date- and time-stamps within strings in general.

As a brief sneak preview, here is a snippet from my current brainstorming:

Library
- [ ] Name:
- input is a string
- analyze_timestamp_match(re.match(mystring)) -> returns an object:
- [ ] alternatively: no object, but a list/hash table that all of the functions below take as their first parameter
- match(bool), century, year, month, day, hours, minutes, seconds, dateformat, timeformat, separator_list
- functions:
  - has_century() ...
  - get_separator_list()
  - get_format()
  - [ ] what format? a string?
    - or ignore format and compile any format on-the-fly? But what about "update the previous format with that new time-stamp"?
- generate_timestamp(century, year, month, day, hours, minutes, seconds, dateformat, timeformat, separator_list) -> returns string in suitable format
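One way the brainstormed library could look, as a sketch only: `analyze_timestamp_match()` returns a small dataclass (one reading of "returns Object"), and `generate_timestamp()` takes that object instead of the long parameter list. It reuses the one-piece regex from this thread with the date/time separator class written as `[-T: _]`. The class name `TimestampInfo`, the field types, and reusing the original separators when rebuilding are assumptions; `dateformat`/`timeformat` handling is left out.

```python
import re
from dataclasses import dataclass
from typing import Optional

# One-piece regex from this thread ("-" first in the date/time separator class).
TIMESTAMP_REGEX = re.compile(
    r'^(?P<overall_datetimestamp>(?P<century>\d{2})?(?P<year>\d{2})'
    r'(?P<ym_sep>[-_.])?(?P<month>[01]\d)(?P<md_sep>[-_.])?(?P<day>[0123]\d)'
    r'((?P<datetime_sep>[-T: _])?(?P<hour>[012]\d)(?P<hm_sep>[:.-])?'
    r'(?P<minute>[012345]\d)((?P<ms_sep>[:.-])?(?P<second>[012345]\d))?)?'
    r'(?P<end_sep>[^a-zA-Z0-9]))(?P<rest>.*)')

SEPARATOR_GROUPS = ('ym_sep', 'md_sep', 'datetime_sep', 'hm_sep', 'ms_sep', 'end_sep')


@dataclass
class TimestampInfo:
    """Hypothetical result object; absent parts are None."""
    match: bool
    century: Optional[str] = None
    year: Optional[str] = None
    month: Optional[str] = None
    day: Optional[str] = None
    hours: Optional[str] = None
    minutes: Optional[str] = None
    seconds: Optional[str] = None
    separator_list: tuple = ()

    def has_century(self) -> bool:
        return self.century is not None

    def get_separator_list(self) -> tuple:
        return self.separator_list


def analyze_timestamp_match(m: Optional[re.Match]) -> TimestampInfo:
    """Turn a TIMESTAMP_REGEX match (or None) into a TimestampInfo."""
    if m is None:
        return TimestampInfo(match=False)
    seps = tuple(m.group(name) for name in SEPARATOR_GROUPS)
    return TimestampInfo(True, m.group('century'), m.group('year'),
                         m.group('month'), m.group('day'), m.group('hour'),
                         m.group('minute'), m.group('second'), seps)


def generate_timestamp(info: TimestampInfo) -> str:
    """Rebuild the stamp (without end separator) from its parts."""
    if not info.match:
        raise ValueError('no time stamp to generate')
    ym, md, dt, hm, ms, _end = (s or '' for s in info.separator_list)
    stamp = (info.century or '') + info.year + ym + info.month + md + info.day
    if info.hours is not None:
        stamp += dt + info.hours + hm + info.minutes
        if info.seconds is not None:
            stamp += ms + info.seconds
    return stamp
```

Keeping the separators in the object is what would allow "update the previous format with that new time-stamp": change a field, then regenerate with the same separators.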

import re

YMD_SEPARATORS = '[-_.]'         # potential separator character between the entities of year, month, day
DATETIME_SEPARATORS = '[-T: _]'  # potential separator character between datestamp and timestamp ('-' first, so it is a literal, not a range)
HMS_SEPARATORS = '[:.-]'         # potential separator character between the entities of hour, minute, second
END_SEPARATORS = '[^a-zA-Z0-9]'  # potential separator character between the datetimestamp and the rest

TIMESTAMP_REGEX = re.compile('^' +
    '(?P<overall_datetimestamp>' +                      # BEGIN: overall_datetimestamp: datetimestamp with separator
    r'(?P<century>\d{2})?' +                            #   optional century: YY    e.g. 20 (from 2022)
    r'(?P<year>\d{2})' +                                #   YY    e.g. 22 (from 2022)
    '(?P<ym_sep>' + YMD_SEPARATORS + ')?' +             #   optional separator character
    r'(?P<month>[01]\d)' +                              #   MM    e.g. 12 (December)
    '(?P<md_sep>' + YMD_SEPARATORS + ')?' +             #   optional separator character
    r'(?P<day>[0123]\d)' +                              #   DD    e.g. 31
    '(' +                                               #   BEGIN: timestamp is optional
    '(?P<datetime_sep>' + DATETIME_SEPARATORS + ')?' +  #     optional separator character
    r'(?P<hour>[012]\d)' +                              #     HH    e.g. 23
    '(?P<hm_sep>' + HMS_SEPARATORS + ')?' +             #     optional separator character
    r'(?P<minute>[012345]\d)' +                         #     MM    e.g. 59
    '(' +                                               #     BEGIN: seconds are optional
    '(?P<ms_sep>' + HMS_SEPARATORS + ')?' +             #       optional separator character
    r'(?P<second>[012345]\d)' +                         #       SS    e.g. 59
    ')?' +                                              #     END: seconds are optional
    ')?' +                                              #   END: timestamp is optional
    '(?P<end_sep>' + END_SEPARATORS + ')' +             #   mandatory separator character
    ')(?P<rest>.*)'                                     # END: overall_datetimestamp: datetimestamp with separator
    )

# same regex but in one piece:
TIMESTAMP_REGEX = re.compile(r'^(?P<overall_datetimestamp>(?P<century>\d{2})?(?P<year>\d{2})(?P<ym_sep>[-_.])?(?P<month>[01]\d)(?P<md_sep>[-_.])?(?P<day>[0123]\d)((?P<datetime_sep>[-T: _])?(?P<hour>[012]\d)(?P<hm_sep>[:.-])?(?P<minute>[012345]\d)((?P<ms_sep>[:.-])?(?P<second>[012345]\d))?)?(?P<end_sep>[^a-zA-Z0-9]))(?P<rest>.*)')

# examples:

# re.match(TIMESTAMP_REGEX, '2022-01-14T17.53.16_foo.bar').groups()
#  -> ('2022-01-14T17.53.16_', '20', '22', '-', '01', '-', '14', 'T17.53.16', 'T', '17', '.', '53', '.16', '.', '16', '_', 'foo.bar')

# re.match(TIMESTAMP_REGEX, '2022-01-14T17.53.16_foo.bar').groupdict()
#  -> {'hm_sep': '.', 'hour': '17', 'overall_datetimestamp': '2022-01-14T17.53.16_', 'century': '20', 'year': '22', 'day': '14', 'rest': 'foo.bar', 'month': '01', 'end_sep': '_', 'second': '16', 'md_sep': '-', 'ms_sep': '.', 'datetime_sep': 'T', 'ym_sep': '-', 'minute': '53'}

# re.match(TIMESTAMP_REGEX, '2022-01-14T17.53.16_foo.bar').group('hour')
#  -> '17'

# re.match(TIMESTAMP_REGEX, '2022-01-14T17.53.16_foo.bar').group('second')
#  -> '16'

# re.match(TIMESTAMP_REGEX, '2022-01-14T17.53.16_foo.bar').group('datetime_sep')
#  -> 'T'

# The regular expression matches date- and time-stamps as long as the order of the entities is YMD(HM(S)).
# For other orders like MM/DD/YYYY: please do re-think your life choices. ;-)  *SCNR*
# Unsupported:
#    - non-ISO-like orders of the entities
#    - time zones or time offsets
#    - weeks
#    - durations or intervals
#    - milliseconds
# Simplified:   (YY)?YY.MM.DD(.HH.MM(.SS)?)?
# The separator characters are limited to sets of potential characters (see regex for details).
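One detail worth checking in DATETIME_SEPARATORS as first posted, `'[T: -_]'`: inside a character class, an unescaped `-` between two characters denotes a range, so ` -_` spans everything from space (0x20) to underscore (0x5F), including digits, upper-case letters, and `@`. With that class, a name such as `2022-01-14A17.53_x` would be accepted as date plus time. Placing the hyphen first (or last) makes it a literal; the comparison below uses only Python's standard `re` module.

```python
import re

# As posted: ' -_' is a range from space (0x20) to underscore (0x5F),
# which also covers digits, upper-case letters and symbols like '@'.
AS_WRITTEN = re.compile(r'[T: -_]')
# With '-' first it is a literal hyphen: exactly '-', 'T', ':', ' ' and '_'.
INTENDED = re.compile(r'[-T: _]')

for ch in ['T', ':', ' ', '-', '_', 'A', '7', '@']:
    print(repr(ch), bool(AS_WRITTEN.fullmatch(ch)), bool(INTENDED.fullmatch(ch)))
```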

Link for online testing the regex%3F)%3F(%3FP%3Cend_sep%3E%5B%5Ea-zA-Z0-9%5D))(%3FP%3Crest%3E.*)&test_string=2022-01-14T17.53.16_foo.bar&ignorecase=0&multiline=0&dotall=0&verbose=1)
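A quick check of the one-piece regex (with the date/time separator class written as `[-T: _]`) against the four file names from the original report. Because the day is mandatory in this regex (cf. the "Simplified" summary), the `--month` stamp `2021-12_` is not recognized, so that pattern would still need separate handling. The helper `leading_stamp` is hypothetical.

```python
import re

# One-piece regex from above, with "-" first in the date/time separator class.
TIMESTAMP_REGEX = re.compile(
    r'^(?P<overall_datetimestamp>(?P<century>\d{2})?(?P<year>\d{2})'
    r'(?P<ym_sep>[-_.])?(?P<month>[01]\d)(?P<md_sep>[-_.])?(?P<day>[0123]\d)'
    r'((?P<datetime_sep>[-T: _])?(?P<hour>[012]\d)(?P<hm_sep>[:.-])?'
    r'(?P<minute>[012345]\d)((?P<ms_sep>[:.-])?(?P<second>[012345]\d))?)?'
    r'(?P<end_sep>[^a-zA-Z0-9]))(?P<rest>.*)')


def leading_stamp(filename):
    """Return the matched stamp including its end separator, or None."""
    m = TIMESTAMP_REGEX.match(filename)
    return m.group('overall_datetimestamp') if m else None


# File names from the report: ISO, --compact, --short, --month.
for name in ('2021-12-31_test.txt', '20211231_test.txt',
             '211231_test.txt', '2021-12_test.txt'):
    print(f'{name:20} -> {leading_stamp(name)!r}')
```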

nbehrnd commented 2 years ago

Compared to this rich variety, the filter in my test script that checks «is this one of the patterns issued by date2name at all?»

import re

old_filename = "2021-12_test.txt"  # hypothetical example file name

if (re.search(r"^\d{4}-[012]\d-[0-3]\d_", old_filename) or
        re.search(r"^\d{4}-[012]\d-[0-3]\dT[012]\d\.[0-5]\d\.[0-5]\d_", old_filename) or
        re.search(r"^\d{4}[012]\d[0-3]\d_", old_filename) or
        re.search(r"^\d{4}-[012]\d_", old_filename) or
        re.search(r"^\d{2}[012]\d[0-3]\d_", old_filename)):

    pass  # enter the inner loop

appears naïve, because it does not consider such a large set of separators between the digits. Rather, I suspect your variations anticipate (or are even required) to make appendfilename (and perhaps date2name already) work on Linux, macOS, and Windows alike, for time stamps set by date2name as well as by other time-stamping programs.
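To make the comparison concrete, the five checks from the test script can be folded into one anchored alternation (equivalent, since each alternative is followed by the same `_`) and run next to the one-piece regex (with the date/time separator class written as `[-T: _]`). The names `DATE2NAME_FILTER` and the sample file names are made up for illustration.

```python
import re

# The test script's five checks, folded into one anchored alternation.
DATE2NAME_FILTER = re.compile(
    r"^(?:\d{4}-[012]\d-[0-3]\d"
    r"|\d{4}-[012]\d-[0-3]\dT[012]\d\.[0-5]\d\.[0-5]\d"
    r"|\d{4}[012]\d[0-3]\d"
    r"|\d{4}-[012]\d"
    r"|\d{2}[012]\d[0-3]\d)_")

# The one-piece regex from the thread, "-" first in the date/time class.
TIMESTAMP_REGEX = re.compile(
    r'^(?P<overall_datetimestamp>(?P<century>\d{2})?(?P<year>\d{2})'
    r'(?P<ym_sep>[-_.])?(?P<month>[01]\d)(?P<md_sep>[-_.])?(?P<day>[0123]\d)'
    r'((?P<datetime_sep>[-T: _])?(?P<hour>[012]\d)(?P<hm_sep>[:.-])?'
    r'(?P<minute>[012345]\d)((?P<ms_sep>[:.-])?(?P<second>[012345]\d))?)?'
    r'(?P<end_sep>[^a-zA-Z0-9]))(?P<rest>.*)')

# filter result vs. regex result for differently separated stamps
for name in ["2022-01-14_foo.bar", "2022.01.14_foo.bar", "2022-01-14 17:53_foo.bar"]:
    print(name, bool(DATE2NAME_FILTER.match(name)), bool(TIMESTAMP_REGEX.match(name)))
```

Only the date2name-style name passes the filter, while the broader regex accepts all three.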

(And because my current focus is on what was showcased at GLT18 after the filetags presentation, I currently don't use date2name's compact/month/short patterns either.)

novoid/appendfilename #15