kenahoo / Path-Class

Cross-platform path specification manipulation
http://search.cpan.org/dist/Path-Class/

Methods to get filename without extension, extension without filename etc. #36

Closed bpj closed 9 years ago

bpj commented 9 years ago

Ken, we talked about this over email a long while ago.

This is my first pull request ever, so please bear with me if I'm doing it all wrong!

This adds methods for:

We discussed using the linguistic terms 'stem' and 'suffix' over email. I added aliases with 'extension' as well.

suffix returns the extension without a dot if there was a dot and an extension, an empty string if there was a dot but no extension, and undef if there was neither a dot nor an extension. This is of course a side effect of using my($suffix) = $file->basename =~ /\.([^.]*)?\z/; to get the suffix, but I think it may actually be useful, so I didn't 'normalize' the undef to an empty string.
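The three-way return value described above can be seen in a minimal standalone sketch. The helper name suffix_of is hypothetical and works on a plain string rather than on a Path::Class::File object; only the regex is taken from the comment.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Path::Class: applies the regex from the
# comment above to a plain basename string.
sub suffix_of {
    my ($basename) = @_;
    # In list context a failed match returns an empty list, so $suffix
    # stays undef when the basename contains no dot at all.
    my ($suffix) = $basename =~ /\.([^.]*)?\z/;
    return $suffix;
}

for my $name ('notes.txt', 'notes.', 'notes') {
    my $suffix = suffix_of($name);
    printf "%-10s => %s\n", $name, defined $suffix ? "'$suffix'" : 'undef';
}
```

A caller that does not care about the difference between "no dot" and "dot but no extension" could collapse the two with something like `suffix_of($name) // ''`.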

with_suffix was added because it does the right thing regardless of whether you pass the new suffix with or without a leading dot. Also unlike $file->dir->file( $file->stem . ".newsuffix" ) it doesn't prepend a dot for the current directory if $file doesn't have any directory, which IMO is the Right Thing.
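As a rough illustration of the leading-dot handling, here is a string-level sketch. The function name with_suffix_sketch is hypothetical and is not the code in the pull request; edge cases such as dotfiles or an empty new suffix are deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical string-level stand-in for the proposed with_suffix();
# the real method would operate on a Path::Class::File object.
sub with_suffix_sketch {
    my ($basename, $new_suffix) = @_;
    (my $stem = $basename) =~ s/\.[^.]*\z//;   # drop the existing suffix, if any
    $new_suffix =~ s/\A\.//;                   # accept "html" and ".html" alike
    return "$stem.$new_suffix";
}

print with_suffix_sketch('README.md', 'html'),  "\n";
print with_suffix_sketch('README.md', '.html'), "\n";
```

Both calls print README.html, which is the point: the caller does not have to know which form the new suffix arrives in.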

basefile was added so that you can do this without worrying whether the new suffix has a leading dot or not:

$some_dir->file( $some_file->basefile->with_suffix($some_suffix) )

I hope you find them all useful! I'm dealing programmatically with markdown/html or code/documentation etc. files having or meant to have the same stem+/-dir on a regular basis, and I don't think I'm the only one.

/bpj

kenahoo commented 9 years ago

I like it. I think there could also be a with_stem() method, keeping the suffix constant while replacing the stem.

bpj commented 9 years ago

Will do later today or tomorrow.

bpj commented 9 years ago

On 2014-12-07 07:06, Ken Williams wrote:

I like it. I think there could also be a with_stem() method, keeping the suffix constant while replacing the stem.

Real Life canceled whatever I had to do so I got time to do that, and I thankfully discovered some edge cases I hadn't thought of.

Please have a look at my 'stem' branch. Currently I have tests in place for some of these, some of which fail.

  1. What should with_suffix() do if the new suffix is just a dot?
  2. What should with_suffix() do if the old filename ends in a dot and the new suffix is empty?
  3. What should with_suffix() do if the old filename ends in a dot and the new suffix is just a dot?
  4. What should with_stem() do if the new stem ends in a dot and the old filename didn't have a suffix?
  5. What should with_stem() do if the old filename ended in a dot and the new stem doesn't?
  6. What should with_stem() do if the new stem ends in a dot or has a suffix? Should that be an error?
  7. What should stem() return when the filename ends in a dot?
  8. What should with_suffix() and with_stem() respectively do if they are passed an empty argument, and should they treat an undef argument and an empty string argument differently?
  9. Files with two suffixes clearly do validly exist, so should we have all_suffixes() and without_any_suffixes() methods, or should we deem them unusual enough to expect people to do that with s/// ?

If by "caller" we mean the one who uses Path::Class, and by "user" the one who uses the caller's module or script, then my first intuition was to treat undef as meaning "the user doesn't want to change anything". I can however imagine a situation where I wouldn't want such semantics, so it is probably better to treat an undef argument as an error. That forces the caller to check what the user gives them, but that's better than what might happen if an empty new stem inadvertently slips through: the caller might inadvertently remove a dot file or something like that!
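A minimal sketch of the "undef is an error" policy might look like the following. The name with_stem_sketch is hypothetical, it works on plain strings, and rejecting the empty string along with undef is an assumption made for the illustration, not something settled in this thread.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical string-level sketch of with_stem() that rejects undef and
# empty stems instead of silently treating them as "no change".
sub with_stem_sketch {
    my ($basename, $new_stem) = @_;
    croak 'with_stem: new stem must be a non-empty string'
        unless defined($new_stem) && length($new_stem);
    # Keep the old suffix (including its dot) if the basename has one.
    my ($dot_suffix) = $basename =~ /(\.[^.]*)\z/;
    return $new_stem . (defined $dot_suffix ? $dot_suffix : '');
}

my $renamed  = with_stem_sketch('report.txt', 'summary');
my $rejected = !eval { with_stem_sketch('report.txt', undef); 1 };

print "$renamed\n";
print "undef stem rejected\n" if $rejected;
```

With this policy the caller has to validate user input before calling, but a forgotten check fails loudly instead of producing a file name like ".txt".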

It is less clear what should happen if with_suffix() gets an empty string. My intuition is that in the cases where the suffix (old or new) matches /\A\.?[^.]+/ we should normalize to a single dot, but the other cases are not as clear. It is also my intuition that where the filename matches /\.[^.]*\z/ there is a suffix, which may however be the empty string. Both the stem and the suffix should therefore always be returned without a dot (the caller may always call suffix() to find out whether the original file has a dot without a suffix, a dot with a suffix, or neither), and existing dots should always be removed before joining a stem and a suffix with a dot, if there *is* a suffix.

I'm thinking that it might not be such a great idea to allow an empty (string) suffix as the way to get a file without a suffix, but that it is better to treat that as an error and have an explicit without_suffix() method.

I also think that the best approach would be to always match against the regex qr{\.([^.]*)\z} (a dot followed by zero or more non-dots at the end of the string), maybe stored in a variable, and have suffix(), stem() and with_suffix() match against it.
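The shared-regex idea could be sketched roughly as follows. The variable name $SUFFIX_RE and the three function names are hypothetical, the functions take plain strings instead of Path::Class::File objects, and the dot in the pattern is escaped to match the description "a dot followed by zero or more non-dots".

```perl
use strict;
use warnings;

# One shared pattern: a literal dot followed by zero or more non-dots
# at the end of the string. The capture group holds the suffix.
my $SUFFIX_RE = qr{\.([^.]*)\z};

# Returns the suffix without its dot, '' for a trailing dot, undef for no dot.
sub suffix_shared {
    my ($basename) = @_;
    my ($suffix) = $basename =~ $SUFFIX_RE;
    return $suffix;
}

# Returns the basename with the last dot and suffix removed.
sub stem_shared {
    my ($basename) = @_;
    (my $stem = $basename) =~ s/$SUFFIX_RE//;
    return $stem;
}

# Replaces the last suffix; the new suffix may or may not start with a dot.
sub with_suffix_shared {
    my ($basename, $new_suffix) = @_;
    $new_suffix =~ s/\A\.//;
    return stem_shared($basename) . ".$new_suffix";
}

print join(' | ', suffix_shared('a.tar.gz'), stem_shared('a.tar.gz'),
                  with_suffix_shared('a.tar.gz', '.bz2')), "\n";
```

Keeping the pattern in one variable means the three methods cannot drift apart in what they consider a suffix.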

It is not entirely clear to me whether a new stem which already looks as if it has a suffix or ends in a dot should be accepted. I have a hard time thinking up a scenario where it would unequivocally make sense.

mrdvt92 commented 9 years ago

9.  Files with two suffixes clearly do validly exist, so should we have all_suffixes() and without_any_suffixes() methods, or should we deem them unusual enough to expect people to do that with s/// ?

I think in path-class-ish syntax the suffix API should be an array.

so... $file->with_suffix("tar", "gz") would be the desired syntax

However, we might actually want suffix_pop (as well as suffix_push, suffix_shift and suffix_unshift), as this would alleviate any of the edge cases that you've asked about.

file("file.ext")->suffix_push(()); #no op
file("file.ext")->suffix_push(undef); #no op
file("file.ext")->suffix_push(""); #add "." = "file.ext."
file("file.ext")->suffix_push("gz"); #add "gz" = "file.ext.gz"
file("file.ext")->suffix_unshift("txt"); #add "txt" = "file.txt.ext"

I would not assume that the name of a hidden file on Unix is a suffix, so file(".bashrc")->suffix_push("bkp") would do the right thing: ".bashrc.bkp"

I guess we would still need a shift_all and a pop_all

my @ext=file(".bashrc")->suffix_shift_all; #@ext=();
my @ext=file("file.tar.gz")->suffix_shift_all; #@ext=("tar", "gz");
my @ext=file("file.tar.gz")->suffix_pop_all; #@ext=("gz", "tar");
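To make the list-style proposal concrete, here is a small string-level sketch. None of these functions exist in Path::Class; suffix_list and suffix_push_sketch are hypothetical names, and treating leading dots as part of the stem is the assumption stated above for hidden files.

```perl
use strict;
use warnings;

# Hypothetical helper: split a basename into its list of suffixes, treating
# the leading dot(s) of a hidden file as part of the stem.
sub suffix_list {
    my ($basename) = @_;
    (my $rest = $basename) =~ s/\A\.+//;   # ignore leading dots of dotfiles
    my @parts = split /\./, $rest, -1;     # -1 keeps a trailing empty suffix
    shift @parts;                          # the first part is the stem
    return @parts;
}

# Hypothetical helper: append suffixes; undef entries and an empty list are
# no-ops, while an empty string appends a bare dot.
sub suffix_push_sketch {
    my ($basename, @new) = @_;
    return join '.', $basename, grep { defined $_ } @new;
}

print join(',', suffix_list('file.tar.gz')), "\n";
print suffix_push_sketch('.bashrc', 'bkp'), "\n";
```

In this sketch a pop_all-style ordering is simply `reverse suffix_list($name)`.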

Actually, suffix could be an overloaded object, like:

my $file=file("file.ext")->suffix->push("gz"); #add "gz" = "file.ext.gz"
my @list=file("file.tar.gz")->suffix->list; #returns ("tar", "gz")
my $suffix=file("file.tar.gz")->suffix; #isa Path::Class::File::Suffix
print "$suffix"; #prints "gz"
my $file=file($0)->suffix->replace("log");

Thanks, Mike

mrdvt92

bpj commented 9 years ago

I have decided how to proceed: when getting/removing a suffix it must match /(?<=\S)\.([^.]*)\z/ and $1 is returned. When setting a suffix the string provided must match /\A\.?([^.]+)\z/ and ".$1" will be appended to the filename after removing any existing suffix.
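The two rules can be sketched on plain strings as follows. The function names get_suffix and set_suffix are hypothetical and this is not the code of the follow-up pull request; only the two regexes come from the comment above, and dying on an invalid new suffix is an assumption.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Getting: the dot must be preceded by a non-whitespace character, so the
# leading dot of a name like ".bashrc" does not start a suffix.
sub get_suffix {
    my ($basename) = @_;
    my ($suffix) = $basename =~ /(?<=\S)\.([^.]*)\z/;
    return $suffix;
}

# Setting: the new suffix is an optional dot followed by one or more
# non-dots; anything else (undef, "", ".", "tar.gz") is rejected.
sub set_suffix {
    my ($basename, $new) = @_;
    croak 'invalid suffix'
        unless defined($new) && $new =~ /\A\.?([^.]+)\z/;
    my $clean = $1;
    (my $stem = $basename) =~ s/(?<=\S)\.[^.]*\z//;   # remove any existing suffix
    return "$stem.$clean";
}

print set_suffix('notes.txt', '.md'), "\n";
print set_suffix('.bashrc', 'bkp'), "\n";
```

Because the setter only accepts a single non-empty suffix, removing a suffix would need a separate operation, in line with the earlier idea of an explicit without_suffix() method.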

I'm closing this and will make a new pull request shortly.