stan-dev / cmdstan

CmdStan, the command line interface to Stan
https://mc-stan.org/users/interfaces/cmdstan
BSD 3-Clause "New" or "Revised" License

More output file name quirks #1226

Closed andrewjradcliffe closed 10 months ago

andrewjradcliffe commented 11 months ago

Summary:

As noted in this issue, the treatment of paths as strings in CmdStan can lead to strange behavior.

Description:

Treating a path as a plain string, and subsequently splitting it on the last dot, produces a family of problems.

Reproducible Steps:

I am certain that more examples can be produced, but here are a few interesting cases.

Assuming that you have the bernoulli example built and your current directory is examples/bernoulli/bernoulli:

Case 1
rm -rf foo* && mkdir -p foo.bar

# produces a file "baz" in directory "foo.bar"
./bernoulli data file=bernoulli.data.json sample num_chains=1 output file=foo.bar/baz
ls -a foo.bar

# produces no files
./bernoulli data file=bernoulli.data.json sample num_chains=3 output file=foo.bar/baz
ls -a foo.bar

Case 2
rm -rf foo* && mkdir -p foo/.bar

# produces a file "baz" in directory "foo/.bar"
./bernoulli data file=bernoulli.data.json sample num_chains=1 output file=foo/.bar/baz
ls -a foo/.bar

# produces no files
./bernoulli data file=bernoulli.data.json sample num_chains=3 output file=foo/.bar/baz
ls -a foo/.bar

Debug of Case 2

CmdStan splits the whole string on the last '.' and inserts the chain id at that position, producing output file names of the form foo/_${id}.bar/baz. Since _${id}.bar is a non-existent directory, no files are created. If we create those directories ahead of time, the writer succeeds.

rm -rf foo* && mkdir -p foo/_1.bar foo/_2.bar

# Creates "foo/_1.bar/baz" and "foo/_2.bar/baz" but not "foo/_3.bar/baz"
./bernoulli data file=bernoulli.data.json sample num_chains=3 output file=foo/.bar/baz
ls -a foo/_1.bar foo/_2.bar

Case 3
rm -rf foo* && mkdir -p foo/bar

# produces no file(s); ".." is not a writable file
./bernoulli data file=bernoulli.data.json sample num_chains=1 output file='foo/bar/..'
ls -a foo
ls -a foo/bar

# produces files "._1.", "._2.", "._3." in directory "foo/bar"
./bernoulli data file=bernoulli.data.json sample num_chains=3 output file='foo/bar/..'
ls -a foo/bar
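
The file names reported for Cases 2 and 3 are consistent with a split on the last '.' of the whole argument, with no regard for directory separators. The sketch below is a hypothetical reconstruction of that behavior for illustration only; `naive_chain_name` is an invented name, not CmdStan's actual code.

```cpp
#include <cassert>
#include <string>

// Hypothetical reconstruction (not CmdStan source): split the whole
// argument at its last '.' and insert "_<id>" at that position,
// ignoring directory separators entirely.
std::string naive_chain_name(const std::string& output_file, int id) {
  assert(id > 0);
  const std::size_t dot = output_file.find_last_of('.');
  if (dot == std::string::npos) {
    // No dot anywhere in the string: append the id at the end.
    return output_file + "_" + std::to_string(id);
  }
  // Everything before the last dot, then the id, then the dot and the rest.
  return output_file.substr(0, dot) + "_" + std::to_string(id)
         + output_file.substr(dot);
}
```

Under this assumption, `naive_chain_name("foo/.bar/baz", 1)` yields `foo/_1.bar/baz` and `naive_chain_name("foo/bar/..", 1)` yields `foo/bar/._1.`, matching the names observed above.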

Current Output:

Summarized from reproducer above:

| Case | file (input) | num_chains == 1 | num_chains > 1 |
|------|--------------|-----------------|----------------|
| 1 | foo.bar/baz | foo.bar/baz | nothing; not possible to recover with pre-existing directory |
| 2 | foo/.bar/baz | foo/.bar/baz | foo/_${id}.bar/baz iff pre-existing foo/_${id}.bar directory |
| 3 | 'foo/bar/..' | nothing | foo/bar/._${id}. |

Expected Output:

| Case | file (input) | num_chains == 1 | num_chains > 1 |
|------|--------------|-----------------|----------------|
| 1 | foo.bar/baz | foo.bar/baz.csv | foo.bar/baz_${id}.csv |
| 2 | foo/.bar/baz | foo/.bar/baz.csv | foo/.bar/baz_${id}.csv |
| 3 | 'foo/bar/..' | error | error |

One could argue that the Case 3 errors should instead be handled by substituting the default file name after path normalization, but the input is ill-formed: a directory was provided where a file was required.

Additional Information:

Current Version:

v2.33.1

mitzimorris commented 10 months ago

all of these problems could be solved by using the Boost filesystem library, but it's not a header-only library, which will complicate the build process. without Boost filesystem, we can kludge up checks.
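
For illustration, one possible shape of such a string-only check is sketched below. It is a minimal sketch under stated assumptions, not a proposed patch: `chain_name_string_only` is a hypothetical name, both '/' and '\' are assumed to be directory separators, and a leading dot in the final component is assumed to mark a hidden file rather than an extension. It does not append a default `.csv` suffix.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: look for an extension only inside the final
// path component, so dots in directory names are never split on.
std::string chain_name_string_only(const std::string& output_file, int id) {
  assert(id > 0);
  // Assumption: both '/' and '\\' act as directory separators.
  const std::size_t sep = output_file.find_last_of("/\\");
  const std::size_t start = (sep == std::string::npos) ? 0 : sep + 1;
  const std::string dir = output_file.substr(0, start);
  const std::string file = output_file.substr(start);

  // An empty component, "." or ".." names a directory, not a file.
  if (file.empty() || file == "." || file == "..") {
    throw std::invalid_argument("output file names a directory: "
                                + output_file);
  }

  const std::size_t dot = file.find_last_of('.');
  // Assumption: a leading dot marks a hidden file, not an extension.
  const bool has_ext = dot != std::string::npos && dot != 0;
  const std::string stem = has_ext ? file.substr(0, dot) : file;
  const std::string ext = has_ext ? file.substr(dot) : std::string();
  return dir + stem + "_" + std::to_string(id) + ext;
}
```

With this sketch, `foo.bar/baz` and `foo/.bar/baz` keep their directories intact (`foo.bar/baz_1`, `foo/.bar/baz_1`), and `foo/bar/..` is rejected with an exception instead of producing `._1.`.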

andrjohns commented 10 months ago

> all of these problems could be solved by using the Boost filesystem library, but it's not a header-only library, which will complicate the build process. without Boost filesystem, we can kludge up checks.

An alternative could be the std library filesystem, but that requires C++17 (still get to stay header-only at least!)
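
As a rough illustration of the `std::filesystem` route, the sketch below follows the "Expected Output" table from the issue description. It assumes C++17; `expected_output_path` is a hypothetical helper name, not an existing CmdStan function, and the `.csv` default is taken from that table. Some older toolchains may also need an extra link flag for `std::filesystem`.

```cpp
#include <cassert>
#include <filesystem>
#include <stdexcept>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper following the "Expected Output" table:
// operate on the final path component only, default to ".csv",
// and reject inputs that name a directory.
fs::path expected_output_path(const fs::path& output_file, int id,
                              int num_chains) {
  assert(id > 0 && num_chains > 0);
  const fs::path file = output_file.filename();
  const std::string file_str = file.string();

  // "foo/bar/" has an empty filename; "." and ".." are directories.
  if (file_str.empty() || file_str == "." || file_str == "..") {
    throw std::invalid_argument("output file names a directory: "
                                + output_file.string());
  }

  // stem()/extension() only inspect the final component, and treat a
  // leading dot (e.g. ".bar") as part of the stem.
  std::string name = file.stem().string();
  if (num_chains > 1) {
    name += "_" + std::to_string(id);
  }
  name += file.has_extension() ? file.extension().string() : ".csv";
  return output_file.parent_path() / name;
}
```

For example, `expected_output_path("foo.bar/baz", 1, 1)` gives `foo.bar/baz.csv`, `expected_output_path("foo/.bar/baz", 2, 3)` gives `foo/.bar/baz_2.csv`, and `expected_output_path("foo/bar/..", 1, 1)` throws.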