Closed jmaronas closed 3 years ago
Does that need tags in stan file and flags when calling executable?
Is there minimal example somewhere?
I think it gets created in a temp folder that is destroyed afrer sampling is done.
I have been trying to look for the file in the directory created in the /tmp/
dir (not sure if there is any other temporal directory created) and havent find it. However, a simple modification of the code actually generates it.
To generate this file in a normal cmdstan call I just execute:
./BNN id=1 random seed=57150 data file=./data.json output file=./BNN.csv method=sample num_samples=1000 algorithm=hmc adapt engaged=1
So I have just went to the cmdstanpy file named cmdstan_args.py
and add the following to the cmd list that the def compose_comand
method creates and returns:
cmd.append('output')
cmd.append('file={}'.format(csv_file))
cmd.append('profile_file={}'.format(csv_file.split('.csv')[0]+'profile.csv')) # I just add this one
In this way, the profile file is generated in the same path as the csv file.
no one is working on this and it needs to be done! if you'd like to submit a PR, happy to help.
the util function cmdstan_version_at
will let you check that the user has a CmdStan version that can handle this request to generalize the code (and unit tests!).
where is the information on how to do PR on cmdstanpy? Might be able to handle it but not sure yet (quite busy at the moment with some stuff). Need to see first all the things that need to be done for a proper code modification
I think with MaybeDict
block (I don't remember the name) creates a tmp dir that gets destroyed (csv files are moved to a new location before this)?
I might remember this wrong.
https://github.com/stan-dev/cmdstanpy/wiki has details.
understand if you don't have time to deal with the PR process.
suggestions for clear and concise argument name to sample
method?
related question is - should we add methods to CmdStanMCMC
object that return the contents of stdout, stderr, profile, and diagnose files?
okay, will look at it thanks.
In any case I think that for the sample
method it could be handled as the save_diagnostics
one. Something like: save_profiler
And yes, I think we should add a method to CmdStanMCMC object for returning that (not sure if it makes sense for point estimate or variational inference methods as well). Are you gonna open the PR then? Might be able to provide the code I have done if needed
yes, I'll do a PR.
related question is - should we add methods to CmdStanMCMC object that return the contents of stdout, stderr, profile, and diagnose files?
checked code and properties: stdout_files
, stderr_files
, and diagnostic_files
already exist; they return the paths to these files, which is good enough. will implement property profiler_files
which will do the same for the profiler files.
profiling info is printed for all methods, therefore adding this arg to all methods.
I think
with MaybeDict
block (I don't remember the name) creates a tmp dir that gets destroyed (csv files are moved to a new location before this)?I might remember this wrong.
Not sure about this. Anyway since cmdstan doesn't generate the profile.csv file until the end of the execution, I couldnt check if the file is being generated by the cmdstanpy interface, and then erased as you mention. I know that cmdstan generates the file automatically if path is not specified, so perhaps we just need to move the file before erasing the temporal directory.
Ok, I tested and found the error.
By default profile.csv
is saved to current working directory. And we sample 4 chains, each of them will save results to profile.csv
(so file is overwritten).
We should add option to save with the filename. But really CmdStan should add timestamp + id on the filename.
@bbbales2
profile_location = Path.cwd() / "profile.csv"
print(profile_location.exists(), "\n", profile_location)
I think the behavior is similar to output.csv in default cmdstan. If you don't specify a filename, then things are written to output.csv.
I guess the difference then is that a profile output is not written every time (if the model has no profile statements, for instance)
Maybe the solution is to just always provide profile filenames and then check after the run if the filename have been written?
Sounds like a good solution.
just to let y'all know I've got a branch and got things working. need unit tests and then code review.
I am reopening this discussion. I have just launched a model and when several chains are run for MCMC sampling, the validate_csv_files
method is raising error due to inconsistencies found in the files *-profile.csv
. So if the sampling method is run with argument validate_csv = False
, then this error dissapears.
I have just reinstalled cmdstanpy but still appears, more precisely in version 0.9.68
Ping @mitzimorris @ahartikainen in case this issue needs to be re-opened
the fix has been merged to develop but we haven't done another release. can you use the develop branch for now?
pip install -e git+https://github.com/stan-dev/cmdstanpy@/develop
Sure, just wanted to let you know in case it was not checked.
Thank you
Summary:
The
cmdstanpy.CmdStanModel.sample()
does not provide the profiling.csv file that cmdstan generatesDescription:
When running cmdstan through the cmdstanpy API, I cannot see the profile.csv file being generated (the one that includes execution times). Is there anyone working on this or any plan to incorporate it?
I have been modifying the source code from the cmdstanpy on my local machine so that it is printed, but don't wanna start any pull request if anyone is working on this already.