stan-dev / cmdstanpy

CmdStanPy is a lightweight interface to Stan for Python users which provides the necessary objects and functions to compile a Stan program and fit the model to data using CmdStan.
BSD 3-Clause "New" or "Revised" License
151 stars 69 forks source link

CmdStanModel.Sample method does not track profiling #359

Closed jmaronas closed 3 years ago

jmaronas commented 3 years ago

Summary:

The cmdstanpy.CmdStanModel.sample() does not provide the profiling.csv file that cmdstan generates

Description:

When running cmdstan through the cmdstanpy API, I cannot see the profile.csv file being generated (the one that includes execution times). Is there anyone working on this or any plan to incorporate it?

I have been modifying the source code from the cmdstanpy on my local machine so that it is printed, but don't wanna start any pull request if anyone is working on this already.

ahartikainen commented 3 years ago

Does that need tags in stan file and flags when calling executable?

Is there minimal example somewhere?

I think it gets created in a temp folder that is destroyed afrer sampling is done.

jmaronas commented 3 years ago

I have been trying to look for the file in the directory created in the /tmp/ dir (not sure if there is any other temporal directory created) and havent find it. However, a simple modification of the code actually generates it.

To generate this file in a normal cmdstan call I just execute:

./BNN id=1 random seed=57150 data file=./data.json output file=./BNN.csv method=sample num_samples=1000 algorithm=hmc adapt engaged=1

So I have just went to the cmdstanpy file named cmdstan_args.py and add the following to the cmd list that the def compose_comand method creates and returns:

        cmd.append('output')
        cmd.append('file={}'.format(csv_file))                                                        
        cmd.append('profile_file={}'.format(csv_file.split('.csv')[0]+'profile.csv'))   # I just add this one              

In this way, the profile file is generated in the same path as the csv file.

mitzimorris commented 3 years ago

no one is working on this and it needs to be done! if you'd like to submit a PR, happy to help. the util function cmdstan_version_at will let you check that the user has a CmdStan version that can handle this request to generalize the code (and unit tests!).

jmaronas commented 3 years ago

where is the information on how to do PR on cmdstanpy? Might be able to handle it but not sure yet (quite busy at the moment with some stuff). Need to see first all the things that need to be done for a proper code modification

ahartikainen commented 3 years ago

I think with MaybeDict block (I don't remember the name) creates a tmp dir that gets destroyed (csv files are moved to a new location before this)?

I might remember this wrong.

mitzimorris commented 3 years ago

https://github.com/stan-dev/cmdstanpy/wiki has details.

understand if you don't have time to deal with the PR process. suggestions for clear and concise argument name to sample method?

mitzimorris commented 3 years ago

related question is - should we add methods to CmdStanMCMC object that return the contents of stdout, stderr, profile, and diagnose files?

jmaronas commented 3 years ago

okay, will look at it thanks.

In any case I think that for the sample method it could be handled as the save_diagnostics one. Something like: save_profiler

And yes, I think we should add a method to CmdStanMCMC object for returning that (not sure if it makes sense for point estimate or variational inference methods as well). Are you gonna open the PR then? Might be able to provide the code I have done if needed

mitzimorris commented 3 years ago

yes, I'll do a PR.

mitzimorris commented 3 years ago

related question is - should we add methods to CmdStanMCMC object that return the contents of stdout, stderr, profile, and diagnose files?

checked code and properties: stdout_files, stderr_files, and diagnostic_files already exist; they return the paths to these files, which is good enough. will implement property profiler_files which will do the same for the profiler files.

mitzimorris commented 3 years ago

profiling info is printed for all methods, therefore adding this arg to all methods.

jmaronas commented 3 years ago

I think with MaybeDict block (I don't remember the name) creates a tmp dir that gets destroyed (csv files are moved to a new location before this)?

I might remember this wrong.

Not sure about this. Anyway since cmdstan doesn't generate the profile.csv file until the end of the execution, I couldnt check if the file is being generated by the cmdstanpy interface, and then erased as you mention. I know that cmdstan generates the file automatically if path is not specified, so perhaps we just need to move the file before erasing the temporal directory.

ahartikainen commented 3 years ago

Ok, I tested and found the error.

By default profile.csv is saved to current working directory. And we sample 4 chains, each of them will save results to profile.csv (so file is overwritten).

We should add option to save with the filename. But really CmdStan should add timestamp + id on the filename.

@bbbales2

profile_location = Path.cwd() / "profile.csv"
print(profile_location.exists(), "\n", profile_location)
bbbales2 commented 3 years ago

I think the behavior is similar to output.csv in default cmdstan. If you don't specify a filename, then things are written to output.csv.

I guess the difference then is that a profile output is not written every time (if the model has no profile statements, for instance)

Maybe the solution is to just always provide profile filenames and then check after the run if the filename have been written?

ahartikainen commented 3 years ago

Sounds like a good solution.

mitzimorris commented 3 years ago

just to let y'all know I've got a branch and got things working. need unit tests and then code review.

mitzimorris commented 3 years ago

https://github.com/stan-dev/cmdstanpy/pull/364

jmaronas commented 3 years ago

I am reopening this discussion. I have just launched a model and when several chains are run for MCMC sampling, the validate_csv_files method is raising error due to inconsistencies found in the files *-profile.csv. So if the sampling method is run with argument validate_csv = False, then this error dissapears.

I have just reinstalled cmdstanpy but still appears, more precisely in version 0.9.68

Ping @mitzimorris @ahartikainen in case this issue needs to be re-opened

mitzimorris commented 3 years ago

the fix has been merged to develop but we haven't done another release. can you use the develop branch for now?

pip install -e git+https://github.com/stan-dev/cmdstanpy@/develop
jmaronas commented 3 years ago

Sure, just wanted to let you know in case it was not checked.

Thank you