stan-dev / cmdstan

CmdStan, the command line interface to Stan
https://mc-stan.org/users/interfaces/cmdstan
BSD 3-Clause "New" or "Revised" License

Write all chains to one file #1254

Closed SteveBronder closed 5 months ago

SteveBronder commented 5 months ago

Summary:

Right now we write each chain to a separate file. Should we just write all chains to one file and have a chain column to say which chain each sample came from? It would add a new column to parse, which could hurt backwards compatibility for our output. But I think it would end up making reading and writing a little simpler.
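As a rough illustration of the proposed layout, here is a minimal sketch that merges per-chain CSV text into one table with a leading chain column. The column name `chain__` and the helper `merge_chains` are hypothetical; nothing in CmdStan writes this format today. The only Stan CSV feature the sketch relies on is that metadata lives in `#` comment lines.

```python
import csv
import io


def merge_chains(chain_csvs):
    """Combine per-chain CSV texts into one CSV text with a leading chain__ column.

    `chain__` is a hypothetical column name used only for this sketch.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = None
    for chain_id, text in enumerate(chain_csvs, start=1):
        # Stan CSV files carry metadata in '#' comment lines; skip them here.
        rows = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
        reader = csv.reader(rows)
        chain_header = next(reader)
        if header is None:
            header = chain_header
            writer.writerow(["chain__"] + header)
        elif chain_header != header:
            raise ValueError(f"chain {chain_id} has a different header")
        for row in reader:
            writer.writerow([chain_id] + row)
    return out.getvalue()


# Two tiny made-up chains, just to exercise the function.
chain_1 = "# chain 1\nlp__,theta\n-1.0,0.5\n-1.2,0.6\n"
chain_2 = "# chain 2\nlp__,theta\n-0.9,0.4\n-1.1,0.7\n"
merged = merge_chains([chain_1, chain_2])
print(merged)
```

Note that the per-chain comment lines are dropped in this sketch, so any metadata stored there would need another home in a real single-file format.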

Current Version:

v2.34.1

mitzimorris commented 5 months ago

managing the files is a PITA, but:

that said, if we move to Apache Arrow / more granular outputs, it would make sense because otherwise, gazillions of files.

SteveBronder commented 5 months ago

As long as we had a column for chain IDs in the file, we could still compute rhat.
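To make that concrete, here is a minimal sketch that groups draws by a chain ID column and computes the classic (non-split) Gelman-Rubin R-hat. This is not the rank-normalized split R-hat that Stan's tooling reports, and the function name and the `(chain_id, value)` row shape are assumptions made for the example.

```python
import math
import statistics
from collections import defaultdict


def classic_rhat(rows):
    """Classic Gelman-Rubin R-hat from (chain_id, value) pairs.

    Assumes every chain has the same number of draws (at least 2)
    and that there are at least 2 chains.
    """
    chains = defaultdict(list)
    for chain_id, value in rows:
        chains[chain_id].append(float(value))
    draws = list(chains.values())
    n = len(draws[0])
    if len(draws) < 2 or n < 2 or any(len(d) != n for d in draws):
        raise ValueError("need >= 2 chains with equal length >= 2")
    # W: mean of the within-chain sample variances.
    w = statistics.fmean([statistics.variance(d) for d in draws])
    # B: n times the sample variance of the chain means.
    b = n * statistics.variance([statistics.fmean(d) for d in draws])
    var_hat = (n - 1) / n * w + b / n
    return math.sqrt(var_hat / w)


# Made-up draws: two chains that sit far apart should give R-hat well above 1.
rows = [(1, 1.0), (1, 2.0), (1, 3.0), (2, 11.0), (2, 12.0), (2, 13.0)]
print(classic_rhat(rows))
```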

Did we intend to have an option to kill a chain if it is slow? Idk how this would stop that, since we'd just stop writing that chain to the file.

mitzimorris commented 5 months ago

is the problem here managing the output files in CmdStanR?

SteveBronder commented 5 months ago

No, it's just that this feels like a nicer format for reading upstream in general. Instead of having to read multiple files and combine them, upstream tools would just need to read one file.

WardBrian commented 5 months ago

As the number of chains grows, I could see you having issues with contending for the lock on the file handle.

SteveBronder commented 5 months ago

Yeah, I agree with the contention issue. We would probably have to do something kind of silly with mmap and an offset to get reasonable speeds. We'd also have to change how we write out the mass matrix, since we would have many in one file.
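For a sense of what the mmap-plus-offset idea could look like, here is a minimal sketch in which each chain writes into its own disjoint byte range of one preallocated file, so no shared file lock is needed. It assumes things a real implementation could not take for granted: the number of draws and columns is known up front, rows are fixed-width binary doubles rather than CSV text, and all names and sizes are made up for the example.

```python
import mmap
import os
import struct
import tempfile
import threading

# Hypothetical sizes, fixed up front so every chain's region can be preallocated.
N_CHAINS, N_DRAWS, N_COLS = 4, 100, 3
ROW = struct.Struct(f"<{N_COLS}d")  # one draw = N_COLS little-endian doubles
CHAIN_BYTES = N_DRAWS * ROW.size


def write_chain(buf, chain_idx):
    """Write one chain's draws into its own region of the shared mapping."""
    base = chain_idx * CHAIN_BYTES
    for draw in range(N_DRAWS):
        # Placeholder values standing in for real sampler output.
        values = (float(chain_idx), float(draw), 0.0)
        ROW.pack_into(buf, base + draw * ROW.size, *values)


path = os.path.join(tempfile.mkdtemp(), "draws.bin")
with open(path, "wb") as f:
    f.truncate(N_CHAINS * CHAIN_BYTES)  # preallocate the whole file

with open(path, "r+b") as f:
    buf = mmap.mmap(f.fileno(), 0)
    threads = [
        threading.Thread(target=write_chain, args=(buf, c))
        for c in range(N_CHAINS)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    buf.flush()
    # Read back the final draw of the last chain via its computed offset.
    last_offset = (N_CHAINS - 1) * CHAIN_BYTES + (N_DRAWS - 1) * ROW.size
    last_row = ROW.unpack_from(buf, last_offset)
    buf.close()

print(last_row)
```

The fixed-size regions are what make the offsets computable without coordination. Variable-length CSV rows, or a chain that stops early, would break that assumption, which is part of why this approach feels awkward here.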

I'm going to close this as "probably not worth it"