alexander-turner closed this issue 7 years ago
A WinError means you are on Windows, right? Windows locks files while they are being written to, and in some other cases.
With any kind of multiprocessing, it is a good idea to have each process write to a different file. The second parameter of the Monitor constructor exists for this purpose.
Which parameter would that be: uid? These are the ones I see. I currently have force disabled and resume enabled.
directory (str): A per-training run directory where to record stats.
video_callable (Optional[function, False]): function that takes in the index of the episode and
    outputs a boolean, indicating whether we should record a video on this episode.
    The default (for video_callable is None) is to take perfect cubes, capped at 1000.
    False disables video recording.
force (bool): Clear out existing training data from this directory
    (by deleting every file prefixed with "openaigym.").
resume (bool): Retain the training data already in this directory, which will be merged with our new data
write_upon_reset (bool): Write the manifest file on each reset. (This is currently a JSON file,
    so writing it is somewhat expensive.)
uid (Optional[str]): A unique id used as part of the suffix for the file. By default, uses os.getpid().
mode (['evaluation', 'training']): Whether this is an evaluation or training episode.
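To illustrate the "one file per process" idea and the role of a uid suffix that defaults to os.getpid(), here is a minimal sketch using only the standard library. It does not call gym; the helper names (per_process_path, write_stats), the "stats" prefix, and the file-name layout are hypothetical and chosen only for illustration.

```python
import json
import os


def per_process_path(directory, prefix="stats", uid=None):
    """Build a file path that is unique to the calling process.

    The prefix and file-name layout are hypothetical; only the idea of
    suffixing with a unique id (the pid by default) comes from the thread.
    """
    if uid is None:
        uid = str(os.getpid())
    return os.path.join(directory, "{}.{}.json".format(prefix, uid))


def write_stats(directory, stats, uid=None):
    """Write one worker's stats to its own file and return the path."""
    path = per_process_path(directory, uid=uid)
    with open(path, "w") as f:
        json.dump(stats, f)
    return path
```

Because every worker process has a different pid, two workers calling write_stats on the same directory never open the same file, which avoids the Windows file-lock conflict described above.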
Oops, I was looking at a different Monitor, sorry. As a general piece of advice, try disabling manifest writing for every process except one. If this is a problem others are likely to hit, let's fix it for everyone.
I'm not quite sure how to disable writing for all but one, but I did enable write_upon_reset and that fixed the access error. However, the episode_batch file is still essentially empty:
{"initial_reset_timestamp": 1498670304.8532367, "timestamps": [], "episode_lengths": [], "episode_rewards": [], "episode_types": ["t"]}
Accordingly, this error appears:
gym.error.InvalidRequestError: Request req_JN9PvTSQ3GqpxZ41PHm5A: Must provide a training episode batch.
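One way to catch this before attempting an upload is to check that the batch file actually contains episodes. The sketch below is an assumption about what "Must provide a training episode batch" is reacting to (empty per-episode lists); the function name batch_has_episodes is hypothetical, and the key names are taken from the JSON shown above.

```python
import json

# Per-episode lists that appear in the batch file shown above.
REQUIRED_LISTS = ("timestamps", "episode_lengths", "episode_rewards")


def batch_has_episodes(batch_text):
    """Return True only if every per-episode list in the batch is non-empty."""
    batch = json.loads(batch_text)
    return all(len(batch.get(key, [])) > 0 for key in REQUIRED_LISTS)
```

Running this on the file contents before uploading would flag the batch above as empty without a round trip to the server.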
Edit: For unrelated reasons, I updated the interface so it uses multiprocessing.Pool; the issue remains, however.
Fixed issue - problem was with how I was closing the environments and keeping track of the monitors.
I solved another issue, but this underlying issue is still there, unfortunately. It will now run and save the data correctly, but it still won't write the correct information to the episode_batch file, so I can't upload anything (even though all the videos and other metadata are present).
Figured this out more quickly than I thought I would. It turns out that using multiprocessing like this causes stats_recorder to flush its output far too early (generally before it has recorded anything). I fixed this by having each process in the pool pass back its local episode information, manually building the requisite lists for the batch file in the env.stats_recorder object, flushing, and then uploading.
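The approach described above can be sketched roughly as follows. This is not the code from the issue: the episode loop is a placeholder rather than a real gym environment, the function names (run_episodes, merge_batches, collect) are hypothetical, and the merged result is a plain dict mirroring the batch file's keys rather than the actual env.stats_recorder object. Emitting one "t" entry per episode in episode_types is also an assumption.

```python
import time
from multiprocessing import Pool


def run_episodes(args):
    """Run a few stand-in episodes and return this worker's local records."""
    worker_id, n_episodes = args
    lengths, rewards, timestamps = [], [], []
    for episode in range(n_episodes):
        lengths.append(10 + episode)       # placeholder episode length
        rewards.append(float(worker_id))   # placeholder episode reward
        timestamps.append(time.time())
    return {
        "episode_lengths": lengths,
        "episode_rewards": rewards,
        "timestamps": timestamps,
    }


def merge_batches(local_batches, initial_reset_timestamp):
    """Concatenate per-worker records into one batch, ordered by timestamp."""
    rows = []
    for batch in local_batches:
        rows.extend(zip(batch["timestamps"],
                        batch["episode_lengths"],
                        batch["episode_rewards"]))
    rows.sort(key=lambda row: row[0])
    return {
        "initial_reset_timestamp": initial_reset_timestamp,
        "timestamps": [row[0] for row in rows],
        "episode_lengths": [row[1] for row in rows],
        "episode_rewards": [row[2] for row in rows],
        "episode_types": ["t"] * len(rows),  # assumption: one entry per episode
    }


def collect(n_workers=2, n_episodes=3):
    """Fan episodes out to a pool and merge what each worker hands back."""
    start = time.time()
    with Pool(n_workers) as pool:
        jobs = [(i, n_episodes) for i in range(n_workers)]
        local_batches = pool.map(run_episodes, jobs)
    return merge_batches(local_batches, start)
```

On Windows, collect() should be called from under an `if __name__ == "__main__":` guard, since worker processes are spawned by re-importing the main module. The key design point is that only the parent process assembles and writes the batch, so nothing is flushed before the episode data exists.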
I recently implemented multiprocessing for running episodes (since I'm testing with non-learning bandits). It works fine when no monitors are initialized, but I get an error when they are. The episode video files are written fine, but the manifest isn't:
The code is here.