Hi, thank you for your post!
If you open TensorBoard, you can see the progress of those metrics (p_opt, p_exp, and h_mean) as shown here: https://github.com/omron-sinicx/neural-astar/issues/4#issuecomment-1356944700. Is this what you are looking for?
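If it is easier for you to check the values outside the TensorBoard UI, here is a minimal sketch for reading the logged scalars directly from an event file; the log path is just an example, so point it at your own `version_*` directory:

```python
# Minimal sketch: read scalars that were written for TensorBoard.
# The path below is only an example; adjust it to your own version_* directory.
from tensorboard.backend.event_processing import event_accumulator

log_dir = "model/mazes_032_moore_c8/lightning_logs/version_1"  # example path
ea = event_accumulator.EventAccumulator(log_dir)
ea.Reload()  # load the event file(s) in that directory

# List every scalar tag that was logged (e.g. metrics such as p_opt, p_exp, h_mean)
for tag in ea.Tags()["scalars"]:
    events = ea.Scalars(tag)
    print(tag, [(e.step, e.value) for e in events[:5]], "...")
```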
Hi 😀 I am facing the same issue: I am not able to see the training progress (loss and metrics) because no log files are generated. Is this normal?
Thank you!
Thank you! At least when working on https://github.com/omron-sinicx/neural-astar/pull/9, all the metrics were logged as intended. Will look into it.
Hi! I've been investigating this issue but am having difficulty reproducing it. If I clone the repository, create a venv, and run `train.py`, the metrics are logged to TensorBoard as follows.
My environment is:
python==3.8
tensorboard==2.11.0
pytorch-lightning==1.8.5.post0
I will try other environments and module versions, but would it be possible to share your environment and the versions of the related modules that cause this logging issue (the TensorBoard and PyTorch Lightning versions may matter)? Or did you get any warning messages about logging failures? @GigSam @luigidamico100
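If it helps, here is a small, generic sketch (not part of the repository) for collecting the relevant versions in one go:

```python
# Generic sketch for reporting the package versions relevant to this issue.
import sys
from importlib.metadata import version, PackageNotFoundError

print("python:", sys.version)
for pkg in ("torch", "pytorch-lightning", "tensorboard"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```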
Thank you!
@yonetaniryo my environment is:
The problem is that after cloning the repo, creating and activating the venv, and running `train.py`, I don't see any "metrics" folder nor any log produced by training, even though the algorithm works fine and no logging warning is produced. I really don't know what's causing this issue.
Thank you for sharing your environment. I just wanted to make sure that the logs are stored in `model/mazes_032_moore_c8/lightning_logs/version_*` for `mazes_032_moore_c8`, not in `metrics`. Also, we keep only the checkpoint in `model/mazes_032_moore_c8/lightning_logs/version_0` on GitHub to reduce the repository size. When you clone the repo and start the training, the following dir and files should appear:

```
model/mazes_032_moore_c8/lightning_logs/version_1:
checkpoints  events.out.tfevents....  hparams.yaml
```
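As a quick check, a short script along these lines (just a sketch; the `model` root is an assumption based on the default output location) will list any event files that training produced:

```python
# Sketch: search the model/ directory for TensorBoard event files.
# "model" is assumed to be the default output root used by train.py.
import os

found = False
for root, _, files in os.walk("model"):
    for name in files:
        if name.startswith("events.out.tfevents"):
            print(os.path.join(root, name))
            found = True

if not found:
    print("No TensorBoard event files found under ./model")
```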
I have checked the logging in an environment as close as possible to that of @GigSam, with python 3.10.9 and `tensorboard==2.10.1`. However, I'm not yet able to reproduce the issue. Can you double-check whether the logs are stored in the `model` dir? Or you may try using our Dockerfile, which will give us exactly the same environment. Thank you!
Sorry, but I'm going to close this issue because I cannot reproduce the logging problem. If someone encounters the same problem, please check whether the metrics data are stored in the `model` directory. And please don't hesitate to reopen the issue if you can reproduce the problem. Thank you for the report!
On the "minimal" branch, differently from what was done in the example.ipynb file of the previous version of the repo (the one without pytorch lightning, similar to the branch "3-improve-repository-organization"), it seems that you don't use the logs of the Opt, Exp and Hmean metrics when the training is performed. I would like to visualize those metrics, but the "metrics" folder isn't created by running the train.py script. Thank you for your support.