rbalestr-lab / stable-SSL

https://rbalestr-lab.github.io/stable-SSL.github.io/dev/
MIT License

[Bug Report] JSONL reader does not properly parse log_rank_<num>.jsonl files #80

Closed plumol closed 1 week ago

plumol commented 3 weeks ago

Filenames of the form log_rank_<num>.jsonl are split only on '_', which leaves a fragment such as '0.jsonl' as the last element of the split. That fragment still contains the '.jsonl' extension, so int() cannot parse it.
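For illustration, here is a minimal sketch that reproduces the failure and shows one possible way to extract the rank. The helper name `rank_from_filename` is hypothetical and is not claimed to be the fix adopted in the repository; the example filename follows the `logs_rank_*` glob pattern shown in the traceback.

```python
from pathlib import Path


def rank_from_filename(log_file: Path) -> int:
    """Hypothetical helper: extract the rank from a name like 'logs_rank_0.jsonl'."""
    # Path.stem drops the final suffix, so 'logs_rank_0.jsonl' becomes 'logs_rank_0'
    # and the last '_'-separated piece is just the digits.
    return int(log_file.stem.split("_")[-1])


log_file = Path("logs_rank_0.jsonl")

# Splitting the full name keeps the extension on the last piece,
# which is what triggers the ValueError reported in this issue.
try:
    int(log_file.name.split("_")[-1])
except ValueError as err:
    print(err)

# Splitting the stem instead yields a clean integer.
print(rank_from_filename(log_file))
```

Using `Path.stem` only strips the last suffix, so this sketch assumes the log files carry a single `.jsonl` extension.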

The current code is also backwards-incompatible with csv_logs.jsonl from previous versions of stable-ssl.

https://github.com/rbalestr-lab/stable-SSL/blob/29208ef6b4a3e959ec4b4bcbbe012dcb601f89ec/stable_ssl/reader.py#L66C1-L67C1

File /oscar/home/klam20/stable-SSL/stable_ssl/reader.py:66, in jsonl_run(path)
     63 logs_files = list(_path.glob("logs_rank_*"))
     64 for log_file in logs_files:
     65     # Extract rank from the filename.
---> 66     rank = int(log_file.name.split("_")[-1])
     67     for obj in jsonlines.open(log_file).iter(type=dict, skip_invalid=True):
     68         obj["rank"] = rank  # Add rank field to each dict.

ValueError: invalid literal for int() with base 10: '0.jsonl'

RandallBalestriero commented 1 week ago

Solved in https://github.com/rbalestr-lab/stable-SSL/pull/106. Using Trainer.get_logs correctly parses the .jsonl logs.