I was able to run your code on the FSC147 dataset, and then I built a small sample dataset of my own to test with the FSC147-trained weights. I mimicked your FSC147 test.json to generate the JSON file for my dataset and tried several ways of generating it, but it keeps failing with JSON-parsing errors. Since the FSC147 dataset itself runs fine, I suspect the problem is the format of my JSON file. Could you please provide the code you used to convert the FSC147 COCO annotations into train.json, val.json, and test.json? :) Below is the error report:
Traceback (most recent call last):
File "/home/chuanzhi/zyt/SAFECount-main/experiments/douzhu/../../tools/train_val.py", line 328, in
main()
File "/home/chuanzhi/zyt/SAFECount-main/experiments/douzhu/../../tools/train_val.py", line 147, in main
train_loader, val_loader, test_loader = build_dataloader(
File "/home/chuanzhi/zyt/SAFECount-main/datasets/data_builder.py", line 44, in build_dataloader
test_loader = build(cfg_dataset, dataset_type="test", distributed=distributed)
File "/home/chuanzhi/zyt/SAFECount-main/datasets/data_builder.py", line 22, in build
data_loader = build_custom_dataloader(cfg, training, distributed)
File "/home/chuanzhi/zyt/SAFECount-main/datasets/custom_dataset.py", line 46, in build_custom_dataloader
dataset = CustomDataset(
File "/home/chuanzhi/zyt/SAFECount-main/datasets/custom_dataset.py", line 100, in __init__
meta = json.loads(line)
File "/home/chuanzhi/anaconda3/lib/python3.9/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "/home/chuanzhi/anaconda3/lib/python3.9/json/decoder.py", line 340, in decode
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 40 (char 39)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 267494) of binary: /home/chuanzhi/anaconda3/bin/python
Traceback (most recent call last):
File "/home/chuanzhi/anaconda3/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/chuanzhi/anaconda3/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/chuanzhi/anaconda3/lib/python3.9/site-packages/torch/distributed/launch.py", line 193, in
main()
File "/home/chuanzhi/anaconda3/lib/python3.9/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/home/chuanzhi/anaconda3/lib/python3.9/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/home/chuanzhi/anaconda3/lib/python3.9/site-packages/torch/distributed/run.py", line 710, in run
elastic_launch(
File "/home/chuanzhi/anaconda3/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/chuanzhi/anaconda3/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 259, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
../../tools/train_val.py FAILED
Failures:
Root Cause (first observed failure):
[0]:
time : 2023-08-26_18:16:18
host : chuanzhi-MS-7D18
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 267494)
error_file:
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html