Closed hollow-503 closed 1 year ago
Probably you can start with evaluating the pretrained models. If you cannot run multi-GPU inference, it will be hard for you to run multi-GPU training. By the way, if your custom setup does not work, we recommend trying the docker setup first.
I could only run with:
torchpack dist-run -np 1 python tools/test.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml pretrained/bevfusion-det.pth --eval bbox
If I set -np 1, I get the result. If I set -np 2, it reminds me that:
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 2 slots
that were requested by the application:
/opt/conda/envs/xxx/bin/python
Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------
Does that mean I should evaluate or train with a more powerful CPU and GPU?
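For anyone hitting this later: the slot check in the error above comes from Open MPI, which torchpack's dist-run uses to launch one process per GPU, so it is a launcher configuration issue rather than a question of raw CPU/GPU power. A sketch of common workarounds, assuming an Open MPI-backed launcher; the slot count below is illustrative:

```shell
# Open MPI grants one "slot" per CPU core it detects (or per hostfile
# entry) and refuses to start more ranks than slots, so -np 2 can fail
# on containers/VMs that report few cores even when two GPUs exist.

# Workaround 1: permit more ranks than detected slots (oversubscription).
export OMPI_MCA_rmaps_base_oversubscribe=1

# Workaround 2: declare the slot count explicitly in a hostfile.
printf 'localhost slots=4\n' > hostfile

# Then relaunch, e.g.:
#   torchpack dist-run -np 2 python tools/test.py <config> <checkpoint> --eval bbox
```

If neither helps, the launcher may not be forwarding MPI settings at all, in which case the docker setup below is the faster path.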
I successfully ran with:
torchpack dist-run -np 1 python tools/train.py configs/nuscenes/det/centerhead/lssfpn/camera/256x704/swint/default.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth
but only a single GPU is working. If I run with -np 2, it reminds me that there are not enough slots.
Just to confirm, have you tried out our docker setup?
No, I just tried the custom setup.
My suggestion would be to try out the docker setup first. If the docker setup works, I would suggest comparing your system setup against our Dockerfile and fixing the differences.
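As a starting point for that comparison, something like the following prints the pieces that most often drift from a Dockerfile; the grep pattern is illustrative, so check the repository's Dockerfile for the exact versions it pins:

```shell
# Show the local Python version plus the key packages whose versions
# commonly differ from the Dockerfile's pins.
python3 -c "import sys; print('python', sys.version.split()[0])"
pip3 list 2>/dev/null | grep -Ei 'torch|mmcv|mmdet|torchpack' || true
```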
Thanks for your reply, I trained it successfully. However, when I run the visualization command:
torchpack dist-run -np 1 python tools/visualize.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml --checkpoint pretrained/bevfusion-det.pth --out-dir viz/fusion-det-pred --mode pred --box-score 0.99
The result is:
The result seems to be wrong, although I tried tuning box-score from 0.1 to 0.99. I also tried tuning nms-threshold, but in configs/nuscenes/det/transfusion/default.yaml the nms_type is null, so I cannot tune nms-threshold.
My evaluation result with the pretrained model is:
@hollow-503,
Would you mind also trying out visualizing the predictions on the camera images? I saw duplicate bounding boxes before, but it seems that your visualizations are wildly off.
Best, Haotian
Thanks for your quick reply. The predictions on the camera images are:
with the command:
torchpack dist-run -np 1 python tools/visualize.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml --checkpoint pretrained/bevfusion-det.pth --out-dir viz/fusion-det-pred --mode pred --box-score 0.99
I solved it by directly setting bbox-score in visualize.py:
parser.add_argument("--bbox-score", type=float, default=0.04)
and it works now.
However, I am curious: in the camera+lidar configuration it seems you set bbox-score=0:
but the model still gets a wonderful result. Is there something I am missing?
I think for our demos we did not use bbox_score=0. If we do camera only, we probably did NMS (because it is simpler; you don't have to tune the parameters).
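To make the score-threshold discussion above concrete, here is a toy sketch of what a confidence cutoff like bbox-score does; all box positions and scores are invented. A low-score near-duplicate survives a tiny threshold such as 0.04 but is removed by a higher one (or, alternatively, by NMS):

```python
# Toy confidence filter: keep only detections whose score clears the cutoff.
def filter_by_score(boxes, scores, threshold):
    """Return (box, score) pairs whose score is at or above the threshold."""
    return [(b, s) for b, s in zip(boxes, scores) if s >= threshold]

boxes = ["car@(10.0,4.0)", "car@(10.2,4.1)", "pedestrian@(3.0,7.0)"]
scores = [0.91, 0.12, 0.55]  # second box is a low-score near-duplicate

print(filter_by_score(boxes, scores, 0.04))  # keeps all three boxes
print(filter_by_score(boxes, scores, 0.50))  # drops the 0.12 duplicate
```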
Thanks for your reply! Actually, when I reproduce training the fusion model with the command:
torchpack dist-run -np 8 python tools/train.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml --load-from pretrained/lidar-only-det.pth
following #189, an error occurs:
Traceback (most recent call last):
  File "/home/xxx/bevfusion/lib/python3.8/site-packages/yapf/yapflib/pytree_utils.py", line 119, in ParseCodeToTree
    tree = parser_driver.parse_string(code, debug=False)
  File "/home/xxx/bevfusion/lib/python3.8/lib2to3/pgen2/driver.py", line 103, in parse_string
    return self.parse_tokens(tokens, debug)
  File "/home/xxx/bevfusion/lib/python3.8/lib2to3/pgen2/driver.py", line 71, in parse_tokens
    if p.addtoken(type, value, (prefix, start)):
  File "/home/xxx/bevfusion/lib/python3.8/lib2to3/pgen2/parse.py", line 162, in addtoken
    raise ParseError("bad input", type, value, context)
lib2to3.pgen2.parse.ParseError: bad input: type=3, value="'deterministic'", context=('\n', (2, 0))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/xxx/bevfusion/lib/python3.8/site-packages/yapf/yapflib/pytree_utils.py", line 125, in ParseCodeToTree
    tree = parser_driver.parse_string(code, debug=False)
  File "/home/xxx/bevfusion/lib/python3.8/lib2to3/pgen2/driver.py", line 103, in parse_string
    return self.parse_tokens(tokens, debug)
  File "/home/xxx/bevfusion/lib/python3.8/lib2to3/pgen2/driver.py", line 71, in parse_tokens
    if p.addtoken(type, value, (prefix, start)):
  File "/home/xxx/bevfusion/lib/python3.8/lib2to3/pgen2/parse.py", line 162, in addtoken
    raise ParseError("bad input", type, value, context)
lib2to3.pgen2.parse.ParseError: bad input: type=3, value="'deterministic'", context=('\n', (2, 0))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/xxx/bevfusion/lib/python3.8/site-packages/yapf/yapflib/yapf_api.py", line 183, in FormatCode
    tree = pytree_utils.ParseCodeToTree(unformatted_source)
  File "/home/xxx/bevfusion/lib/python3.8/site-packages/yapf/yapflib/pytree_utils.py", line 131, in ParseCodeToTree
    raise e
  File "/home/xxx/bevfusion/lib/python3.8/site-packages/yapf/yapflib/pytree_utils.py", line 129, in ParseCodeToTree
    ast.parse(code)
  File "/home/xxx/bevfusion/lib/python3.8/ast.py", line 47, in parse
    return compile(source, filename, mode, flags,
  File "<unknown>", line 2
    'deterministic': False
                   ^
SyntaxError: invalid syntax

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tools/train.py", line 87, in <module>
    main()
  File "tools/train.py", line 51, in main
    logger.info(f"Config:\n{cfg.pretty_text}")
  File "/home/xxx/bevfusion/lib/python3.8/site-packages/mmcv/utils/config.py", line 496, in pretty_text
    text, _ = FormatCode(text, style_config=yapf_style, verify=True)
  File "/home/xxx/bevfusion/lib/python3.8/site-packages/yapf/yapflib/yapf_api.py", line 186, in FormatCode
    raise errors.YapfError(errors.FormatErrorMsg(e))
yapf.yapflib.errors.YapfError: <unknown>:2:1: invalid syntax
Also, in the function init_weights in bevfusion.py, it seems that you do not load any lidar pretrained weights during training initialization:

def init_weights(self) -> None:
    if "camera" in self.encoders:
        self.encoders["camera"]["backbone"].init_weights()
Could you tell me the command to train the fusion model?
You may try "load_from" @hollow-503. Detailed experiment configurations will be provided in the future. Please stay tuned.
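For readers with the same question: a minimal illustration of the difference, using hypothetical parameter names (the real BEVFusion state dict differs). init_weights only re-initializes the camera backbone, whereas a load_from-style load copies every matching key from a full checkpoint into the already assembled model:

```python
# Model parameters right after construction (hypothetical keys).
model_state = {
    "encoders.camera.backbone.w": "random-init",
    "encoders.lidar.backbone.w": "random-init",
    "fuser.w": "random-init",
}
checkpoint = {  # e.g. a lidar-only checkpoint such as lidar-only-det.pth
    "encoders.lidar.backbone.w": "pretrained-lidar",
    "heads.object.w": "pretrained-head",  # no matching key: skipped
}
# strict=False style loading: copy only keys the model actually has.
loaded = {k: v for k, v in checkpoint.items() if k in model_state}
model_state.update(loaded)
print(sorted(loaded))  # ['encoders.lidar.backbone.w']
```

So with --load-from, the lidar branch picks up its pretrained weights even though init_weights never touches it.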
Closed due to inactivity.
@hollow-503, could you please tell me how you solved this problem: "There are not enough slots available in the system to satisfy the 4 slots"? Thanks!
Actually, I used another server with 8 GPUs and it worked. I think it has something to do with your hardware.
Hello, when I run:
torchpack dist-run -np 4 python tools/train.py configs/nuscenes/det/centerhead/lssfpn/camera/256x704/swint/default.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth
I get an error. I have 2 Gold 6148 CPUs and 4 V100 GPUs in a single machine, but it still says there are not enough slots. How should I run the training or test code on multiple GPUs in a single machine?