caicj15 opened this issue 11 months ago
pip install xformers==0.0.13
is okay. However, there are other problems.
Yes, there are other problems and it still fails.
I can't install nvidia-apex using the following command:
git clone https://github.com/NVIDIA/apex
cd apex
# if pip >= 23.1 (ref: https://pip.pypa.io/en/stable/news/#v23-1) which supports multiple `--config-settings` with the same key...
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
Hi @caicj15 @fikry102, can you try this xichenpan/kosmosg:v1? It is committed through our research environment and works well on Microsoft local nodes and clusters.
Hi @apolinario, can you also try this image :D
I am herbert. I used your image, and "from torchscale.architecture.config import EncoderDecoderConfig" fails.
@caicj15 Would you mind trying to run this again:
pip install torchscale/
pip install open_clip/
pip install fairseq/
pip install infinibatch/
I can successfully run this repo by using pytorch==1.13.1 and xformers==0.0.16
However, how can I debug app.py with VS Code (or PyCharm)? It is easy when we use "python train.py xxxx": just add xxxx to "args" in launch.json. But for "python -m yyyy app.py xxxx", how can I debug app.py?
python -m torch.distributed.launch --nproc_per_node=1 --nnodes=1 \
app.py None \
--task kosmosg \
--criterion kosmosg \
--arch kosmosg_xl \
--required-batch-size-multiple 1 \
--dict-path data/dict.txt \
--spm-model data/sentencepiece.bpe.model \
--memory-efficient-fp16 \
--ddp-backend=no_c10d \
--distributed-no-spawn \
--subln \
--sope-rel-pos \
--checkpoint-activations \
--flash-attention \
--pretrained-ckpt-path ./kosmosg_checkpoints/checkpoint_final.pt
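For VS Code, a minimal launch.json sketch for debugging a `python -m torch.distributed.launch ... app.py ...` command might look like the following. The "module" field takes the `-m` target, and everything after it (including app.py itself) goes into "args"; the args shown here are an illustrative subset of the full command above, not the complete list:

```json
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Debug app.py via torch.distributed.launch",
            "type": "python",
            "request": "launch",
            // debug the module passed to -m, instead of a script path
            "module": "torch.distributed.launch",
            // the target script app.py and its options are plain args here
            "args": [
                "--nproc_per_node=1", "--nnodes=1",
                "app.py", "None",
                "--task", "kosmosg"
            ],
            "console": "integratedTerminal"
        }
    ]
}
```

With this configuration, breakpoints set in app.py should be hit when the debugger launches the module.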
Hi @fikry102, good to know! For PyCharm debugging you can refer to https://intellij-support.jetbrains.com/hc/en-us/community/posts/360003879119-how-to-run-python-m-command-in-pycharm-
@fikry102 Can you give me the command to install the right versions of pytorch and xformers? I tried installing with pip but it still failed. Thanks!
@fikry102, I tried with pytorch==1.13.1 and xformers==0.0.16 (beginning with the author's docker) but still get many errors. I would be very grateful if you could provide the command you used to install the correct packages, including any other changes you had to make to the provided setup script.
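A minimal sketch of one way to pin these two versions with pip. The CUDA 11.7 wheel tag and index URL are assumptions; adjust cu117 to match your local CUDA toolkit:

```shell
# assumed: CUDA 11.7 wheels; change cu117 to match your driver/toolkit
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 \
    --extra-index-url https://download.pytorch.org/whl/cu117
# xformers 0.0.16 was built against the torch 1.13.x line
pip install xformers==0.0.16
```

Installing torch first and xformers second matters here, since a later xformers install can silently pull in a newer torch.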
Did anyone successfully replicate the results? Would love to know the environment used.
Hi @Namangarg110, could you please try our docker? People have reported success using the following script:
docker run --runtime=nvidia --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --name kosmosg --privileged=true -it -v /mnt:/mnt/ xichenpan/kosmosg:v1 /bin/bash
git clone https://github.com/microsoft/unilm.git
cd unilm/kosmos-g
pip install torchscale/
pip install open_clip/
pip install fairseq/
pip install infinibatch/
Hi @fikry102, Thank you for suggesting above fixes. However, when I pip install those two packages in the given docker image, I still get the following error. Could you suggest any solution?
ImportError: /opt/conda/lib/python3.8/site-packages/fused_layer_norm_cuda.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops19empty_memory_format4callEN3c108ArrayRefIlEENS2_8optionalINS2_10ScalarTypeEEENS5_INS2_6LayoutEEENS5_INS2_6DeviceEEENS5_IbEENS5_INS2_12MemoryFormatEEE
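An undefined-symbol ImportError like this usually means the compiled extension was built against a different torch version than the one currently installed, so rebuilding apex after pinning torch is the usual fix. A small diagnostic sketch to surface which extensions import cleanly (`fused_layer_norm_cuda` comes from the error above; `amp_C` as a second apex extension is an assumption):

```python
import importlib


def ext_status(names):
    """Try importing each compiled extension; record 'ok' or the error text."""
    out = {}
    for n in names:
        try:
            importlib.import_module(n)
            out[n] = "ok"
        except ImportError as e:  # also catches undefined-symbol failures
            out[n] = f"ImportError: {e}"
    return out


print(ext_status(["fused_layer_norm_cuda", "amp_C"]))
```

Any entry that is not "ok" points at an extension that needs rebuilding against the installed torch.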
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 6038) of binary: /opt/conda/bin/python
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/conda/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launch.py", line 195, in <module>
main()
File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launch.py", line 191, in main
launch(args)
File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launch.py", line 176, in launch
run(args)
File "/opt/conda/lib/python3.8/site-packages/torch/distributed/run.py", line 753, in run
elastic_launch(
File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/opt/conda/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
demo/gradio_app.py FAILED
Installing xformers according to the official instructions fails: a low version of torch with a high version of xformers is difficult to install. Can anyone offer a docker image?