naturale0 / blog-comments

Comment section for https://naturale0.github.io

https://naturale0.github.io/2021/01/29/setting-up-m1-mac-for-both-tensorflow-and-pytorch #2

Open utterances-bot opened 3 years ago

utterances-bot commented 3 years ago

Setting up M1 Mac for both TensorFlow and PyTorch | De Novo

Macs with ARM64-based M1 chip, launched shortly after Apple’s initial announcement of their plan to migrate to Apple Silicon, got quite a lot of attention bo...

https://naturale0.github.io/machine%20learning/setting-up-m1-mac-for-both-tensorflow-and-pytorch

sayakpaul commented 3 years ago

Thank you for the post!

I think it should be /bin/bash Miniforge3-MacOSX-x86_64.sh -p /Users/$(whoami)/miniforge_x86_64. Notice the .sh.

naturale0 commented 3 years ago

@sayakpaul You are definitely right. I will fix it in an instant.

sahilg06 commented 3 years ago

Thanks man!! really helpful

rotemrevivo91 commented 3 years ago

Hi there, thanks a lot for the tutorial. I followed it successfully and installed torch and torchvision. My problem is that when I start training a network, my Jupyter notebook kernel dies right away. Is that anything you may have faced before? I'd appreciate any help! Thanks :)

ttopac commented 3 years ago

I am not sure if conda install -y python=3.8 in item 2 for Rosetta 2 emulation is correct as of May 2021. Running this command installs the osx-arm64 version of Python, which then causes osx-64 packages to fail with a wrong-architecture error during install.

naturale0 commented 3 years ago

@tackoo The tutorial is based on conda 4.9.2 (x86). However, I was able to confirm that the same procedure installs the x86 version of Python 3.8, just as intended, with conda 4.10.1 (latest) as well.

The conda that is called here should be the x86 version of miniforge, not the ASi version installed in the first part. To ensure this, check that your ~/.zshrc or ~/.bash_profile file is calling conda from the /Users/$(whoami)/miniforge3_x86_64 directory.
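To make that check concrete, here is a minimal Python sketch that looks up which conda executable the current PATH resolves to and tests whether it sits under the x86 Miniforge directory. The helper name conda_prefix_matches and the example directory are illustrative assumptions, not part of the tutorial; substitute whatever directory you passed to -p during installation. Note that shutil.which only sees executables on PATH, so it will not notice a shell alias or function.

```python
import os
import shutil
from typing import Optional


def conda_prefix_matches(conda_path: Optional[str], expected_prefix: str) -> bool:
    """Return True if conda_path lives somewhere under expected_prefix.

    Hypothetical helper for illustration; not part of conda or the tutorial.
    """
    if conda_path is None:
        return False
    # Resolve symlinks and relative parts so both paths are comparable.
    conda_real = os.path.realpath(conda_path)
    prefix_real = os.path.realpath(expected_prefix)
    return os.path.commonpath([conda_real, prefix_real]) == prefix_real


if __name__ == "__main__":
    # Assumed install location, taken from the comment above; adjust to your own -p path.
    expected = os.path.expanduser("~/miniforge3_x86_64")
    found = shutil.which("conda")
    print("conda resolved to:", found)
    print("under expected x86 prefix:", conda_prefix_matches(found, expected))
```

Run it from the same terminal window in which you plan to install packages, since a different window may have a different PATH.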

naturale0 commented 3 years ago

@rotemrevivo91 Unfortunately, I haven't faced a similar issue. My guesses are:

  1. You might not have used Rosetta 2 and the x86 version of conda when installing torch/torchvision. Running the terminal natively, or using the M1 conda installed in the first part, will cause errors. I think you can check this in Activity Monitor > Architecture, but I am not sure.
  2. The neural net that you are attempting to train uses computations that rely on AVX2 operations, which cannot be translated into ARM64 instructions. In this case, it is simply impossible to fix unless Apple updates Rosetta to support those operations.

You should be very careful when using conda, since in our setup there are two different versions, each called by exactly the same command: conda. Even though you installed x86 conda, you might have called M1 conda because the $PATH environment variable was not refreshed. To refresh it, run source ~/.bash_profile (or source ~/.zshrc if your shell is zsh) or start a new terminal session.

My (stupid but effective) recommendation for people not familiar with this is to close and re-open every terminal window (of the correct architecture) every time you install something, to make sure you are using the correct one.
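If you would rather verify than guess, a rough Python sketch like the one below reports which architecture the running interpreter was built for and whether the process is being translated by Rosetta 2. It assumes macOS on Apple Silicon, where the sysctl key sysctl.proc_translated reports 1 for a translated process; the function names rosetta_translated and describe are made up for this example.

```python
import platform
import subprocess
from typing import Optional


def rosetta_translated() -> Optional[bool]:
    """True/False when the sysctl key is readable, None when it is unavailable."""
    try:
        out = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True,
            text=True,
            check=True,
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # sysctl is missing or the key does not exist on this machine.
        return None
    return out == "1"


def describe(machine: str, translated: Optional[bool]) -> str:
    """Summarise the interpreter architecture in one line."""
    if machine == "x86_64" and translated:
        return "x86_64 Python under Rosetta 2"
    if machine == "arm64":
        return "native arm64 Python"
    return f"{machine} Python (Rosetta status unknown or not applicable)"


if __name__ == "__main__":
    print(describe(platform.machine(), rosetta_translated()))
```

Running this with the Python from each conda environment should show quickly whether the interpreter you are actually using is the one you intended.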

edit: typo

rotemrevivo91 commented 3 years ago

Hi @naturale0, thanks for your answer. I was able to find a solution to my problem. It is not an M1-specific problem, and your tutorial works really well! The problem occurred only with specific networks; for some other models no error happened and the kernel didn't die. I did the following to solve my issue:

  1. Since the Jupyter notebook kernel just died without producing any meaningful error message, I copied my code to a .py file and ran it from the terminal, which gave the following error: "OMP: Error #15: Initializing libiomp5.dylib, but found libomp.dylib already initialized...."
  2. This led me to the following discussion on the PyTorch forums: https://discuss.pytorch.org/t/kernel-dies-when-i-use-a-tensor-of-size-greater-than-127-in-a-linear-layer-on-mac-m1/112589/2 and I tried conda install nomkl, which is a package that toggles the use of different build variant configurations in an environment. Unfortunately, this did not work.
  3. I tried re-installing conda miniforge (both versions) and all the packages, still with no success.
  4. Finally, I added the following to the top of my Jupyter notebook:
    import os
    os.environ['KMP_DUPLICATE_LIB_OK']='True'

    This worked in my case! I hope it helps anyone facing the same issue.

naturale0 commented 3 years ago

@rotemrevivo91 Glad you solved it. Thanks for sharing the solution!

ttopac commented 3 years ago

Thanks for your answer @naturale0. I think it depends on whether you initialize conda after installing the ASi version. If you do, the terminal window you open for installing the Rosetta 2 version will still include the ASi conda initialization parameters. I think this causes all conda commands to keep using the osx-arm64 version.

Ryojikn commented 3 years ago

I'm having problems at the step that adds the environment to ipykernel.

Could you please give me a hand?

    ImportError: dlopen(/Users/ryojikn/.local/lib/python3.8/site-packages/zmq/libzmq.cpython-38-darwin.so, 10): no suitable image found. Did find:
        /Users/ryojikn/.local/lib/python3.8/site-packages/zmq/libzmq.cpython-38-darwin.so: mach-o, but wrong architecture
        /Users/ryojikn/.local/lib/python3.8/site-packages/zmq/libzmq.cpython-38-darwin.so: mach-o, but wrong architecture

naturale0 commented 3 years ago

Dear @Ryojikn,

I am not an expert either, but it seems that the architectures of Python and the package do not match. If you are running Python for ARM64, then all the packages should also be built for ARM64, and vice versa. Please check that the zmq package and the running Python session are based on the same architecture.
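One way to run that check, sketched under some assumptions: the snippet below reads the first bytes of a compiled library and compares its CPU type with the running interpreter. It only handles thin 64-bit Mach-O files (not universal binaries), and the helper names macho_arch and library_matches_python are invented for this example. The path in the error above is under ~/.local, a per-user site-packages directory that any Python 3.8 on the machine may pick up, which is one plausible way a mismatched build gets loaded.

```python
import platform
import struct
from typing import Optional

# 64-bit Mach-O magic number 0xfeedfacf as it appears on disk (little-endian).
MH_MAGIC_64_LE = b"\xcf\xfa\xed\xfe"
# CPU type values from the Mach-O header: base type OR'ed with the 64-bit ABI flag.
CPU_TYPES = {0x01000007: "x86_64", 0x0100000C: "arm64"}


def macho_arch(header: bytes) -> Optional[str]:
    """Return 'x86_64' or 'arm64' for a thin 64-bit Mach-O header, else None."""
    if len(header) < 8 or header[:4] != MH_MAGIC_64_LE:
        return None  # not a thin 64-bit Mach-O file (or a universal binary)
    (cputype,) = struct.unpack("<I", header[4:8])
    return CPU_TYPES.get(cputype)


def library_matches_python(path: str) -> Optional[bool]:
    """Compare a library's architecture with the running interpreter's.

    Returns None when the file's architecture cannot be determined.
    """
    with open(path, "rb") as fh:
        arch = macho_arch(fh.read(8))
    return None if arch is None else arch == platform.machine()
```

Calling library_matches_python on the libzmq file named in the error, from the same Python that raised it, should return False if the mismatch is what it appears to be. The file command in the terminal reports similar information without any code.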

nimbid commented 3 years ago

I only needed PyTorch, so I followed step 1 under 'Install PyTorch for x86_64 (Rosetta 2)'. However, when I try to run 'conda', zsh does not recognize the command. What's the right way to add the path to zsh so that it gets detected?

Ideally, I would like to run the x86 miniforge using a descriptive alias like 'condaintel'. How can I do this for zsh?

I do not need TensorFlow right now, so I have only done the following:

  1. Created a Rosetta terminal.
  2. Installed miniforge for x86 by running '/bin/bash Miniforge3-MacOSX-x86_64.sh -p /Users/$(whoami)/miniforge_x86_64'.

That is it.

nimbid commented 3 years ago

As a follow-up to my comment, I also want to add that I have Python 3.9.5 and Homebrew 3.2.1 installed on my MacBook. Will that affect things?