tonymet / gcloud-lite

gcloud-lite is a distribution of the Google Cloud Platform (GCP) CLI that strips unnecessary dependencies to reduce its size by more than 75%, with significant cost and time savings.
GNU General Public License v3.0

question on gsutil from lite package #2

Closed greenozon closed 3 months ago

greenozon commented 5 months ago

While using gsutil I noticed this warning. Any clues on how to fix it? I'm using the archive + unpack option (not the Dockerfile).

WARNING: gsutil rsync uses hashes when modification time is not available at
both the source and destination. Your crcmod installation isn't using the
module's C extension, so checksumming will run very slowly. If this is your
first rsync since updating gsutil, this rsync can take significantly longer than
usual. For help installing the extension, please see "gsutil help crcmod".
tonymet commented 5 months ago

Thanks for testing! Can you share the full gcloud/gsutil commands you are using that trigger this? I'll try to repro and see if there is config we can add during the build process.

Also, please share your gcloud config (with any credentials cleared). Here is mine ...

 cat ~/.config/gcloud/configurations/config_default
[core]
account =xxxxx@gmail.com
project = xxxx-2024

[auth]

[artifacts]
repository = us-west1-docker.pkg.dev/xxxx/xxxx

[run]
region = us-west1
greenozon commented 5 months ago

Here is the script I'm using (well, not the full one; some parts are hidden for obvious reasons :)

#!/bin/bash

if ! [ -x "$(command -v gsutil)" ]; then
  echo 'Error: gsutil not found, installing...' >&2

#   orig official way - long and heavy in footprint
#  curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
#  echo "deb https://packages.cloud.google.com/apt cloud-sdk main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
#  apt-get update && apt-get install -y google-cloud-cli

  # new lightweight option, POC for some time...
  CLOUD_SDK_VER=475.0.0
  curl -LO https://github.com/tonymet/gcloud-lite/releases/download/${CLOUD_SDK_VER}/google-cloud-cli-${CLOUD_SDK_VER}-linux-x86_64-lite.tar.gz
  tar -zxf *.gz && rm *.gz
  export PATH=./google-cloud-sdk/bin:$PATH
fi

XXXYYY_BUCKET="gs://xyxyxyxyxyxyxy/assets"
gsutil -m rm -r $XXXYYY_BUCKET
gsutil -m rsync -r collected_static $XXXYYY_BUCKET

the warning message comes out from the rsync (2nd gsutil) command....

eg:

WARNING: gsutil rsync uses hashes when modification time is not available at both the source and destination. Your crcmod installation isn't using the module's C extension, so checksumming will run very slowly. If this is your first rsync since updating gsutil, this rsync can take significantly longer than usual. For help installing the extension, please see "gsutil help crcmod".

Building synchronization state... Starting synchronization... Copying file://xxxyyy.jpg [Content-Type=image/jpeg]... ....

Now, the environment I'm using it in: it's a k8s pod, not a host shell, etc.

Everything is done inside a fresh pod in a GKE k8s cluster, so the auth info etc. is read by the Google libraries from the Google metadata service.

The file you asked about is just... empty!

ls -al  ~/.config/gcloud/configurations/config_default
-rw-r--r-- 1 root root 0 May 22 18:00 /root/.config/gcloud/configurations/config_default

The ultimate goal is to figure out how to use gsutil rsync without that nasty warning...

tonymet commented 5 months ago

Here's how to fix

sudo apt install -y python3-pip python3-crcmod
# check that it works
 gsutil version -l |grep crcmod
compiled crcmod: True

It seems that since we are stripping the python3 installation, gcloud needs crcmod installed via apt. The above commands should address that.
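
The check above can be scripted so a build or startup step fails early when the C extension is missing. This is a minimal sketch, not part of gcloud-lite: the function name crcmod_is_compiled is hypothetical, and it only parses the "compiled crcmod:" line that gsutil version -l prints in this thread.

```shell
#!/bin/sh
# Hypothetical helper: reads `gsutil version -l` output on stdin and
# succeeds only if the line "compiled crcmod: True" is present.
crcmod_is_compiled() {
  grep -q '^compiled crcmod: True'
}

# Only run the real check when gsutil is actually on PATH.
if command -v gsutil >/dev/null 2>&1; then
  if gsutil version -l 2>/dev/null | crcmod_is_compiled; then
    echo "crcmod C extension is in use"
  else
    echo "crcmod C extension is NOT in use; rsync checksumming will be slow" >&2
  fi
fi
```

Grepping for the exact line avoids false positives from other lines that merely mention crcmod.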

tonymet commented 5 months ago

here's a test on a fresh VM showing that the above works with no warning

BEFORE

gsutil rsync test gs://tonym-us/test/

WARNING: gsutil rsync uses hashes when modification time is not available at
both the source and destination. Your crcmod installation isn't using the
module's C extension, so checksumming will run very slowly. If this is your
first rsync since updating gsutil, this rsync can take significantly longer than
usual. For help installing the extension, please see "gsutil help crcmod".

Building synchronization state...

AFTER

 gsutil rsync test gs://tonym.us/test/
Building synchronization state...
Starting synchronization...
Copying file://test/kaka4 [Content-Type=application/octet-stream]...
AccessDeniedException: 403 Access denied.
greenozon commented 5 months ago

Pardon the delay, busy week... :) Well, I can't get the same bright result as in your case...

From within my k8s pod:

# apt install -y python3-pip python3-crcmod
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
python3-pip is already the newest version (23.0.1+dfsg-1).
python3-crcmod is already the newest version (1.7+dfsg-3+b3).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
root@django:/portal# 
root@django:/portal# gsutil version -l
gsutil version: 5.27
checksum: 5cf9fcad0f47bc86542d009bbe69f297 (OK)
boto version: 2.49.0
python version: 3.8.19 (default, May 14 2024, 09:10:20) [GCC 12.2.0]
OS: Linux 5.15.146+
multiprocessing available: True
using cloud sdk: True
pass cloud sdk credentials to gsutil: True
config path(s): No config found
gsutil path: /portal/google-cloud-sdk/bin/gsutil
compiled crcmod: False
installed via package manager: False
editable install: False
shim enabled: False
root@django:/portal# cat /etc/issue
Debian GNU/Linux 12 \n \l

root@django:/portal# gsutil version -l | grep crc
compiled crcmod: False
root@django:/portal#
greenozon commented 5 months ago

The help from gsutil suggests some insane steps!

gsutil help crcmod

...............

 Debian and Ubuntu
  -----------------

  To compile and install crcmod:

    sudo apt-get install gcc python3-dev python3-setuptools
    sudo pip3 uninstall crcmod
    sudo pip3 install --no-cache-dir -U crcmod

Installing the full gcc toolchain as well as full Python is a killer: it will dump hundreds of MBs... I definitely don't like this route. Any other ideas?

tonymet commented 5 months ago

Installing the full gcc toolchain as well as full Python is a killer: it will dump hundreds of MBs... I definitely don't like this route. Any other ideas?

Check $CLOUDSDK_PYTHON and the python version reported by gsutil version -l to make sure you are using the Debian Python and not another one (e.g. the internal gcloud one may have been auto-installed by the CLI).
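
One way to make that comparison less error-prone is to pull the interpreter version out of the gsutil output and compare it against the Python you expect. A small sketch follows; the function name gsutil_python_version is hypothetical, and the sample line is copied from the output earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: reads `gsutil version -l` output on stdin and
# prints only the version number from the "python version:" line.
gsutil_python_version() {
  sed -n 's/^python version: \([0-9][0-9.]*\).*/\1/p'
}

# Sample line taken from the thread; on a real system, pipe
# `gsutil version -l` into the function instead.
sample='python version: 3.8.19 (default, May 14 2024, 09:10:20) [GCC 12.2.0]'
printf '%s\n' "$sample" | gsutil_python_version   # prints 3.8.19
```

If the printed version differs from what the distribution's own python3 reports with -V, gsutil is most likely running under a different interpreter than the one apt installs packages for.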

greenozon commented 5 months ago

The mystery is still going on...

root@django:/portal# gsutil version -l
gsutil version: 5.27
checksum: 5cf9fcad0f47bc86542d009bbe69f297 (OK)
boto version: 2.49.0
python version: 3.8.19 (default, May 14 2024, 09:10:20) [GCC 12.2.0]
OS: Linux 5.15.146+
multiprocessing available: True
using cloud sdk: True
pass cloud sdk credentials to gsutil: True
config path(s): No config found
gsutil path: /portal/google-cloud-sdk/bin/gsutil
compiled crcmod: False
installed via package manager: False
editable install: False
shim enabled: False
root@django:/portal# echo $CLOUDSDK_PYTHON

root@django:/portal# which python
/usr/local/bin/python
root@django:/portal# python -V
Python 3.8.19

root@django:/portal# apt install python3-pip python3-crcmod
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
python3-pip is already the newest version (23.0.1+dfsg-1).
python3-crcmod is already the newest version (1.7+dfsg-3+b3).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
root@django:/portal# pip freeze | grep crc
google-crc32c==1.5.0

even after setting export CLOUDSDK_PYTHON=python

the gsutil version -l shows the same output

tonymet commented 5 months ago

I would try using an absolute path to python in case the PATH variable differs inside gcloud. It is a bizarre issue, but thank you for your help in continuing to test. I'm sure others may run into it.
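
The install script earlier in the thread adds ./google-cloud-sdk/bin to PATH as a relative entry, which stops resolving as soon as the working directory changes. A minimal sketch of prepending the absolute form instead is below; the function name prepend_path is hypothetical, and the SDK directory is the one assumed by that script.

```shell
#!/bin/sh
# Hypothetical helper: resolve a directory to an absolute path and put it
# at the front of PATH, unless it is already listed there.
prepend_path() {
  _abs=$(CDPATH= cd -- "$1" 2>/dev/null && pwd) || return 1
  case ":$PATH:" in
    *":$_abs:"*) ;;              # already present, leave PATH alone
    *) PATH="$_abs:$PATH" ;;
  esac
  export PATH
}

# Assumed location, matching the unpacked tarball in the script above.
if [ -d ./google-cloud-sdk/bin ]; then
  prepend_path ./google-cloud-sdk/bin
fi
```

The same idea applies to CLOUDSDK_PYTHON: pointing it at a full path such as the one printed by `which python` removes any dependence on how PATH is searched.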

greenozon commented 5 months ago

Thanks a lot for keeping the fire going :) It really makes sense in the crazy XXII century!

So, with your kind push I tried the absolute path using the pwd construct. It did not help. Then I went to the gcloud sources and started reading... something clicked, I tried it this way, and it turned out to be the successful one. Here is the walkthrough:

Could you try it on your end as well?


root@django:/portal# export PATH=`pwd`/google-cloud-sdk/bin:$PATH
root@django:/portal# echo $PATH
/portal/google-cloud-sdk/bin:/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
root@django:/portal# gsutil version -l

Updates are available for some Google Cloud CLI components.  To install them,
please run:
  $ gcloud components update

gsutil version: 5.27
checksum: 5cf9fcad0f47bc86542d009bbe69f297 (OK)
boto version: 2.49.0
python version: 3.8.19 (default, May 14 2024, 09:10:20) [GCC 12.2.0]
OS: Linux 5.15.146+
multiprocessing available: True
using cloud sdk: True
pass cloud sdk credentials to gsutil: True
config path(s): No config found
gsutil path: /portal/google-cloud-sdk/bin/gsutil
compiled crcmod: False
installed via package manager: False
editable install: False
shim enabled: False
root@django:/portal# python -V
Python 3.8.19
root@django:/portal# pip install crcmod
Collecting crcmod
  Downloading crcmod-1.7.tar.gz (89 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 89.7/89.7 kB 1.7 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: crcmod
  Building wheel for crcmod (setup.py) ... done
  Created wheel for crcmod: filename=crcmod-1.7-cp38-cp38-linux_x86_64.whl size=30366 sha256=07529a8ca51bf1e1a18116c077e0482edf67c86434b0040951c8b3131e2e8f44
  Stored in directory: /root/.cache/pip/wheels/ca/5a/02/f3acf982a026f3319fb3e798a8dca2d48fafee7761788562e9
Successfully built crcmod
Installing collected packages: crcmod
Successfully installed crcmod-1.7
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

[notice] A new release of pip is available: 23.0.1 -> 24.0
[notice] To update, run: pip install --upgrade pip
root@django:/portal# gsutil version -l
gsutil version: 5.27
checksum: 5cf9fcad0f47bc86542d009bbe69f297 (OK)
boto version: 2.49.0
python version: 3.8.19 (default, May 14 2024, 09:10:20) [GCC 12.2.0]
OS: Linux 5.15.146+
multiprocessing available: True
using cloud sdk: True
pass cloud sdk credentials to gsutil: True
config path(s): No config found
gsutil path: /portal/google-cloud-sdk/bin/gsutil
compiled crcmod: True
installed via package manager: False
editable install: False
shim enabled: False
tonymet commented 5 months ago

I'll test that out. One thing to watch out for when running gcloud components update is that it downloads and installs the "bloat", including the embedded Python. This is probably fine on dev machines, but when you move to prod it will often hang your VM.

I'll see if I can repro and whether there are other "lite" configs that will help avoid the hit to vCPU + IOPS.

greenozon commented 5 months ago

With the above post I just wanted to highlight that, for some reason, gsutil wants the crcmod Python package and not python3-crcmod, which turned out to be the Python package named google-crc32c==1.5.0, right?

A bit confusing... stuff...

And yeah, I did not run any components update; that is by no means my intention. All I need is to get gsutil up and fully running as fast as possible using the host Python distro.
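
A possible explanation, offered as an assumption rather than a confirmed diagnosis: the outputs above show gsutil running under Python 3.8.19 from /usr/local/bin/python, whereas apt installs python3-crcmod for Debian's own python3 under /usr/bin, so the apt package may simply be invisible to the interpreter gsutil uses. google-crc32c is a separate PyPI package that happened to match the grep. The sketch below asks each interpreter directly; the function name has_crcmod_ext is hypothetical, and it relies on crcmod exposing a _usingExtension flag in crcmod.crcmod, which is my understanding of how crcmod reports its C extension.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 only if the given interpreter can import
# crcmod AND crcmod reports that its compiled C extension is in use.
# Assumption: crcmod.crcmod sets the module attribute _usingExtension.
has_crcmod_ext() {
  "$1" -c 'import sys, crcmod.crcmod as c; sys.exit(0 if getattr(c, "_usingExtension", False) else 1)' 2>/dev/null
}

# Interpreter paths are assumptions based on the outputs in this thread.
for py in /usr/local/bin/python /usr/bin/python3; do
  if has_crcmod_ext "$py"; then
    echo "$py: compiled crcmod available"
  else
    echo "$py: no compiled crcmod (or interpreter missing)"
  fi
done
```

If only /usr/bin/python3 reports the extension, either install crcmod with the pip that belongs to the other interpreter, as in the walkthrough above, or point CLOUDSDK_PYTHON at /usr/bin/python3.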

greenozon commented 5 months ago

Also, to be honest, the log message

Successfully built crcmod

is a bit confusing... I presume the Python package already contains pre-built code, so it just successfully reused it?
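
A hedged reading of that log, not a definitive answer: pip downloaded crcmod-1.7.tar.gz, which is a source archive, and the wheel it then created is tagged cp38 and linux_x86_64. Wheels without compiled code normally end in none-any, so the tags suggest the C extension was compiled locally during the install rather than reused pre-built. The sketch below just extracts the platform tag from a wheel filename; wheel_platform_tag is a hypothetical name, and the second filename is only an illustrative pure-Python wheel.

```shell
#!/bin/sh
# Hypothetical helper: print the platform tag, i.e. the last dash-separated
# field of a wheel filename before the .whl suffix.
wheel_platform_tag() {
  _name=${1%.whl}
  printf '%s\n' "${_name##*-}"
}

wheel_platform_tag crcmod-1.7-cp38-cp38-linux_x86_64.whl   # prints linux_x86_64
wheel_platform_tag six-1.16.0-py2.py3-none-any.whl         # prints any
```

A platform-specific tag on a locally built wheel also implies that a working C compiler was already present in the image, which would explain why no gcc install was needed in this pod.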