wasiahmad / PLBART

Official code of our work, Unified Pre-training for Program Understanding and Generation [NAACL 2021].
https://arxiv.org/abs/2103.06333
MIT License

unzip unsuccessful when running download.sh #43

Closed by chungen04 2 years ago

chungen04 commented 2 years ago

I am trying to fine-tune PLBART on the code refinement task from the paper. I set up the conda environment with bash install_env.sh and then downloaded the checkpoints. However, when I run bash download.sh under data/codeXglue, I get the output below, which suggests that either the download or the unzip step was unsuccessful.

[image: screenshot of the download.sh output]

Am I missing some steps in the setup?

wasiahmad commented 2 years ago

Hi, this is probably because downloads of large files from Google Drive are failing (we observed the problem too). Please try the following function, which worked for us, to download files from Google Drive.

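function wget_gdrive() {
    # Usage: wget_gdrive <gdrive_file_id> <dest_path>
    local GDRIVE_FILE_ID=$1
    local DEST_PATH=$2
    local BASE_URL='https://docs.google.com/uc?export=download'
    # Skip the download if the destination file already exists.
    if [[ ! -f "$DEST_PATH" ]]; then
        echo "Downloading file from https://drive.google.com/file/d/${GDRIVE_FILE_ID}"
        # First request: save the session cookies and extract the confirmation token from the response.
        wget --save-cookies cookies.txt "${BASE_URL}&id=${GDRIVE_FILE_ID}" -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1/p' >confirm.txt
        # Second request: reuse the cookies and pass the token to fetch the file itself.
        wget --load-cookies cookies.txt -O "$DEST_PATH" "${BASE_URL}&id=${GDRIVE_FILE_ID}&confirm=$(<confirm.txt)"
        rm -f cookies.txt confirm.txt
    fi
}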
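
As a sanity check before unzipping, something like the sketch below could confirm that a downloaded file really is a zip archive. When a Google Drive download fails, the saved file can end up being an HTML page instead of the archive, and unzip then fails on it. The helper name is_zip_archive is hypothetical and not part of this repo; it only inspects the first four bytes of the file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not part of the PLBART repo).
# Succeeds only if the file starts with the ZIP local-file-header
# signature "PK\x03\x04". An empty archive (signature "PK\x05\x06")
# is deliberately rejected, since a dataset archive should not be empty.
function is_zip_archive() {
    local FILE_PATH=$1
    # A missing file is never a valid archive.
    [[ -f "$FILE_PATH" ]] || return 1
    # Read the first four bytes and print them as one hex string.
    local MAGIC
    MAGIC=$(head -c 4 "$FILE_PATH" | od -An -tx1 | tr -d ' \n')
    [[ "$MAGIC" == "504b0304" ]]
}
```

For example, `is_zip_archive some_file.zip && unzip some_file.zip` would unzip only when the check passes (some_file.zip is a placeholder name). If the check fails, deleting the file and downloading it again is probably the simplest fix, because wget_gdrive skips files that already exist.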
chungen04 commented 2 years ago

I have now finished the conda environment setup and the dataset download, and the prepare step seems fine. However, when I run bash run.sh 0, it reports that some files are missing. Did I miss a step? (I have not traced the missing file names in the repo.)

[image: screenshot of the run.sh output]

wasiahmad commented 2 years ago

This type of error can be solved easily; see https://github.com/wasiahmad/PLBART/blob/main/install_env.sh#L33