tech-srl / code2seq

Code for the model presented in the paper: "code2seq: Generating Sequences from Structured Representations of Code"
http://code2seq.org
MIT License

I cannot preprocess the Python dataset #106

Closed Avv22 closed 2 years ago

Avv22 commented 2 years ago

Hello,

I downloaded the preprocessed 150k Python dataset and ran extractor.sh:

#!/usr/bin/env bash
Python150k=$(pwd)/Python150k/
DATA_DIR=$(pwd)/data/
SEED=239
python extract.py \
    --data_dir="$Python150k" \
    --output_dir="$DATA_DIR" \
    --seed="$SEED"

I got a memory error after 20 minutes and the computer stopped. I have 16 GB of RAM and am running Windows 10. Can you please help me with this issue?