Question about preprocessing poj104 dataset into programs.pkl

c2hpxq commented 3 years ago

Hi,

Thanks for your great paper and for open-sourcing this repo. The paper is really inspiring, and I've learnt a lot about how to deal with RvNN-style structures in torch from your nice implementation.

I have recently been playing with the 'poj104' dataset, the one you use for the code classification task. In your workflow it is loaded directly from a pickle file, and each code sample is parsed into an AST by parser.parse (lines 20-25 in pipeline.py):

            from pycparser import c_parser
            parser = c_parser.CParser()
            source = pd.read_pickle(self.root+'programs.pkl')

            source.columns = ['id', 'code', 'label']
            source['code'] = source['code'].apply(parser.parse)

However, I find that pycparser CAN'T DIRECTLY parse the code text read from poj104. So could you please tell me how you preprocessed the code text into the programs.pkl file? (Yes, I'm a newbie to both DL & pycparser...)
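One possible cause, offered as a guess rather than the authors' confirmed procedure: pycparser expects already-preprocessed C, so raw submissions that still contain comments or directives such as #include are likely to be rejected. Below is a minimal sketch of a cleaning step under that assumption. The helper names (strip_comments, drop_directives, clean_for_pycparser) are hypothetical and are not part of the astnn repo or of pycparser.

```python
import re

# One pattern for string literals, char literals, and comments.
# Literals are matched so that comment-like text inside them is left alone.
_TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'      # string literal (kept)
    r"|'(?:\\.|[^'\\])*'"     # char literal (kept)
    r"|/\*.*?\*/"             # block comment (dropped)
    r"|//[^\n]*",             # line comment (dropped)
    re.DOTALL,
)


def strip_comments(code):
    """Replace C/C++ comments with a space, leaving literals untouched."""
    def repl(match):
        token = match.group(0)
        # Only comments start with '/'; literals start with a quote.
        return " " if token.startswith("/") else token
    return _TOKEN.sub(repl, code)


def drop_directives(code):
    """Drop lines that start with '#' (e.g. #include, #define).

    Simplification: multi-line macros continued with a backslash
    are not handled.
    """
    kept = [line for line in code.splitlines()
            if not line.lstrip().startswith("#")]
    return "\n".join(kept)


def clean_for_pycparser(code):
    """Hypothetical cleaning step applied before calling parser.parse."""
    return drop_directives(strip_comments(code))


raw = (
    '#include <stdio.h>\n'
    'int main() { /* c */ char *s = "// not a comment"; return 0; } // end\n'
)
cleaned = clean_for_pycparser(raw)
```

The cleaned string is what would then be handed to parser.parse. Samples that rely on typedefs from the removed headers, or that use C++-only syntax, may still fail to parse and would need extra handling.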

Thanks in advance!

zhifeiyv666 commented 3 years ago

I have the same question...

maximzubkov commented 3 years ago

Same question

VectorFury commented 2 years ago

As mentioned above

anushka192001 commented 1 year ago

@c2hpxq would you please share your email address?

zhangj111 / astnn

Question about preprocessing poj104 dataset into programs.pkl #18