Open breandan opened 1 year ago
Hi George, just a quick update in case you were working on the anonymized dataset. I was able to partially reproduce the Seq2Parse results on an alternate dataset from Wong et al. (2019); however, the source code predictions are a little tricky to compare due to the aforementioned issue with mapping abstract sequences back to character sequences. Although I wasn't sure how to obtain the Precision@{10,20,50} over concrete source code, I was able to run the seq2parse.py script. Based on a Top-1 analysis of ~400 broken/fixed pairs from the StackOverflow dataset containing <3 abstract token edits, roughly ~86% of the Seq2Parse repairs were syntactically valid, ~35% matched the abstract tokens of the human fixes, and ~0.5% matched the human fixes at the character level. Are those numbers drastically out of line with what we should expect? Also, FYI, the web demo now seems to be unavailable. Thank you again.
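For reference, the Top-1 comparison was along these lines; a minimal sketch in which the tokenize-based abstraction and the repair_fn hook are simplifications of my own rather than Seq2Parse's actual pipeline:

```python
import io
import tokenize

def abstract(src):
    """Coarse token-kind abstraction (a stand-in for the grammar-based
    abstraction, used only for comparing repairs to human fixes)."""
    try:
        toks = tokenize.generate_tokens(io.StringIO(src).readline)
        return [tokenize.tok_name[t.type] for t in toks]
    except (tokenize.TokenError, SyntaxError):
        return None

def is_valid(src):
    """True if the repaired program is syntactically valid Python."""
    try:
        compile(src, "<repair>", "exec")
        return True
    except SyntaxError:
        return False

def top1_report(pairs, repair_fn):
    """pairs: list of (broken, human_fixed) source strings;
    repair_fn: broken source -> the tool's top-1 repair string."""
    valid = abstract_match = exact_match = 0
    for broken, fixed in pairs:
        repair = repair_fn(broken)
        valid += is_valid(repair)
        abstract_match += (abstract(repair) is not None
                           and abstract(repair) == abstract(fixed))
        exact_match += (repair == fixed)
    n = len(pairs)
    return valid / n, abstract_match / n, exact_match / n
```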
Hi @breandan,
Sorry for the late reply, it's been quite busy. Of course I remember our talk back in December, and nice to hear from you again!
You can check create_ecpp_dataset_full.py for how the training and test sets were created, using ecpp_individual_grammar.py (i.e. the Earley parser) to abstract the programs. _ENDMARKER_ just signifies the end of a program in our grammar python-grammar.txt. There is no repairing happening here, just predicting the error rules.
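As a rough illustration of what an abstracted program looks like, here is a sketch using Python's own tokenize module as a stand-in for the grammar-based abstraction in ecpp_individual_grammar.py, so the terminal names will not match python-grammar.txt exactly:

```python
import io
import tokenize

# Rough stand-in for the grammar-based abstraction: every program becomes a
# sequence of terminal names, and the sequence always ends with _ENDMARKER_.
def abstract_program(src):
    toks = tokenize.generate_tokens(io.StringIO(src).readline)
    return ["_%s_" % tokenize.tok_name[t.type] for t in toks]

print(abstract_program("x = 1\n"))
# ['_NAME_', '_OP_', '_NUMBER_', '_NEWLINE_', '_ENDMARKER_']
```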
The actual_tokens are kept reversed as a stack in order to fill in the information that hasn't changed. The fix_seq_operations tell us if we have a token insertion, deletion or replacement, and we use the stack accordingly to avoid using the wrong tokens. For example, you will see in the first 2 cases, for '<<+' and '<<$', that when we are inserting or replacing a token, we use new dummy ones such as "simple_name". Unfortunately, the way we developed the error-correcting parser back then didn't allow us to avoid some cosmetic changes and preserve formatting.
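To make the stack idea concrete, here is a minimal sketch; the operation markers and the function below are simplified assumptions, not the exact fix_seq_operations format:

```python
def fill_in_tokens(fix_seq_ops, actual_tokens):
    """fix_seq_ops: one marker per output position ('<<+' insert, '<<$' replace,
    'del' delete, anything else keep); actual_tokens: the original concrete tokens.
    The marker names other than '<<+' and '<<$' are assumptions for this sketch."""
    stack = list(reversed(actual_tokens))    # unchanged info is consumed as a stack
    repaired = []
    for op in fix_seq_ops:
        if op == '<<+':                      # insertion: no original token to reuse,
            repaired.append('simple_name')   # so fill in a dummy token
        elif op == '<<$':                    # replacement: drop the original token
            stack.pop()                      # and fill in a dummy one instead
            repaired.append('simple_name')
        elif op == 'del':                    # deletion: just drop the original token
            stack.pop()
        else:                                # unchanged position: reuse the original
            repaired.append(stack.pop())
    return repaired
```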
seq2parse.py kind of tries to undo those cosmetic changes with the diff, but I think it might not be too accurate. Let me know if you have any more questions.
Hi @gsakkas, I hope you are doing well. I am not sure if you recall, but we met briefly after your talk in New Zealand last December. I am working on reproducing the results on the 15k ERule and HumanEval dataset and had a few questions about the abstract sequences used in sections 7.1-7.4 of the paper. Any suggestions or advice you could provide would be greatly appreciated.
Is the test set the one located under src/human_study, or is there another test set of source code snippets? I see the test entries carry fields such as tokns, tok_chgs, dur, and popular. predict_eccp_classifier_partials.py compares the classifier prediction y_pred with the tok_chgs using the labels file erule_labels-partials-probs.json; however, I am not quite sure how to obtain the ground truth abstract user fix from this information.

For example, considering one of the test entries: I understand its tok_chgs is Err_Literals -> H Literals <++> InsertErr -> is, which refers to [105, 323], but it is not yet clear to me how the tokns are altered in the ground truth fix. Does the suffix after _ENDMARKER_ identify a unique abstract sequence fix?
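For concreteness, this is the kind of lookup and comparison I have in mind; a minimal sketch where the structure of erule_labels-partials-probs.json (a flat mapping from error-rule strings to label indices) is my assumption rather than the documented format:

```python
import json

# Assumed structure, not the documented one:
# {"Err_Literals -> H Literals": 105, "InsertErr -> is": 323, ...}
with open("erule_labels-partials-probs.json") as f:
    erule_labels = json.load(f)

def label_indices(tok_chgs):
    """Split a tok_chgs entry like
    'Err_Literals -> H Literals <++> InsertErr -> is'
    into individual error rules and look up their label indices."""
    rules = [r.strip() for r in tok_chgs.split("<++>")]
    return [erule_labels[r] for r in rules]

def hits_at_k(y_pred_ranked, tok_chgs, k):
    """True if every ground-truth error-rule label appears among the
    top-k predicted labels (one way to read Precision@k over rules)."""
    top_k = set(y_pred_ranked[:k])
    return all(lbl in top_k for lbl in label_indices(tok_chgs))
```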
When a _NAME_ or whitespace token is substituted, inserted or deleted in the abstract token sequence, this can introduce cosmetic changes to parts of the input which are lexically identical in the abstract token sequence. Is there a way to map the tokenwise edits back to the exact character subsequence in the concrete source code while preserving the original formatting?

It is also possible I am mistaken or misunderstanding an important detail. If so, any clarification would be welcome. Thank you!
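To illustrate the kind of mapping I am after, here is a rough sketch based on Python's tokenize character offsets; it only shows the idea of rewriting a single token's span without touching the surrounding formatting, and is not taken from the repo:

```python
import io
import tokenize

def token_spans(src):
    """Map every lexical token to its exact character span in src, so an
    edit at abstract-token position i can be localized in the concrete code."""
    line_starts = [0]
    for line in src.splitlines(keepends=True):
        line_starts.append(line_starts[-1] + len(line))
    spans = []
    for tok in tokenize.generate_tokens(io.StringIO(src).readline):
        (srow, scol), (erow, ecol) = tok.start, tok.end
        spans.append((tok.string,
                      line_starts[srow - 1] + scol,
                      line_starts[erow - 1] + ecol))
    return spans

def replace_token(src, i, new_text):
    """Rewrite only the i-th token's characters; all other characters
    (whitespace, comments, formatting) are left untouched."""
    _, start, end = token_spans(src)[i]
    return src[:start] + new_text + src[end:]
```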
cc: @jin-guo @xujiesi