imranraad07 / BugReportQA


add edit data; create a combined dataset #53

Closed: damevski closed this 4 years ago

damevski commented 4 years ago

Combine the data we got for edited posts with the data based on the comments into a complete dataset.

aciborowska commented 4 years ago

Currently, join_dataset.py builds the dataset from every file in the given directory whose name matches the pattern "github_data_20*.csv". So as long as the CSV with the edit data is unzipped into that directory, it will be included in the dataset.
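For illustration, the file-selection behaviour described above can be sketched with the standard library's glob and csv modules. This is a minimal sketch, not the actual code of join_dataset.py: the helper name combine_csvs is hypothetical, and it assumes every matching CSV has a header row with the same columns.

```python
import csv
import glob
import os


def combine_csvs(directory, pattern="github_data_20*.csv"):
    """Concatenate rows from every CSV in `directory` whose name matches `pattern`.

    Hypothetical helper illustrating the behaviour described for join_dataset.py;
    it is not that script's actual implementation.
    """
    rows = []
    # Sort the matches so the combined row order is deterministic across runs.
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        with open(path, newline="", encoding="utf-8") as f:
            # Assumes each file has a header row; DictReader keys rows by it.
            rows.extend(csv.DictReader(f))
    return rows
```

Under this sketch, dropping an unzipped edit-data file such as github_data_2020_edits.csv into the directory is enough for its rows to be picked up, while a file like other.csv is ignored.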

One question: what should we do if a post was answered in a comment and also has edit data? I think the edit data is more reliable, so I vote for keeping <post, question, answer_edit> and rejecting <post, question, answer_comment>.

damevski commented 4 years ago

Answer: if both are present, we will keep the edit.
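The agreed rule (prefer the edit-based answer when a post has both) could be applied to the combined rows roughly as follows. This is a hedged sketch: the function name merge_answers and the row keys "post_id" and "source" (with values "edit" or "comment") are hypothetical and may not match the real dataset's columns.

```python
def merge_answers(rows):
    """Keep one <post, question, answer> row per post, preferring edit answers.

    Assumes each row is a dict with hypothetical keys "post_id" and "source",
    where "source" is either "edit" or "comment".
    """
    chosen = {}
    for row in rows:
        current = chosen.get(row["post_id"])
        # Take the row if the post is unseen, or if an edit replaces a comment.
        # If a post has two rows from the same source, the first one is kept.
        if current is None or (row["source"] == "edit" and current["source"] != "edit"):
            chosen[row["post_id"]] = row
    # Dicts preserve insertion order, so posts appear in first-seen order.
    return list(chosen.values())
```

For example, given a comment answer and an edit answer for the same post, only the edit answer survives, and posts with a single source are kept unchanged.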