Closed mashalahmad closed 6 years ago
Could you please tell me what the Copies attribute means in the PostReferenceGH table?
Sure, Copies indicates how often that exact file appears in the dataset. For a certain FileId, it is equal to:
SELECT COUNT(*)
FROM `sotorrent-org.2018_09_23.PostReferenceGH`
WHERE FileId="<FILE_ID>";
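To illustrate the counting on a small scale, here is a minimal sketch that runs the same kind of query against a toy in-memory SQLite table instead of BigQuery. The rows, the reduced column set (Repo and PostId appear only for illustration), and the helper name copies are assumptions made up for this example, not part of the dataset or its tooling.

```python
import sqlite3

# Toy stand-in for PostReferenceGH: made-up rows and a reduced, assumed column set.
rows = [
    ("file_a", "owner1/repo1", 101),
    ("file_a", "owner2/repo2", 101),
    ("file_a", "owner3/repo3", 101),
    ("file_b", "owner1/repo1", 202),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PostReferenceGH (FileId TEXT, Repo TEXT, PostId INTEGER)")
conn.executemany("INSERT INTO PostReferenceGH VALUES (?, ?, ?)", rows)


def copies(file_id):
    """Count the rows sharing the given FileId (hypothetical helper)."""
    cur = conn.execute(
        "SELECT COUNT(*) FROM PostReferenceGH WHERE FileId = ?", (file_id,)
    )
    return cur.fetchone()[0]


# The same count for every FileId at once, via GROUP BY.
copies_by_file = dict(
    conn.execute("SELECT FileId, COUNT(*) FROM PostReferenceGH GROUP BY FileId")
)

print(copies("file_a"))
print(copies_by_file)
```

The GROUP BY variant is convenient when you want the count for all files rather than looking up one FileId at a time.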
And how can I find out whether a GitHub class has many clones from Stack Overflow? I read your paper "Attribution Required: Stack Overflow Code Snippets in GitHub Projects", where you use CPD to detect the clones. Is it suitable? Or is there any way to get this from the PostReferenceGH table?
Unfortunately, I can only provide support for the dataset here. It's up to you to find a suitable approach to detect the code clones. You could use CPD, but most likely only on a sample of projects and snippets. However, there are many other code clone detectors available. You could start with these papers:
The corresponding full paper for the ICSE extended abstract you mentioned is now also available: