Enable Kaiaulu to reuse our Motif TSE Dataset

Now that we can reuse the motif analysis from TSE (#210), it would be helpful to also reuse its dataset: https://cdn.lfdr.de/stmc/#/projects.

Both the git log and the mailing list data can already be used, since they are stored as standard .git and .mbox files. The only limitation is the JIRA dataset, which is in XML (and, I believe, was produced by a Python crawler I implemented many years ago).

Writing an XML parser for the JIRA data would therefore give us a pool of 26 projects that other studies could use to extend the conclusions of the prior work and, to an extent, reproduce it.
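
As a rough starting point, here is a minimal sketch of what such a parser could look like. It is written in Python only for illustration and is not an existing Kaiaulu function. The XML layout is an assumption: the sample mimics JIRA's usual RSS-style export (`<item>` elements with `<key>`, `<summary>`, `<comments>`, and so on), and the actual dataset files may use different element or attribute names. The function name `parse_jira_xml` and the sample issue are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample. The real dataset's schema has not been checked
# and may differ from this RSS-style layout.
SAMPLE_XML = """<rss version="0.92">
  <channel>
    <item>
      <key id="1001">EXAMPLE-1</key>
      <summary>Example issue</summary>
      <type>Bug</type>
      <status>Closed</status>
      <reporter username="alice">Alice</reporter>
      <created>Mon, 19 Mar 2007 10:00:00 +0000</created>
      <comments>
        <comment id="1" author="bob" created="Tue, 20 Mar 2007 09:30:00 +0000">Looks good.</comment>
      </comments>
    </item>
  </channel>
</rss>"""


def parse_jira_xml(xml_text):
    """Flatten each <item> into a dict with its fields and comments."""
    root = ET.fromstring(xml_text)
    issues = []
    for item in root.iter("item"):
        # Each comment keeps its author, timestamp, and body text.
        comments = [
            {
                "author": c.get("author"),
                "created": c.get("created"),
                "body": (c.text or "").strip(),
            }
            for c in item.iterfind("comments/comment")
        ]
        reporter = item.find("reporter")
        issues.append({
            "key": item.findtext("key"),
            "summary": item.findtext("summary"),
            "type": item.findtext("type"),
            "status": item.findtext("status"),
            # Missing <reporter> elements yield None instead of failing.
            "reporter": reporter.get("username") if reporter is not None else None,
            "created": item.findtext("created"),
            "comments": comments,
        })
    return issues


issues = parse_jira_xml(SAMPLE_XML)
```

Each issue becomes one flat record with a nested list of comments, which should map fairly directly onto one table of issues and one table of comments once the real schema is confirmed.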

sailuh/kaiaulu, issue #218