sotorrent / db-scripts

SQL and Bash scripts to import the official Stack Overflow data dump and the SOTorrent data set, to retrieve Stack Overflow references from the BigQuery GitHub data set, and to retrieve data from the SOTorrent data set for analysis.
Apache License 2.0

Separating repository name and repository owner name #3

Closed: JafarAkhondali closed this issue 6 years ago

JafarAkhondali commented 6 years ago

I think it would be nice to split the repository name and the repository owner name into two separate columns in the PostReferenceGH table.

sbaltes commented 6 years ago

I implemented this for the new release:

[Screenshot: Google BigQuery, 2018-10-28]

I also kept the merged repository name column, because people may already rely on it in their scripts. I may drop it after the MSR mining challenge is over.

Until the new release is out, you can use this snippet to split the RepoName column:

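#standardSQL
-- Split the merged "owner/name" value in RepoName into two columns.
-- SAFE_OFFSET returns NULL instead of raising an error if a value contains no "/".
SELECT
  RepoArray[SAFE_OFFSET(0)] AS RepoOwner,
  RepoArray[SAFE_OFFSET(1)] AS RepoName
FROM (
  SELECT SPLIT(RepoName, "/") AS RepoArray
  FROM `sotorrent-org.gh_so_references_2018_09_23.PostReferenceGH`
);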