rundel / ghclass

Tools for managing classroom organizations
https://rundel.github.io/ghclass/
GNU General Public License v3.0

Using GitHub trees API to move files/folders without commit history #49

Open thereseanders opened 5 years ago

thereseanders commented 5 years ago

For the peer review functions, I am trying to use the GitHub Trees API to move files/folders from one repo to another without the commit history. I am following a procedure similar to the one outlined here, using the following GitHub API endpoint:

POST /repos/:owner/:repo/git/trees

I am having trouble specifying the payload to create a new tree (in either the same repo hw9-tnaanders or a different repo, e.g. demo-tree). What is the parameter to use within the gh() function? @rundel could you point me to the correct way to do this?

This is the R code I am using:

res = gh::gh("GET /repos/:owner/:repo/branches/:branch",
       owner = "ghclass-test",
       repo = "hw9-tnaanders",
       branch = "master",
       .token = get_github_token())
tree_sha = res$commit$commit$tree$sha

res2 = gh::gh("GET /repos/:owner/:repo/git/trees/:sha",
       owner = "ghclass-test",
       repo = "hw9-tnaanders",
       sha = tree_sha,
       .token = get_github_token())

# The object below is a folder on the "hw9-tnaanders" repo
# Same thing happens if a file name is specified instead
purrr::map(res2$tree, "path")
fname = "toreview1"
obj = res2$tree[[which(purrr::map(res2$tree, "path") == fname)]]

##### This results in 422 error #####
treer = list(base_tree = tree_sha,
             tree = list(list(path = obj$path,
                              mode = obj$mode,
                              type = obj$type,
                              sha = obj$sha)))

gh::gh("POST /repos/:owner/:repo/git/trees",
       owner = "ghclass-test",
       repo = "hw9-tnaanders",
       .token = get_github_token(),
       body = jsonlite::toJSON(treer, auto_unbox = TRUE))

##### Sending base_tree and tree separate returns 422 error as well #####
base_tree = list(base_tree = tree_sha)
tree = list(list(path = obj$path,
                 mode = obj$mode,
                 type = obj$type,
                 sha = obj$sha))

gh::gh("POST /repos/:owner/:repo/git/trees",
       owner = "ghclass-test",
       repo = "hw9-tnaanders",
       .token = get_github_token(),
       base_tree = jsonlite::toJSON(base_tree, auto_unbox = TRUE),
       tree = jsonlite::toJSON(tree, auto_unbox = TRUE))

rundel commented 5 years ago

I've purposely avoided this until now because it is difficult. It is definitely worthwhile, but I'll have to sit down and poke at it a bit to see how to get it all working.

thereseanders commented 5 years ago

Ok, thank you! I will continue to look into this as well.

thereseanders commented 5 years ago

Here is a reproducible example of the issue for a public repo.

res = gh::gh("GET /repos/:owner/:repo/branches/:branch",
             owner = "thereseanders",
             repo = "workshop-dataviz-fsu",
             branch = "master",
             .token = usethis::github_token())
tree_sha = res$commit$commit$tree$sha

res2 = gh::gh("GET /repos/:owner/:repo/git/trees/:sha",
              owner = "thereseanders",
              repo = "workshop-dataviz-fsu",
              sha = tree_sha,
              .token = usethis::github_token())

# The object below is a folder on the "workshop-dataviz-fsu" repo
# Same thing happens if a file name is specified instead
purrr::map(res2$tree, "path")
#> [[1]]
#> [1] "Day1"
#> 
#> [[2]]
#> [1] "Day2"
#> 
#> [[3]]
#> [1] "Day3"
#> 
#> [[4]]
#> [1] "LICENSE.md"
#> 
#> [[5]]
#> [1] "README.md"
fname = "Day1"
obj = res2$tree[[which(purrr::map(res2$tree, "path") == fname)]]

##### This results in 422 error #####
treer = list(base_tree = tree_sha,
             tree = list(list(path = obj$path,
                              mode = obj$mode,
                              type = obj$type,
                              sha = obj$sha)))

gh::gh("POST /repos/:owner/:repo/git/trees",
       owner = "thereseanders",
       repo = "workshop-dataviz-fsu",
       .token = usethis::github_token(),
       body = jsonlite::toJSON(treer, auto_unbox = TRUE))
#> Error in gh::gh("POST /repos/:owner/:repo/git/trees", owner = "thereseanders", : GitHub API error (422): 422 Unprocessable Entity
#>   Invalid tree info

##### Sending base_tree and tree separate returns 422 error as well #####
base_tree = list(base_tree = tree_sha)
tree = list(list(path = obj$path,
                 mode = obj$mode,
                 type = obj$type,
                 sha = obj$sha))

gh::gh("POST /repos/:owner/:repo/git/trees",
       owner = "thereseanders",
       repo = "workshop-dataviz-fsu",
       .token = usethis::github_token(),
       base_tree = jsonlite::toJSON(base_tree, auto_unbox = TRUE),
       tree = jsonlite::toJSON(tree, auto_unbox = TRUE))
#> Error in gh::gh("POST /repos/:owner/:repo/git/trees", owner = "thereseanders", : GitHub API error (422): 422 Unprocessable Entity
#>   Invalid tree info

Created on 2019-07-08 by the reprex package (v0.3.0)

jimhester commented 5 years ago

Here is an example of taking all the files in the R folder of xml2 and putting them in a new repo:

#retrieve tree
tree <- gh::gh("GET /repos/:owner/:repo/git/trees/:tree_sha", owner = "jimhester", repo = "xml2", tree_sha = "a17105678325d250b40700fa82f1682532ed7b71")

library(purrr)
tree2 <- map(tree$tree, function(x) {
  # fetch the blob content from the source repo
  content <- gh::gh("GET /repos/:owner/:repo/git/blobs/:file_sha", owner = "jimhester", repo = "xml2", file_sha = x$sha)

  # copy the blob to the new repo
  gh::gh("POST /repos/:owner/:repo/git/blobs", owner = "jimhester", repo = "testAPI", content = content$content, encoding = content$encoding)

  # describe the tree entry; identical content keeps the same blob SHA
  list(
    path = x$path,
    mode = x$mode,
    type = x$type,
    sha = x$sha
  )
})

# move tree to new repo
new_tree <- gh::gh("POST /repos/:owner/:repo/git/trees", owner = "jimhester", repo = "testAPI", tree = tree2)

# create the commit
new_commit <- gh::gh("POST /repos/:owner/:repo/git/commits", owner = "jimhester", repo = "testAPI", message = "a new tree", tree = new_tree$sha, parents = list("596dd6b8a5dec5bee395c3ee495f9a5a926acce0"))

# move the master reference to the new SHA
gh::gh("PATCH /repos/:owner/:repo/git/:ref", owner = "jimhester", repo = "testAPI", ref = "refs/heads/master", sha = "dface8cdfe89b0617933b78b21324421aee7f083", force = TRUE)

The main difference from what you were attempting is that gh handles the JSON encoding automatically, so you pass lists directly to it as function arguments. The other difference is that I explicitly posted the blob for each file in the tree before trying to create the tree.
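
To illustrate the point about passing lists rather than pre-encoded JSON, here is a minimal sketch of a tree payload as a plain R list. The entry values (including the SHA) are placeholders, and `jsonlite::toJSON()` is used only to preview roughly what the request body would look like; that gh serializes it the same way is an assumption, not something confirmed in this thread.

```r
library(jsonlite)

# Placeholder tree entry; the sha is made up for illustration only.
entry <- list(
  path = "Day1",
  mode = "040000",
  type = "tree",
  sha  = "0000000000000000000000000000000000000000"
)

# `tree` must be an unnamed list of named lists, so that it
# serializes to a JSON array of objects.
payload <- list(tree = list(entry))

# Preview of the JSON body (assumed to be close to what gh sends).
payload_json <- as.character(toJSON(payload, auto_unbox = TRUE))
cat(payload_json, "\n")

# With gh, the list itself would be passed as an argument, e.g.
# gh::gh("POST /repos/:owner/:repo/git/trees",
#        owner = "<owner>", repo = "<repo>", tree = payload$tree)
```

The key detail is the extra `list()` around the entry: without it, the entry's fields would be serialized as a single object instead of an array with one element.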

Even after the tree is created you still have to create a commit that points to it, then move the appropriate reference to that commit.

The main thing missing from this example is that it currently dumps the files into the root directory rather than a folder; fixing that would require a tweak to the above.
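
One possible form of that tweak, sketched under the assumption that the create-tree endpoint accepts slash-separated paths and builds the intermediate folders itself, is to prefix each entry's `path` before posting the tree. The helper name `prefix_tree_paths()` and the sample entries are hypothetical and not part of gh or ghclass.

```r
# Hypothetical helper: prefix every tree entry's path with a folder name.
prefix_tree_paths <- function(entries, folder) {
  lapply(entries, function(x) {
    x$path <- paste(folder, x$path, sep = "/")
    x
  })
}

# Placeholder entries shaped like the elements of `tree2` above.
entries <- list(
  list(path = "a.R", mode = "100644", type = "blob", sha = "<sha-1>"),
  list(path = "b.R", mode = "100644", type = "blob", sha = "<sha-2>")
)

nested <- prefix_tree_paths(entries, "R")
nested_paths <- vapply(nested, function(x) x$path, character(1))
print(nested_paths)
```

The result would then be passed as the `tree` argument in place of the original list.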


After going through this I am not sure using the API is worth the hassle, since you still have to download the data to your machine, then upload it into the new repo. It is probably more straightforward to clone the repos locally and copy them as regular files.
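
A minimal sketch of that local alternative follows, using base R only. The two directories are temporary stand-ins for local clones (in practice they would come from `git clone`), and the folder and file names are made up for illustration.

```r
# Stand-ins for two local clones; names are hypothetical.
src_repo  <- file.path(tempdir(), "source-repo")
dest_repo <- file.path(tempdir(), "target-repo")
dir.create(file.path(src_repo, "Day1"), recursive = TRUE, showWarnings = FALSE)
dir.create(dest_repo, showWarnings = FALSE)
writeLines("example", file.path(src_repo, "Day1", "notes.md"))

# Copy the folder and its contents into the target clone.
# Only the files are copied, so no commit history comes along.
ok <- file.copy(file.path(src_repo, "Day1"), dest_repo, recursive = TRUE)

print(list.files(dest_repo, recursive = TRUE))
```

In a real clone, the copied files would still need to be added, committed, and pushed from the target repo.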

thereseanders commented 5 years ago

Thank you so much, Jim! This is super helpful.

I believe I also missed the GET and POST steps on the blob before trying to POST the tree to the new location. Also, your point about having to download is well taken. I will still give it a try a) to learn and b) to see whether trees might be a faster way to move files - the current version is a bit slow. Thank you.