neovim / doc

Generated documentation and reports
neovim.github.io/doc/
Apache License 2.0

backup GitHub issues, comments, discussions, etc. #16

Closed justinmk closed 1 year ago

justinmk commented 7 years ago

Potential tools:

  1. https://github-backup.branchable.com/ doesn't look maintained
  2. βœ… https://github.com/josegonzalez/python-github-backup good results πŸ‘
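For reference, a backup run with python-github-backup could look roughly like this. This is a sketch based on flags documented in that tool's README; the token, output path, and exact flag set are placeholders, not the invocation actually used here:

```shell
# Sketch: back up issues and PRs for the neovim org with python-github-backup.
# GITHUB_TOKEN is a placeholder personal access token.
pip install github-backup

github-backup neovim \
  --token "$GITHUB_TOKEN" \
  --organization \
  --output-directory ./backup \
  --issues --issue-comments \
  --pulls --pull-comments \
  --incremental
```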

Requirements

See https://github.com/neovim/doc/issues/16#issuecomment-1636881895

sarkararpan710 commented 5 years ago

I would like to know how exactly I would be able to solve this issue. Guidance would be appreciated as I am new to this.

justinmk commented 5 years ago

The post above links to a potential tool that could be used. The task is to investigate how to use that tool (or some alternative), and then write a script that uses it.

tsukinoko-kun commented 1 year ago

@justinmk where should this be stored? I would suggest a GitHub Release.

justinmk commented 1 year ago

@Frank-Mayer I would expect the data to be stored in a git repo.

tsukinoko-kun commented 1 year ago

@justinmk one backup-repo to back up multiple other repositories? Or one backup branch in one repository?

The first option could be useful if there are multiple different repos to back up.

tsukinoko-kun commented 1 year ago

I would prefer the first option.

One repository neovim/backup with one directory for each neovim repository that should get a backup.

Using a GitHub Action triggered on a cron schedule (maybe each week), these backups can be updated.

I created a test repository for this approach: https://github.com/Frank-Mayer/backup
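The cron-triggered Action described above could be sketched like this. The schedule, secret name, and flags are assumptions for illustration, not the actual workflow in the test repository:

```yaml
# .github/workflows/backup.yml (sketch, not the actual workflow)
name: backup
on:
  schedule:
    - cron: "0 3 * * 1"   # weekly; the thread later settles on monthly
  workflow_dispatch: {}

jobs:
  backup:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install github-backup
      - run: |
          github-backup neovim --organization \
            --token "${{ secrets.PAT }}" \
            --output-directory . \
            --issues --pulls --incremental
      - run: |
          git config user.name "github-actions"
          git config user.email "actions@github.com"
          git add -A
          git commit -m "backup $(date -u +%F)" || true
          git push
```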

justinmk commented 1 year ago

Nice! Looks good to me. Don't want separate branches, only separate directories as you did.

I think these are requirements:

  1. The repos should be explicitly chosen, i.e. we don't want to implicitly back up all repos. (Edit: a way to exclude noisy repos would be useful.)
  2. The cron job should be very friendly to github's API
    • Only incremental changes should be pulled.
  3. Ignore anything newer than 1 week (1 month?). We want to avoid storing "edit history".
    • Only pull the "latest" version of a comment, not its history (assuming github API even offers that)
  4. Repo size should be not too big, hopefully much less than 1 GB.
    • Don't store images/videos.
    • Other ideas?

tsukinoko-kun commented 1 year ago

  1. The repos should be explicitly chosen, i.e. we don't want to implicitly back up all repos. (where is that specified? I don't see it in your GHA job)

I currently back up all repositories of the neovim organisation. I could provide a list, this is not a problem. I would suggest neovim, go-client, node-client, nvim-lspconfig, pynvim, nvim.net.

  1. The cron job should be very friendly to github's API

    • Only incremental changes should be pulled.

Then I will set the cron to once a month. Incremental updates are enabled.

  1. Ignore anything newer than 1 week (1 month?). We want to avoid storing "edit history".

    • Only pull the "latest" version of a comment, not its history (assuming github API even offers that)

I don't think "ignore anything newer than 1 week (1 month?)" is possible with this. But I am looking into the tools and will hopefully find a way to do this.

  1. Repo size should be not too big, hopefully much less than 1 GB.

    • Don't store images/videos.
    • Other ideas?

My test repository currently takes 7.8 MB.

justinmk commented 1 year ago

I could provide a list, this is not a problem. I would suggest ...

After thinking more, maybe an explicit list isn't needed: most repos don't have many issues/PRs, so their data will be very small. But a way to exclude a repo may be needed. E.g. https://github.com/neovim/winget-pkgs is something we wouldn't want to back up, although it doesn't use PRs, so even its data is small.

I don't think "ignore anything newer than 1 week (1 month?)" is possible with this

Could be a TODO. I would guess we could add it as a feature to https://github.com/josegonzalez/python-github-backup , or worst case, we could parse the JSON and filter out stuff manually before git-committing it.
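The manual-filtering fallback mentioned here could be a small post-processing step run before committing. This is a sketch that assumes github-backup's JSON output is a list of objects with ISO-8601 `updated_at` fields (the field name is an assumption based on GitHub's API JSON):

```python
import json
from datetime import datetime, timedelta, timezone

def drop_recent(items, days=30, now=None):
    """Keep only items whose updated_at is older than `days` days,
    so recently edited comments are not snapshotted yet."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    kept = []
    for item in items:
        updated = datetime.fromisoformat(item["updated_at"].replace("Z", "+00:00"))
        if updated <= cutoff:
            kept.append(item)
    return kept

if __name__ == "__main__":
    sample = [
        {"id": 1, "updated_at": "2023-01-01T00:00:00Z"},
        {"id": 2, "updated_at": "2023-07-20T12:00:00Z"},
    ]
    now = datetime(2023, 7, 24, tzinfo=timezone.utc)
    print([c["id"] for c in drop_recent(sample, days=30, now=now)])  # -> [1]
```

This would run between the backup step and the `git commit` step of the cron job.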

My test repository currently takes 7.8 MB.

Extrapolating to 10k issues, I'm guessing the full data will approach 1+ GB. This is not a blocker, but as a TODO we could think about ignoring some kinds of PRs and issues. E.g. vim-patch PRs could possibly be dropped.

tsukinoko-kun commented 1 year ago

I made the suggested changes as far as I am able to do so with the given tools.

I would suggest transferring Frank-Mayer/backup to the Neovim organization.

With github-backup's current options I don't see a way to exclude vim-patch PRs. I would add this as a TODO; it could be added via a PR to (or a fork of) github-backup.
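As a stopgap, the backed-up PR JSON could be filtered by title before committing. A sketch, assuming each item carries a `title` field as in GitHub's PR JSON (the `vim-patch` title prefix is the convention used by those PRs):

```python
def drop_vim_patch_prs(pulls):
    """Drop PRs whose title marks them as vim-patch ports.
    Assumes each item has a 'title' field, as in GitHub's PR JSON."""
    return [p for p in pulls if not p.get("title", "").startswith("vim-patch")]

if __name__ == "__main__":
    sample = [
        {"number": 1, "title": "vim-patch:8.2.5000"},
        {"number": 2, "title": "fix(lsp): handle nil client"},
    ]
    print([p["number"] for p in drop_vim_patch_prs(sample)])  # -> [2]
```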

Maybe you know this, @justinmk: I am uncertain whether secrets get transferred with the repository or not. A secret called PAT (a personal access token) is expected; it is required to call the GitHub API.

justinmk commented 1 year ago

Ok, thanks! Will look for a transfer request. Let's see how it goes.

tsukinoko-kun commented 1 year ago

Well, the plan doesn't seem to work. πŸ˜…

[Screenshot: 2023-07-24 at 21:18:03]

tsukinoko-kun commented 1 year ago

I don't have permission to transfer the repository, @justinmk.

I could transfer it to you, and you add it to neovim. Or you fork it to neovim.

(As far as I know, if you fork it, you have to enable GitHub Actions.)

justinmk commented 1 year ago

Try transferring it to me

Edit: it's here now: https://github.com/neovim/neovim-backup

justinmk commented 1 year ago

Thanks again for getting this started! We can continue iterating at https://github.com/neovim/neovim-backup