maxlath / backup-github-repo

Back up all the issues and pull requests of a GitHub repo, including the comments, events, and labels, as JSON and as HTML

No such file or directory, open './issues/data.json' #2

Closed: dustinkerstein closed this issue 3 years ago

dustinkerstein commented 5 years ago

Any idea what could be causing this error? FYI, I did have to install jsondepth globally in order for the jd command to work.

fs.js:125
    throw err;
    ^

Error: ENOENT: no such file or directory, open './issues/data.json'
    at Object.openSync (fs.js:454:3)
    at Object.readFileSync (fs.js:354:35)
    at Object.<anonymous> (/Users/dustin/.nvm/versions/node/v12.1.0/lib/node_modules/jsondepth/lib/parse_arguments.js:10:26)
    at Module._compile (internal/modules/cjs/loader.js:759:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:770:10)
    at Module.load (internal/modules/cjs/loader.js:628:32)
    at Function.Module._load (internal/modules/cjs/loader.js:555:12)
    at Module.require (internal/modules/cjs/loader.js:666:19)
    at require (internal/modules/cjs/helpers.js:16:16)
    at Object.<anonymous> (/Users/dustin/.nvm/versions/node/v12.1.0/lib/node_modules/jsondepth/bin/jsondepth:2:40)
maxlath commented 5 years ago

Could it be that you didn't install the dependencies? I forgot to mention that in the installation steps.

Also, errors during ./scripts/download_json were not making the script exit with a non-zero code (fixed in 8b7f63d). That is why ./scripts/download_html then failed to find ./issues/data.json. If you run the latest version, you might get a new, more informative error.
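The failure mode described above is a failed step that still exits with code 0, which lets the next step run against a missing ./issues/data.json. It can be avoided by turning a rejected promise into a non-zero exit status. This is a minimal sketch that assumes a promise-returning download function. The runStep helper and the wiring in the comment are hypothetical and not the project's actual code:

```javascript
// Hypothetical helper: run an async step and map its outcome to an exit code.
// 0 means success; 1 means the step threw or its promise rejected.
async function runStep (name, task) {
  try {
    await task()
    return 0
  } catch (err) {
    console.error(`[${name}] failed:`, err.message)
    return 1
  }
}

// Wiring in a script entry point (hypothetical, not executed here):
//   runStep('download_json', downloadJson).then(code => { process.exitCode = code })
// A shell chain such as `./scripts/download_json && ./scripts/download_html`
// would then stop after the failed step instead of running the next one.
```

Setting process.exitCode rather than calling process.exit() lets pending output flush before Node exits.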

dustinkerstein commented 5 years ago

I ran npm install and re-ran npm link, and I'm still getting that same error on the latest commit. Any other ideas on what could be happening, or where I could add some debug output to isolate it?

dustinkerstein commented 5 years ago

Here's a more complete log:

Get a new token at https://github.com/settings/tokens
No special permission required
using a token increases the number of API requests we can make, see https://developer.github.com/v3/#rate-limiting
repo: XYZ/ABC
fs.js:125
    throw err;
    ^

Error: ENOENT: no such file or directory, open './issues/data.json'
    at Object.openSync (fs.js:454:3)
    at Object.readFileSync (fs.js:354:35)
    at Object.<anonymous> (/Users/dustin/.nvm/versions/node/v12.1.0/lib/node_modules/jsondepth/lib/parse_arguments.js:10:26)
    at Module._compile (internal/modules/cjs/loader.js:759:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:770:10)
    at Module.load (internal/modules/cjs/loader.js:628:32)
    at Function.Module._load (internal/modules/cjs/loader.js:555:12)
    at Module.require (internal/modules/cjs/loader.js:666:19)
    at require (internal/modules/cjs/helpers.js:16:16)
    at Object.<anonymous> (/Users/dustin/.nvm/versions/node/v12.1.0/lib/node_modules/jsondepth/bin/jsondepth:2:40)

Do I need to get a new token for this to work?

maxlath commented 5 years ago

It looks like your token can't be found in the config. It should look something like this:

// /path/to/backup-github-issues/config/local.js
// example value: replace it with your own personal access token
module.exports = { token: '9a6055acdb6dc2e37a64fc1c4c2bc98c4bcdd58a' }
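A token like the one in config/local.js is typically sent to the GitHub REST API in an "Authorization: token ..." header. This hypothetical helper builds such headers from a config object of that shape. The function name and the User-Agent string are illustrative assumptions and are not taken from the project:

```javascript
// Hypothetical sketch: build GitHub REST API request headers from a config
// object shaped like config/local.js ({ token: '...' }).
// Without a token, requests are unauthenticated and get a much lower rate limit.
function buildGithubHeaders (config = {}) {
  const headers = {
    // GitHub's REST API rejects requests that carry no User-Agent header.
    'User-Agent': 'backup-github-repo-sketch', // hypothetical UA string
    Accept: 'application/vnd.github.v3+json'
  }
  if (typeof config.token === 'string' && config.token.trim() !== '') {
    headers.Authorization = `token ${config.token.trim()}`
  }
  return headers
}
```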
dustinkerstein commented 5 years ago

Ah, I missed that. I somehow thought it would pick up the current repo's authentication. Sorry about that. Which permissions does the script need?

[screenshot]

maxlath commented 5 years ago

None, it's only used to raise the quota of API requests :)
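You can confirm that the token is being used by reading the rate-limit headers GitHub returns on API responses. The logs later in this thread show 'x-ratelimit-limit': '5000', which is the authenticated quota. Unauthenticated requests are limited to 60 per hour. This is a hypothetical helper, and the "authenticated" flag is a heuristic based on those two limits:

```javascript
// Hypothetical helper: summarize GitHub's rate-limit response headers.
// Header names are lower-cased, as in the Node response objects in this thread.
function rateLimitInfo (headers) {
  const limit = Number(headers['x-ratelimit-limit'])
  const remaining = Number(headers['x-ratelimit-remaining'])
  // x-ratelimit-reset is a Unix timestamp in seconds; Date expects milliseconds.
  const resetAt = new Date(Number(headers['x-ratelimit-reset']) * 1000)
  // Heuristic: unauthenticated requests get a limit of 60 per hour.
  return { limit, remaining, resetAt, authenticated: limit > 60 }
}
```

If you feed in the headers from the 404 response below, you should get a limit of 5000, so the token was being sent.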

dustinkerstein commented 5 years ago

K. With the token (with no permissions) I get this now:

issues data path: /HIDDEN/issues/data.json
repo: HIDDEN/HIDDEN

[download_json] failed to getLocalIssuesOrFetch Error: Not Found
    at handleHttpErrorCode (/Users/dustin/GitHub/backup-github-issues/node_modules/bluereq/lib/promisified_request.js:28:15)
    at Request._callback (/Users/dustin/GitHub/backup-github-issues/node_modules/bluereq/lib/promisified_request.js:21:9)
    at Request.self.callback (/Users/dustin/GitHub/backup-github-issues/node_modules/request/request.js:185:22)
    at Request.emit (events.js:196:13)
    at Request.<anonymous> (/Users/dustin/GitHub/backup-github-issues/node_modules/request/request.js:1161:10)
    at Request.emit (events.js:196:13)
    at Gunzip.<anonymous> (/Users/dustin/GitHub/backup-github-issues/node_modules/request/request.js:1083:12)
    at Object.onceWrapper (events.js:284:20)
    at Gunzip.emit (events.js:201:15)
    at endReadableNT (_stream_readable.js:1130:12)
    at processTicksAndRejections (internal/process/task_queues.js:84:17) {
  statusCode: 404,
  statusMessage: 'Not Found',
  headers: {
    server: 'GitHub.com',
    date: 'Thu, 02 May 2019 18:20:12 GMT',
    'content-type': 'application/json; charset=utf-8',
    'transfer-encoding': 'chunked',
    connection: 'close',
    status: '404 Not Found',
    'x-ratelimit-limit': '5000',
    'x-ratelimit-remaining': '4998',
    'x-ratelimit-reset': '1556822344',
    'x-oauth-scopes': '',
    'x-accepted-oauth-scopes': 'repo',
    'x-github-media-type': 'github.v3',
    'access-control-expose-headers': 'ETag, Link, Location, Retry-After, X-GitHub-OTP, ' +
      'X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, ' +
      'X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, ' +
      'X-GitHub-Media-Type',
    'access-control-allow-origin': '*',
    'strict-transport-security': 'max-age=31536000; includeSubdomains; preload',
    'x-frame-options': 'deny',
    'x-content-type-options': 'nosniff',
    'x-xss-protection': '1; mode=block',
    'referrer-policy': 'origin-when-cross-origin, strict-origin-when-cross-origin',
    'content-security-policy': "default-src 'none'",
    'content-encoding': 'gzip',
    'x-github-request-id': '9003:632C:6A9A5B:E9C3E0:5CCB34DC'
  },
  body: {
    message: 'Not Found',
    documentation_url: 'https://developer.github.com/v3/issues/#list-issues-for-a-repository'
  },
  url: 'https://api.github.com/repos/HIDDEN/HIDDEN/issues?per_page=100&state=all&page=1'
}

Am I missing another part of the config?

maxlath commented 5 years ago

Is the repo name correct in the URL https://api.github.com/repos/HIDDEN/HIDDEN/issues?per_page=100&state=all&page=1? Maybe there was a problem parsing the repo name, or maybe your repo isn't public and something works differently than for public repos?

dustinkerstein commented 5 years ago

The repo I'm trying to back up is private. Is that okay?

It's reported correctly here:

issues data path: /Users/dustin/GitHub/HIDDEN/issues/data.json
repo: HIDDEN/HIDDEN

But is incorrect here:

body: {
    message: 'Not Found',
    documentation_url: 'https://developer.github.com/v3/issues/#list-issues-for-a-repository'
  },
  url: 'https://api.github.com/repos/HIDDEN/HIDDEN%0A/issues?per_page=100&state=all&page=1'

I.e., the URL has %0A (a URL-encoded newline) appended to the end of the correct repo name.
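%0A is the percent-encoding of a line feed, which suggests the repo name was read together with its trailing newline, for example from captured command output. This small sketch shows one way that can produce the broken URL, and the usual fix of trimming before building it. The function name is illustrative:

```javascript
// Illustrative only: how a trailing newline ends up as %0A in the API URL.
function issuesApiUrl (repoName) {
  // Each path segment is percent-encoded; "\n" becomes "%0A".
  const [owner, repo] = repoName.split('/').map(encodeURIComponent)
  return `https://api.github.com/repos/${owner}/${repo}/issues?per_page=100&state=all&page=1`
}

const raw = 'maxlath/backup-github-issues\n' // e.g. captured from command output
console.log(issuesApiUrl(raw))        // URL contains "backup-github-issues%0A/issues"
console.log(issuesApiUrl(raw.trim())) // URL contains "backup-github-issues/issues"
```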

maxlath commented 5 years ago

Could you post an anonymized output of the command git remote -v? It might be a parsing error from this line.

dustinkerstein commented 5 years ago

Sure thing:

origin  https://github.com/HIDDEN/HIDDEN.git (fetch)
origin  https://github.com/HIDDEN/HIDDEN.git (push)
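A parser that extracts owner/repo from git remote -v output like this needs to strip the .git suffix and any surrounding whitespace, including the newline. This is a minimal sketch, not the project's actual parser. It assumes the remote is named origin and is hosted on github.com, and it handles both the HTTPS form shown here and the SSH form (git@github.com:owner/repo.git):

```javascript
// Hypothetical parser for `git remote -v` output.
// Returns "owner/repo" for the first matching remote, or null.
function parseGithubRepo (remoteOutput, remoteName = 'origin') {
  for (const line of remoteOutput.split('\n')) {
    // Each line looks like "<name>\t<url> (fetch|push)"; split on any whitespace.
    const [name, url] = line.trim().split(/\s+/)
    if (name !== remoteName || !url) continue
    // Matches https://github.com/owner/repo(.git) and git@github.com:owner/repo(.git)
    const match = url.match(/github\.com[/:]([^/\s]+)\/([^/\s]+?)(?:\.git)?$/)
    if (match) return `${match[1]}/${match[2]}`
  }
  return null
}
```

Because each line is trimmed before splitting, a trailing newline or carriage return cannot leak into the returned name.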
dustinkerstein commented 5 years ago

Seems to be something else. I get this same error when trying to export the backup-github-issues repo:

issues data path: /Users/dustin/GitHub/backup-github-issues/issues/data.json
repo: maxlath/backup-github-issues

[download_json] failed to getLocalIssuesOrFetch Error: Not Found
    at handleHttpErrorCode (/Users/dustin/GitHub/backup-github-issues/node_modules/bluereq/lib/promisified_request.js:28:15)
    at Request._callback (/Users/dustin/GitHub/backup-github-issues/node_modules/bluereq/lib/promisified_request.js:21:9)
    at Request.self.callback (/Users/dustin/GitHub/backup-github-issues/node_modules/request/request.js:185:22)
    at Request.emit (events.js:196:13)
    at Request.<anonymous> (/Users/dustin/GitHub/backup-github-issues/node_modules/request/request.js:1161:10)
    at Request.emit (events.js:196:13)
    at Gunzip.<anonymous> (/Users/dustin/GitHub/backup-github-issues/node_modules/request/request.js:1083:12)
    at Object.onceWrapper (events.js:284:20)
    at Gunzip.emit (events.js:201:15)
    at endReadableNT (_stream_readable.js:1130:12)
    at processTicksAndRejections (internal/process/task_queues.js:84:17) {
  statusCode: 404,
  statusMessage: 'Not Found',
  headers: {
    server: 'GitHub.com',
    date: 'Thu, 02 May 2019 18:39:26 GMT',
    'content-type': 'application/json; charset=utf-8',
    'transfer-encoding': 'chunked',
    connection: 'close',
    status: '404 Not Found',
    'x-ratelimit-limit': '5000',
    'x-ratelimit-remaining': '4999',
    'x-ratelimit-reset': '1556825966',
    'x-oauth-scopes': '',
    'x-accepted-oauth-scopes': 'repo',
    'x-github-media-type': 'github.v3',
    'access-control-expose-headers': 'ETag, Link, Location, Retry-After, X-GitHub-OTP, ' +
      'X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, ' +
      'X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, ' +
      'X-GitHub-Media-Type',
    'access-control-allow-origin': '*',
    'strict-transport-security': 'max-age=31536000; includeSubdomains; preload',
    'x-frame-options': 'deny',
    'x-content-type-options': 'nosniff',
    'x-xss-protection': '1; mode=block',
    'referrer-policy': 'origin-when-cross-origin, strict-origin-when-cross-origin',
    'content-security-policy': "default-src 'none'",
    'content-encoding': 'gzip',
    'x-github-request-id': '8642:08FB:413222:9FF7AC:5CCB395E'
  },
  body: {
    message: 'Not Found',
    documentation_url: 'https://developer.github.com/v3/issues/#list-issues-for-a-repository'
  },
  url: 'https://api.github.com/repos/maxlath/backup-github-issues%0A/issues?per_page=100&state=all&page=1'
}
dustinkerstein commented 5 years ago

Ah, it does seem like it's the %0A. If I go to https://api.github.com/repos/maxlath/backup-github-issues/issues?per_page=100&state=all&page=1 in a browser, I see results.

But if I go to https://api.github.com/repos/maxlath/backup-github-issues%0A/issues?per_page=100&state=all&page=1

I get:

{
  "message": "Not Found",
  "documentation_url": "https://developer.github.com/v3/issues/#list-issues-for-a-repository"
}
dustinkerstein commented 5 years ago

And to be clearer, my repo name looks like this: LowerCaseUpperCase-Unity-SDK - Maybe it's the dashes...

maxlath commented 5 years ago

e041302 might solve this issue: it tries to use a consistent repository name parser. Can you give it a try?

dustinkerstein commented 5 years ago

That fixes the issue for public repos (i.e. your backup-github-repo) and exports all of the issues successfully, but there is a new issue with private repos. To get past the 404, I had to add more permissions to the token (I'm not sure yet which one specifically, but adding all permissions does fix it). After that, it appears to work correctly:
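The 404 responses earlier in this thread contain a hint: 'x-oauth-scopes': '' (the token has no scopes) and 'x-accepted-oauth-scopes': 'repo' (a scope this endpoint accepts). GitHub's documentation notes that some unauthorized requests to private resources return 404 rather than 403, so a missing scope can look like a wrong URL. This hypothetical check compares the two headers:

```javascript
// Parse a comma-separated scope header ("repo, user") into an array.
function parseScopes (value) {
  return (value || '').split(',').map(scope => scope.trim()).filter(Boolean)
}

// Hypothetical diagnostic: does the token hold at least one accepted scope?
function hasAcceptedScope (headers) {
  const accepted = parseScopes(headers['x-accepted-oauth-scopes'])
  if (accepted.length === 0) return true // endpoint advertises no scope requirement
  const granted = new Set(parseScopes(headers['x-oauth-scopes']))
  // Simplification: ignores scope hierarchy (a parent scope implying a child).
  return accepted.some(scope => granted.has(scope))
}
```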

data path: /Users/dustin/GitHub/HIDDENREPO/repo-backup/data.json
repo: HIDDENAUTHOR/HIDDENREPO
entry: 13
adding comments, events, and labels: 13/13
done
indexing issue by number
Done downloading JSON
repo: HIDDENAUTHOR/HIDDENREPO
entries: 13
Download issues and pull requests HTML sequentially
[1/13]: https://github.com/HIDDENAUTHOR/HIDDENREPO/issues/1 --> 1.html
[2/13]: https://github.com/HIDDENAUTHOR/HIDDENREPO/issues/2 --> 2.html
[3/13]: https://github.com/HIDDENAUTHOR/HIDDENREPO/issues/3 --> 3.html
[4/13]: https://github.com/HIDDENAUTHOR/HIDDENREPO/issues/4 --> 4.html
[5/13]: https://github.com/HIDDENAUTHOR/HIDDENREPO/issues/5 --> 5.html
[6/13]: https://github.com/HIDDENAUTHOR/HIDDENREPO/issues/6 --> 6.html
[7/13]: https://github.com/HIDDENAUTHOR/HIDDENREPO/issues/7 --> 7.html
[8/13]: https://github.com/HIDDENAUTHOR/HIDDENREPO/issues/8 --> 8.html
[9/13]: https://github.com/HIDDENAUTHOR/HIDDENREPO/issues/9 --> 9.html
[10/13]: https://github.com/HIDDENAUTHOR/HIDDENREPO/issues/10 --> 10.html
[11/13]: https://github.com/HIDDENAUTHOR/HIDDENREPO/issues/11 --> 11.html
[12/13]: https://github.com/HIDDENAUTHOR/HIDDENREPO/issues/12 --> 12.html
[13/13]: https://github.com/HIDDENAUTHOR/HIDDENREPO/issues/13 --> 13.html
Done downloading issues and pull requests HTML sequentially
Padding filenames with zeros
sed: 1: "0001.html": invalid command code .

But then actually opening the html files shows Not Found:

[screenshot]

FYI, the sed error/warning shows even on successful exports of public repos and doesn't appear to have any negative effect:

Done downloading issues and pull requests HTML sequentially
Padding filenames with zeros
sed: 1: "0001.html": invalid command code .
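This sed message is likely a macOS quirk. BSD sed's -i option takes a mandatory backup-suffix argument, so a GNU-style sed -i 'expr' file can consume the expression as the suffix and then parse the filename as the script. That would match the "invalid command code ." error. As a portable alternative for the zero-padding step, the new name can be computed in Node with String.prototype.padStart. The function name and the 4-digit width are assumptions based on the 0001.html filename above:

```javascript
// Hypothetical helper: pad the numeric stem of a filename with zeros,
// e.g. ('1.html', 4) gives '0001.html'.
// Names without a purely numeric stem are returned unchanged, and
// padStart never truncates stems that are already wider than `width`.
function padFilename (filename, width = 4) {
  const match = filename.match(/^(\d+)(\.[^.]+)$/)
  if (!match) return filename
  return match[1].padStart(width, '0') + match[2]
}
```

To rename files on disk, you could apply it with fs.renameSync over the names returned by fs.readdirSync(dir).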
dustinkerstein commented 5 years ago

It looks like the curl command that actually downloads the issue pages isn't sending the auth token in a header:

curl -L "https://github.com/${repo}/issues/[1-${last_id}]" -o "#1.html" --limit-rate 10M 2>&1 | grep -E '^\['
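In this command, curl's URL globbing expands [1-${last_id}] into one request per issue number. The #1 in the -o argument is replaced with the current value of that range, which gives 1.html, 2.html, and so on. The same expansion is sketched below in JavaScript for clarity. The function name is hypothetical:

```javascript
// Mirror of curl's "[1-N]" URL glob combined with an "-o '#1.html'" template:
// one (url, file) pair per issue number from 1 to lastId.
function expandIssuePages (repo, lastId) {
  const pages = []
  for (let n = 1; n <= lastId; n++) {
    pages.push({ url: `https://github.com/${repo}/issues/${n}`, file: `${n}.html` })
  }
  return pages
}
```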

maxlath commented 5 years ago
dustinkerstein commented 5 years ago

It looks like this is required:

[screenshot]

But I am still getting the same Not Found results in the html files.

I tried hacking in the other token curl approach since I'm not using an organization:

authorized_curl(){
  # double quotes so the shell expands ${token}; "$@" keeps arguments intact
  curl -u "username:${token}" "$@"
}

But that also didn't work.

And yep, I am using OSX.

dustinkerstein commented 5 years ago

I just invited you to a test private repo - https://github.com/dustinkerstein/private-repo-test - Let me know if you're not able to replicate there.

maxlath commented 5 years ago

I was able to reproduce the issue with the private repo: indeed, you can't just get the issues HTML from the usual address.

curl -H "Authorization: token ${token}" https://github.com/dustinkerstein/private-repo-test/issues/1

I can't think of another way to get that HTML, but if you find one, feel free to open a PR. Meanwhile, I will update the README to specify that downloading issues HTML isn't supported for private repos.

dustinkerstein commented 5 years ago

These both appear to work:

curl -H 'Authorization: token TOKEN' https://api.github.com/repos/dustinkerstein/private-repo-test/issues/1
curl -u username:TOKEN https://api.github.com/repos/dustinkerstein/private-repo-test/issues/1

I think you just needed the api subdomain in the URL.
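The key difference is that api.github.com/repos/... accepts the token and returns the issue as JSON, while the github.com issue page is HTML. The sketch below makes the same request in Node. It assumes Node 18+ for the global fetch and sends the token in an Authorization header rather than the access_token query parameter, which GitHub has since deprecated. The function names and the User-Agent string are hypothetical:

```javascript
// Build the REST API URL for a single issue (JSON), as opposed to the
// github.com HTML page at https://github.com/<owner>/<repo>/issues/<n>.
function issueApiUrl (repo, number) {
  return `https://api.github.com/repos/${repo}/issues/${number}`
}

// Fetch one issue as JSON. Requires Node 18+ for the global fetch.
async function fetchIssue (repo, number, token) {
  const res = await fetch(issueApiUrl(repo, number), {
    headers: {
      'User-Agent': 'backup-github-repo-sketch', // hypothetical UA string
      Accept: 'application/vnd.github.v3+json',
      Authorization: `token ${token}`
    }
  })
  if (!res.ok) throw new Error(`GitHub API returned ${res.status} for ${res.url}`)
  return res.json()
}
```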

dustinkerstein commented 5 years ago

Sorry, it also needs repos in the url like this:

authorized_curl -L "https://api.github.com/repos/${repo}/issues/[1-${last_id}]" -o "#1.html" 2>&1 | grep -E '^\['

Then, for some reason, I also had to hardcode the token, as it was still complaining about credentials. After that, I was able to get output for my target repo, but for some reason I was still having trouble pulling from the new private test repo.

Further, the output is now in a different format (JSON rather than HTML), so it isn't parsed correctly. Are you able to get the commands above working?

dustinkerstein commented 5 years ago

You should be able to see this working in the browser here with this url - https://api.github.com/repos/dustinkerstein/private-repo-test/issues/1?access_token=TOKEN

maxlath commented 5 years ago

Sorry, I won't work further on supporting private repos: this tool is low on my priority list and my use case is covered. But I can take the time to review and merge PRs.

dustinkerstein commented 5 years ago

OK, thanks. It does look like this JSON API approach would take a bit more work to parse.
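Turning the JSON into a readable page is mostly a matter of escaping a few fields and putting them into a template. This is a minimal sketch that assumes the issue objects returned by the REST API (number, title, user.login, body, html_url). It shows the body as escaped plain text rather than rendered Markdown. The function names are hypothetical:

```javascript
// Escape text for safe inclusion in HTML.
function escapeHtml (text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
}

// Render one API issue object as a minimal standalone HTML page.
// The body is shown as preformatted text; Markdown is not rendered.
function issueToHtml (issue) {
  const title = `#${issue.number} ${escapeHtml(issue.title)}`
  const author = issue.user ? escapeHtml(issue.user.login) : 'unknown'
  return [
    '<!DOCTYPE html>',
    `<html><head><meta charset="utf-8"><title>${title}</title></head><body>`,
    `<h1>${title}</h1>`,
    `<p>opened by ${author} | <a href="${escapeHtml(issue.html_url)}">original</a></p>`,
    `<pre>${escapeHtml(issue.body || '')}</pre>`,
    '</body></html>'
  ].join('\n')
}
```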