gousiosg / github-mirror

Scripts to mirror Github in a cloudy fashion
BSD 2-Clause "Simplified" License

Non-commit entities not stored in MySQL database #62

Open gordongli opened 6 years ago

gordongli commented 6 years ago

When running ght-retrieve-repo, commits are successfully stored in the database, but issues, pull_requests, etc. are fetched and not stored, even when providing the -y option. I notice in the logs that ghtorrent.rb reports adding commits to the database as they are retrieved, but there are no corresponding ghtorrent.rb lines for the other entities.

commits:

...
INFO, 2018-02-27T16:13:45-08:00, ghtorrent -- api_client.rb: Successful request. URL: https://api.github.com/repos/twitter/twemoji/commits/72b5e44e092d910629547cbc6886127901fb81d8?per_page=100, Remaining: 3098, Total: 278 ms
INFO, 2018-02-27T16:13:45-08:00, ghtorrent -- retriever.rb: Added commit twitter/twemoji -> 72b5e44e092d910629547cbc6886127901fb81d8
INFO, 2018-02-27T16:13:45-08:00, ghtorrent -- ghtorrent.rb: Added commit twitter/twemoji -> 72b5e44e092d910629547cbc6886127901fb81d8 
INFO, 2018-02-27T16:13:45-08:00, ghtorrent -- ghtorrent.rb: Added commit_parent 72b5e44e092d910629547cbc6886127901fb81d8 to commit 2d8c1a7e7243c76aa53db8f018dcbdb994d22024
...

pull requests:

...
INFO, 2018-02-27T16:09:54-08:00, ghtorrent -- api_client.rb: Successful request. URL: https://api.github.com/repos/twitter/twemoji/pulls/225, Remaining: 3284, Total: 733 ms
INFO, 2018-02-27T16:09:54-08:00, ghtorrent -- retriever.rb: Added pull_requests twitter/twemoji -> 225
INFO, 2018-02-27T16:09:55-08:00, ghtorrent -- api_client.rb: Successful request. URL: https://api.github.com/repos/twitter/twemoji/pulls/219, Remaining: 3283, Total: 870 ms
INFO, 2018-02-27T16:09:55-08:00, ghtorrent -- retriever.rb: Added pull_requests twitter/twemoji -> 219
...
gousiosg commented 6 years ago

I cannot replicate this. With MongoDB and SQLite configured, I run:

ruby -Ilib bin/ght-retrieve-repo -t token gousiosg github-mirror

I get the same output as yours for commits:

INFO, 2018-02-28T10:09:34+01:00, ghtorrent -- retriever.rb: Added commit gousiosg/github-mirror -> 2e5c6db4c5a5d39ba59d3101cad4051cda43fb02
INFO, 2018-02-28T10:09:34+01:00, ghtorrent -- api_client.rb: Successful request. URL: https://api.github.com/repos/gousiosg/github-mirror/commits/4704dfec4283d9c3721709a1e69fb6d1dc5c81d6?per_page=100, Remaining: 1562, Total: 487 ms
INFO, 2018-02-28T10:09:34+01:00, ghtorrent -- retriever.rb: Added commit gousiosg/github-mirror -> 4704dfec4283d9c3721709a1e69fb6d1dc5c81d6
INFO, 2018-02-28T10:09:35+01:00, ghtorrent -- ghtorrent.rb: Added commit gousiosg/github-mirror -> 7428d94cf62a0658b8c357750fa7e302ce709930 

but different output for pull requests:

INFO, 2018-02-28T10:09:54+01:00, ghtorrent -- full_repo_retriever.rb: Stage: ensure_languages completed, Repo: gousiosg/github-mirror, Time: 455 ms
INFO, 2018-02-28T10:09:55+01:00, ghtorrent -- api_client.rb: Successful request. URL: https://api.github.com/repos/gousiosg/github-mirror/pulls?per_page=100, Remaining: 4981, Total: 599 ms
INFO, 2018-02-28T10:09:55+01:00, ghtorrent -- api_client.rb: Successful request. URL: https://api.github.com/repos/gousiosg/github-mirror/pulls?page=1&per_page=100, Remaining: 4980, Total: 492 ms
[...]
INFO, 2018-02-28T10:10:12+01:00, ghtorrent -- ghtorrent.rb: Added user Zearin
INFO, 2018-02-28T10:10:12+01:00, ghtorrent -- ghtorrent.rb: Added pull_req 1 (head deleted) -> gousiosg/github-mirror
INFO, 2018-02-28T10:10:12+01:00, ghtorrent -- ghtorrent.rb: Added pullreq_event (2) -> (opened) by (Zearin) timestamp 2013-11-24 16:56:19 UTC
INFO, 2018-02-28T10:10:12+01:00, ghtorrent -- ghtorrent.rb: Added pullreq_event (2) -> (merged) by (gousiosg) timestamp 2013-11-25 13:42:03 UTC
INFO, 2018-02-28T10:10:12+01:00, ghtorrent -- ghtorrent.rb: Added pullreq_event (2) -> (closed) by (gousiosg) timestamp 2013-11-25 13:42:03 UTC
INFO, 2018-02-28T10:10:12+01:00, ghtorrent -- api_client.rb: Successful request. URL: https://api.github.com/repos/gousiosg/github-mirror/pulls/1/commits?per_page=100, Remaining: 4942, Total: 523 ms
INFO, 2018-02-28T10:10:12+01:00, ghtorrent -- ghtorrent.rb: Added pullreq_commit e3933e58a614bd5487303260e9d1c39abb2e8c09 to gousiosg/github-mirror -> 1
gordongli commented 6 years ago

Setting up MongoDB and using it as the persister fixed the issue. Would this be a bug or something to document?
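
The log comparison made in this thread (which source file reports "Added" for each entity) can be automated with a small script. The following is a minimal sketch in Python, not part of GHTorrent. It assumes the log line format quoted above, and the function names `added_by_source` and `missing_from` are hypothetical. Treating a `ghtorrent.rb` line as evidence of a database write is the reading used in this thread, not something the script verifies.

```python
import re
from collections import defaultdict

# Matches the log lines quoted in this thread, e.g.
# "INFO, <timestamp>, ghtorrent -- retriever.rb: Added commit ..."
ADDED = re.compile(r"ghtorrent -- (?P<source>\w+\.rb): Added (?P<entity>\w+)")


def added_by_source(lines):
    """Map each entity label to the set of source files that logged 'Added' for it."""
    seen = defaultdict(set)
    for line in lines:
        match = ADDED.search(line)
        if match:
            seen[match.group("entity")].add(match.group("source"))
    return dict(seen)


def missing_from(lines, source="ghtorrent.rb"):
    """Entity labels that were logged as added, but never by `source`."""
    by_source = added_by_source(lines)
    return sorted(e for e, sources in by_source.items() if source not in sources)


# Three lines copied from the first comment's log excerpts.
sample = [
    "INFO, 2018-02-27T16:13:45-08:00, ghtorrent -- retriever.rb: Added commit twitter/twemoji -> 72b5e44e092d910629547cbc6886127901fb81d8",
    "INFO, 2018-02-27T16:13:45-08:00, ghtorrent -- ghtorrent.rb: Added commit twitter/twemoji -> 72b5e44e092d910629547cbc6886127901fb81d8",
    "INFO, 2018-02-27T16:09:54-08:00, ghtorrent -- retriever.rb: Added pull_requests twitter/twemoji -> 225",
]

print(missing_from(sample))  # ['pull_requests']
```

One caveat: the two files do not use the same entity labels (retriever.rb logs `pull_requests`, while ghtorrent.rb logs `pull_req`), so the result is a rough signal to check by hand rather than a definitive list of unstored entities.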