octokit / octokit.rb

Ruby toolkit for the GitHub API
http://octokit.github.io/octokit.rb/
MIT License
3.85k stars 1.14k forks source link

[FEAT]: Change `auto_paginate` to return an enumerator instead of getting all pages #1722

Open pbstriker38 opened 1 month ago

pbstriker38 commented 1 month ago

Describe the need

It would be nice if auto_paginate would allow enumeration so that you could stop the request once you have reached your desired goal. Calling to_a on the enumerator will give you the current auto_paginate behavior of just paging to the last page. This avoids wasted requests and memory bloat when potentially concat'ing 10's of thousands of objects with the current auto pagination.

Sample code of how it would work

# objects_path is for API's that return the resource nested in the response. Like workflow runs
# { total_count: 1193, workflow_runs: [....] }
# Would be called like so:
# paginate "#{Repository.path repo}/actions/runs", options, :workflow_runs
#
def paginate(url, options = {}, objects_path = nil)
  opts = parse_query_and_convenience_headers(options)
  if @auto_paginate || @per_page
    opts[:query][:per_page] ||= @per_page || (@auto_paginate ? 100 : nil)
  end

  if @auto_paginate
    enumerator(url, opts.dup, objects_path)
  else
    request(:get, url, opts.dup)
  end
end

def enumerator(url, options = {}, objects_path = nil)
  Enumerator.new do |yielder|
    response = request(:get, url, options, return_response: true)
    loop do
      iterable = objects_path ? response.data.dig(*objects_path) : response.data
      iterable.each do |object|
        yielder << object
      end

      break unless response.rels[:next]

      @last_response = response = response.rels[:next].get(headers: options[:headers])
    end
  end
end
irb(main):001> client = Octokit::Client.new(access_token: "ghs_...", auto_paginate: true)
irb(main):002> two_years_ago = Time.new(2022, 10, 18)

## This is showing multiple levels of auto pagination working
irb(main):003* client.pull_requests('octokit/octokit.rb', state: 'closed', per_page: 30).each do |pull_request|
irb(main):004*   break if pull_request.created_at < two_years_ago
irb(main):005*   next unless pull_request.merged_at && pull_request.labels.any? { |l| l.name == 'Type: Breaking change' }
irb(main):006*   
irb(main):007*   puts "#{pull_request.merged_at}: #{pull_request.title}"
irb(main):008*   client.issue_comments('octokit/octokit.rb', pull_request.number, per_page: 1).each { |comment| puts "  comment: #{comment.body[0..80]}" }
irb(main):009*   client.pull_request_comments('octokit/octokit.rb', pull_request.number, per_page: 1).each { |comment| puts "  comment: #{comment.body[0..80]}" }
irb(main):010*   puts "\n"
irb(main):011> end
2024-10-17 14:22:25 UTC: BREAKING CHANGE: Remove app methods that have been deprecated since v4.8.0
  comment: 👋 Hi! Thank you for this contribution! Just to let you know, our GitHub SDK team 

2024-03-25 17:52:02 UTC: BREAKING CHANGE: Removes PubSubHubbub module and /hub API endpoint implementation
  comment: 👋 Hi! Thank you for this contribution! Just to let you know, our GitHub SDK team 
  comment: Thanks for the contributions here @mjimenez98  ❤️ 

2023-07-19 16:38:32 UTC: [Breaking change] fix/update call to delete_reaction
  comment: 👍🏻 LGTM 
  comment: @nickfloyd just fixed the specs + moved the VCR cassette 👍🏻 
  comment: ```suggestion
      # @see https://docs.github.com/en/rest/reactions/reactions#d
  comment: ```suggestion
      def delete_issue_reaction(repo, issue_id, reaction_id, optio

2023-10-30 21:35:35 UTC: Use 'line' instead of 'position'
  comment: I've approved the Actions runs; thank you for submitting! I'm slightly torn about
  comment: Well...You're right! It may not be possible to provide an override because we're 
  comment: Hmm...I'm not loving any of these options, really. Perhaps the best thing to do i
  comment: I see, understood! Thanks!
We will put it on hold until the next major version. 
  comment: Would you consider adding an interim method which uses the new API request format
  comment: When we merge #1596, we can/should also take this PR before cutting a breaking ch

=> nil

## Only 4 requests will be made here since we are only taking 17 PR's
irb(main):012> client.pull_requests('octokit/octokit.rb', state: 'closed', per_page: 5).take(17).map(&:title)
=> 
["BREAKING CHANGE: Remove app methods that have been deprecated since v4.8.0",
 "V8 update main",
 "Add support for file comments in PRs",
 "Update rubocop requirement from 1.65.0 to 1.66.1",
 "fix: Remove the Faraday multipart warning and tweak the error message raised at runtime when Faraday Multipart is not installed",
 "Update rubocop requirement from 1.65.0 to 1.66.0",
 "Update rubocop requirement from 1.65.0 to 1.65.1",
 "fix: Addresses URI parsing compatibility with Ruby 3.4",
 "Update rubocop requirement from 1.64.1 to 1.65.0",
 "feat: enable and disable automated security features",
 "Management Console Client: Update deprecation version",
 "Bump actions/add-to-project from 1.0.1 to 1.0.2",
 "🚧 Workflows have changed",
 "Update rubocop requirement from 1.64.0 to 1.64.1",
 "feat: Add Ruby 3.3 as supported one.",
 "feat: adds support for Code scanning endpoints",
 "Rename codeql-analysis.yml to codeql-analysis.yml"]

irb(main):013> client.pull_requests('octokit/octokit.rb', state: 'all')
=> #<Enumerator: ...>    <----- No requests have been made yet

I'm not sure what to do about methods like this one that are returning the total count. This total_count is actually wrong when auto paginating since the total_count already takes into account all of the pages. So maybe that should be removed since it is giving the wrong value anyways.

def check_runs_for_ref(repo, ref, options = {})
  paginate "#{Repository.path repo}/commits/#{ref}/check-runs", options do |data, last_response|
    data.check_runs.concat last_response.data.check_runs
    data.total_count += last_response.data.total_count
  end
end

swdotcom(dev)> client.auto_paginate = false
=> false
swdotcom(dev)> client.check_runs_for_ref('octokit/octokit.rb', '482676432a49085368c780408d7899108dc656ec', per_page: 3).total_count
=> 13
swdotcom(dev)> client.auto_paginate = true
=> true
swdotcom(dev)> client.check_runs_for_ref('octokit/octokit.rb', '482676432a49085368c780408d7899108dc656ec', per_page: 3).total_count
=> 65

SDK Version

No response

API Version

No response

Relevant log output

No response

Code of Conduct

github-actions[bot] commented 1 month ago

👋 Hi! Thank you for this contribution! Just to let you know, our GitHub SDK team does a round of issue and PR reviews twice a week, every Monday and Friday! We have a process in place for prioritizing and responding to your input. Because you are a part of this community please feel free to comment, add to, or pick up any issues/PRs that are labeled with Status: Up for grabs. You & others like you are the reason all of this works! So thank you & happy coding! 🚀