jekyll / github-metadata

Jekyll plugin to propagate the `site.github` namespace and set default values for use with GitHub Pages.
https://jekyll.github.io/github-metadata/
MIT License
282 stars 127 forks source link

API calls for each and every repository? #237

Closed egor-tensin closed 2 years ago

egor-tensin commented 2 years ago

Hey, I'm trying to migrate my stuff from GitHub Pages to an external hosting. Part of that process is getting rid of the github-pages gem and replacing it with a newer Jekyll version. I also use github-metadata. But, when I did the upgrade (in commit https://github.com/egor-tensin/egor-tensin.github.io/commit/d231c9b71d31b4208ef679106ba38a6dbf91ef61), github-metadata started making API calls for each and every repository that I have (this is just an excerpt):

 Incremental build: disabled. Enable with --incremental
      Generating...
   GitHub Metadata: Generating for egor-tensin/egor-tensin.github.io
   GitHub Metadata: Calling @client.repository("egor-tensin/egor-tensin.github.io", {:accept=>"application/vnd.github.drax-preview+json"})
   GitHub Metadata: Calling @client.pages("egor-tensin/egor-tensin.github.io", {})
   GitHub Metadata: Calling @client.contributors("egor-tensin/egor-tensin.github.io")
   GitHub Metadata: Calling @client.latest_release("egor-tensin/egor-tensin.github.io")
   GitHub Metadata: Calling @client.organization("egor-tensin")
   GitHub Metadata: Calling @client.organization("egor-tensin")
   GitHub Metadata: Calling @client.user("egor-tensin")
   GitHub Metadata: Calling @client.list_repos("egor-tensin", {:type=>"public", :accept=>"application/vnd.github.mercy-preview+json"})
   GitHub Metadata: Calling @client.releases("egor-tensin/aes-tools")
   GitHub Metadata: Calling @client.contributors("egor-tensin/aes-tools")
   GitHub Metadata: Calling @client.releases("egor-tensin/blog")
   GitHub Metadata: Calling @client.contributors("egor-tensin/blog")
   GitHub Metadata: Calling @client.releases("egor-tensin/build-boost")
   GitHub Metadata: Calling @client.contributors("egor-tensin/build-boost")

And it goes on doing 2 calls for every repository of mine, which takes too long. What am I doing wrong?

egor-tensin commented 2 years ago

I have the repository setting in my _config.yml pointing to the correct repository, it has been that way since forever.

egor-tensin commented 2 years ago

As part of the upgrade, my jekyll-github-metadata gem was bumped from version 2.13.0 to 2.14.0. However, if I fix the version at 2.13.0, the problem goes away.

egor-tensin commented 2 years ago

Sorry, I can see the changes that supposedly took place in 2.14 in History.markdown. Isn't that insanely slow though? I have about 40 repos, and jekyll-github-metadata makes 2 API calls for each of them.

On a somewhat related note, my very basic website https://github.com/egor-tensin/egor-tensin.github.io takes about a second to build with github-pages & jekyll-github-metadata v2.13. After upgrading to Jekyll 4 whlie still using jekyll-github-metadata v2.13, it takes almost 4 seconds to build. If I run bundle exec jekyll build --verbose, I can see that it makes 9 API calls:

 Incremental build: disabled. Enable with --incremental
      Generating...
   GitHub Metadata: Generating for egor-tensin/egor-tensin.github.io
   GitHub Metadata: Calling @client.repository("egor-tensin/egor-tensin.github.io", {:accept=>"application/vnd.github.drax-preview+json"})
   GitHub Metadata: Calling @client.pages("egor-tensin/egor-tensin.github.io", {})
   GitHub Metadata: Calling @client.contributors("egor-tensin/egor-tensin.github.io")
   GitHub Metadata: Calling @client.latest_release("egor-tensin/egor-tensin.github.io")
   GitHub Metadata: Calling @client.organization("egor-tensin")
   GitHub Metadata: Calling @client.list_repos("egor-tensin", {:type=>"public", :accept=>"application/vnd.github.mercy-preview+json"})
   GitHub Metadata: Calling @client.releases("egor-tensin/egor-tensin.github.io")
   GitHub Metadata: Calling @client.organization("egor-tensin")
   GitHub Metadata: Calling @client.user("egor-tensin")

With github-pages, it only makes one:

   Rendering Layout: index.html
     Layout source: theme
   GitHub Metadata: Calling @client.repository("egor-tensin/egor-tensin.github.io", {:accept=>"application/vnd.github.drax-preview+json"})
           Writing: /home/egor/workspace/personal/egor-tensin.github.io/_site/index.html

The latter is what I want to happen, since I don't want to wait 3 unnecessary seconds more for each build. I also don't quite understand how this works, since the jekyll-github-metadata version is the same in both cases in my Gemfile.lock. Is the Jekyll version logic coded somewhere in github-metadata code?

Sorry for the spam.

ashmaroli commented 2 years ago

Thank you for reporting this, @egor-tensin Technically, this is neither a bug in this plugin nor in Jekyll 4, but rather a consequence of an expected behavior.

Technical Explanation {{ site.github }} is essentially Jekyll::GitHubMetadata::MetadataDrop a subclass of Jekyll::Drops::Drop exposed via site.config["github"]. When the drop instance, drop gets inspected (i.e. drop.inspect gets called, by whatever means), every key in the drop has its counterpart value resolved, which in the plugin's case, results in an API call.

Why is this observed only with Jekyll 4? Because Jekyll 4 contains a call to site.config.inspect which in turn triggers site.config["github"].inspect.

What next? I'm not sure, honestly. Therefore: /cc @parkr, @mattr- for inputs.

egor-tensin commented 2 years ago

@ashmaroli Thanks, that makes sense. To clarify, I have 2 issues:

I wonder how other people are using Jekyll 4 w/ github-metadata. Sticking to Jekyll 3 in the meantime...

ashmaroli commented 2 years ago

@egor-tensin If you would like to use Jekyll 4 until the maintainers' decision, there is a workaround..:

Create a temporary plugin at path _plugins/jgm_patch.rb with content:

module Jekyll
  module GitHubMetadata
    class MetadataDrop < Jekyll::Drops::Drop
      def inspect
        "#<#{self.class.name} repo: #{repository.nwo}>"
      end
    end
  end
end
parkr commented 2 years ago

@egor-tensin If you could give that patch that @ashmaroli provided above, that would be great! If that solves it, I would be happy to include it in the production gem.

Why is this observed only with Jekyll 4? Because Jekyll 4 contains a call to site.config.inspect which in turn triggers site.config["github"].inspect.

Why do we have a site.config.inspect call? The idea of drops was to make computing their values only happen when a user asks for it, but it seems like this code would make all of that worthless?

too slow

Maybe we can add API call latency to the debug logs. It shouldn't be more than 500ms or so per call (which adds up with so many calls) but would be good to have.

egor-tensin commented 2 years ago

If you could give that patch that @ashmaroli provided above, that would be great! If that solves it, I would be happy to include it in the production gem.

It does solve the problem, yes. Only one API call now, as expected.

parkr commented 2 years ago

Why do we have a site.config.inspect call?

https://github.com/jekyll/jekyll/blob/9587a73c53d652699d2441458205d4423b888641/lib/jekyll/cache.rb#L35

Maybe there's another way we could do this, like config.hash, to determine whether this has changed.

It does solve the problem, yes. Only one API call now, as expected.

🎉 I think we can include this in our code here.

ashmaroli commented 2 years ago

@parkr I wouldn't recommend including the provided workaround in the official codebase because that would break {{ site.github | inspect }} for all users of the plugin.

The fundamental issue here is that {{ site.github }} is exposed via site.config["github"] instead of site.site_payload["site"]["github"].

Jekyll 4 calling site.config.inspect isn't illegal. If not Jekyll core, it could've been any other plugin.

parkr commented 2 years ago

@egor-tensin If you can help test my PR with Jekyll 3 and 4, that would be very helpful:

gem "jekyll-github-metadata",
  git: "https://github.com/jekyll/github-metadata",
  branch: "dont-munge-site-github"
egor-tensin commented 2 years ago

@parkr It works, one API call w/ both Jekyll 3 & 4.

parkr commented 2 years ago

@egor-tensin Excellent, thank you for testing! I'm assuming that it also outputs the data you need properly. 😄

egor-tensin commented 2 years ago

@parkr Yep, all seems to be well, checked on a couple of Jekyll projects.

egor-tensin commented 2 years ago

Is the fix going to be released any time soon? Not that I'm in a huge rush really, but upgrading to Jekyll 4 without messing up my Gemfile would still be nice.

parkr commented 2 years ago

2.15.0 is out now.