octokit / octokit.net

A GitHub API client library for .NET
https://octokitnet.readthedocs.io/en/latest/
MIT License
2.68k stars 1.07k forks source link

querying for all releases in WinForms app takes 10 minutes to complete #1774

Closed Dorky106 closed 2 years ago

Dorky106 commented 6 years ago

So I'm following the documentation on the website on do to get releases. However this is where I'm at.

var tmp = client.Repository.Release.GetAll("Dorky106", "BetterTrees");
var latest = tmp[0];

Only issue is

Severity Code Description Project File Line Suppression State Error CS0021 Cannot apply indexing with [] to an expression of type 'Task<IReadOnlyList>'

So I can't progress any further. If this has been changed can someone please tell me how to go about continuing.

M-Zuber commented 6 years ago

client.Repository.Release.GetAll is an asynchronous function so you will need to await it first, making your sample look like

var tmp = await client.Repository.Release.GetAll("Dorky106", "BetterTrees");
var latest = tmp[0];
Dorky106 commented 6 years ago

Just to make sure I am doing this correctly as it freezes at GetAll GetAll(Owner, Name) Owner = owner of the repo Name = Name of repo

public static async Task GetReleases(string GIT_URL)

        {
            System.Net.ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
            Octokit.Release release = null;

            try
            {
                string[] tmp = Form1.Instance.githubID.Split(':');
                var client = new Octokit.GitHubClient(new Octokit.ProductHeaderValue(tmp[0]));
                var basicAuth = new Octokit.Credentials(Username, Password);
                //client.Credentials = basicAuth;
                var releases = await client.Repository.Release.GetAll("Dorky106", "BetterTrees");
                release = releases[0];

                Console.WriteLine(GIT_URL + " got release!");
            }
            catch (Exception ex)
            {
                Form1.Instance.ErrorOut(GIT_URL + "\n" + ex.ToString());
            }
            return release;
        }
Korporal commented 6 years ago

@Dorky106 the await keyword 'returns' the IEnumerable after the asynchronous operation (represented by the task) completes. When execution first hits the await operation, the task gets started and your code actually returns at that point, to its caller. When the operation completes the system causes the next line after await to execute, usually on the same thread that began the operation.

Dorky106 commented 6 years ago

Okay, but being stuck on the await to complete for 10+ minutes kinda kills the point of using this library for me. As no one is going to want to wait 10+ minutes for something to finish a simple task.

Korporal commented 6 years ago

@Dorky106 - For an initial test kind of thing you can do (as I have when playing around) use:

var releases = client.Repository.Release.GetAll("Dorky106", "BetterTrees").Result;

That will synchronously (as they refer to it) wait inside the 'Result' property and your code will behave more intuitively but it will still take the same time. There's no asynchronous code then and it's a bit easier to fathom when experimenting.

If the call is taking 10 minutes then is that unreasonable? I strongly suspect that the delay will be at GitHub themselves rather than Octokit. Are you unwittingly pulling a large volume of data?

I've experienced rather lengthy delays myself on some things like getting all branches that meet some compare condition with another branch, that requires a compare operation on each branch and these seem to take some time. (like 45 seconds for like 30 odd branches).

Also note - I can see that GetAll( ) has overloads which take a ApiOptions object, seems you can get the data back as a number of "pages" and specify the size of each page. This may enable you to get say a few back at a time and display these or whatever as the results come back rather than waiting and doing nothing while the whole set is transferred.

Out of curiosity, how many items are returned for these lengthy operations?

You could write some helper code that encapsulates that paging and perhaps uses a page size of say 20 yet yield return each item one at a time. This way the caller would see what seems like a continuous set of results but in reality they are being pulled 20 at a time in quick succession.

Perhaps Octokit itself already provides a "paging-set to IEnumerable" helper somewhere...

shiftkey commented 6 years ago

Okay, but being stuck on the await to complete for 10+ minutes kinda kills the point of using this library for me.

@Dorky106 the default behaviour of GetAll is to follow pagination hints and fetch all records related to an API. If you're on a repository with thousands of releases, this might explain why the request is taking so long to complete.

If you only want to get a subset of these releases, there is an overload that takes an ApiOptions object which lets you control how much data you want:

var firstTen = new ApiOptions
{
    PageSize = 10,
    PageCount = 1
};
var releases = client.Repository.Release.GetAll("Dorky106", "BetterTrees", firstTen).Result;

There's also the GetLatest API that is intended to just get the latest release:

var release = client.Repository.Release.GetLatest("Dorky106", "BetterTrees").Result;
Dorky106 commented 6 years ago

I was testing on a repo with 3 releases, and even getlatest was taking ten minutes. Already gave up and just made a readonly token to use and removed the need for github info. (As this was really all I tried using this lib for was to make sign-ins needed instead of a token)

ryangribble commented 6 years ago

So this is a windows forms app? It's definitely not right that a call should take 10 minutes. If you are able to post more of your code we could try running it on another system.

You could also just simplify your example and test your above code in a console application, to see if it has the same problem

Korporal commented 6 years ago

@shiftkey @ryangribble - Hi,

I was curious about how I'd implement an alternative "GetAll" into Octokit, I can see it's defined deep down as a member of the IApiConnectionclass. What I was curious about is implementing an alternative that returnsIEnumerable<T> and under the hood uses pagination to get pages "on demand", this is a pattern I've used before with other REST services.

In that pattern I simply get the first page, then do something like this:

page = GetFirstPage();

while (page.ItemsRemaining)
{
   foreach (item in page)
    {
       yield return item;  
    }

   page = GetNextPage();
}

I'm sure you get the idea, the big question I have for you is how to introduce this without breaking existing consumers of the interface IApiConnection.

This patterns only goes to the network to get more items as and when the caller is enumerating items and there are no more items left in the "current" page. So if caller was enumerating something like

GetAll(...).Where(...).First(20);

Then only as many network calls as are necessary would be made to get the first matching 20 items and no more, this is a potentially huge saving for certain kinds of data extraction.

shiftkey commented 6 years ago

@Korporal the first issue that I can see is that most of the GetAll APIs return Task<IReadonlyList<T>> which is a more concrete abstraction than IEnumerable<T>, so I'm not sure how a drop-in replacement would work when that signature needs to change. But don't be afraid to experiment with a different implementation if you think this might help for better lazy-loading scenarios.

Korporal commented 6 years ago

@shiftkey - OK yes, that's true - very interesting to explore options here and yes - lazy loading - is exactly the term I was seeking!

Korporal commented 6 years ago

@shiftkey - Would the team here regard a change to IApiPagination a breaking change? The interface is public but seems to perform a purely internal role. I'm considering a small change to the definition of GetAllPagesthat could (I think) make it straightforward to introduce lazy pagination with zero changes to consumers of Octokit.

ryangribble commented 6 years ago

From your original issue text )reading via email) I was wondering if you werent aware we had the ApiOptions pagination settings, and users already can write a loop that retrieves each page one by one as they need them (you can also generally specify a larger ItemsPerPage value than the default eg default might be 20 but you can often request 100 per page).

But after your edits and further discussion it seems you just want this to be more abstracted away from the user and effectively to stream chunks of records to the user as they enumerate through the collection? Isn't that what the Reactive IObservable stuff is meant for?

In any case, feel free to play around with whatever implementation you like and put up a PR for discussion/feels... We do try to avoid breaking changes but we can figure that part out once there is an actual "better" implementation to consider

Korporal commented 6 years ago

@ryangribble - Hi, yes you may be correct and its better to leverage IObservable than implement a lazy paging mechanism. I've not really done much with IObservable and need to devote some time to it.

So far as the paging goes, yes I did see ApiOptionsand was considering making start page zero (0) act as a sentinel so the underlying logic in Octokit could detect that value and then do the lazy paging, but that all depends on whether changing the signature of the method in IApiPaginationwould be regarded as a breaking change or not.

If that signature could be tweaked I think I have a rather neat way of adding this lazy paging while maintaining the integrity of all other existing interfaces.

I can't imagine any scenario where a consumer of Octokit would need to ever access IApiPagination because it seems to be of purely internal significance.

I may just play with this when I get some time and show you guys what I end up with just out of academic interest.

shiftkey commented 6 years ago

I've not really done much with IObservable and need to devote some time to it.

@Korporal IObservable is a push-based way of enumerating values, which is a bit different to what you had in mind. So I'm going to share some reservations of mine about this change, and provide some hints as to where you should be looking.

I'm quietly worried about breaking the behaviour of GetAll here to achieve lazy loading (because it's there in the name - Get All) and IEnumerable is basically a synchronous interface (MoveNext() does block, so if you're going to make a network request when you need a new page it's still not going to be a great experience for the consumer).

I'd recommend instead looking at IAsyncEnumerable and in particular IAsyncEnumerator which is part of Ix.NET.

The signature for IAsyncEnumerator is much closer to what you are looking for in terms of an asynchronous enumerable:

Task<bool> MoveNext(CancellationToken cancellationToken);

There's been very early discussions about making this a part of the BCL and supporting the ability to foreach over these like you would a regular IEnumerable, but I'm not involved with them at all so you'll have to keep up with Mads Torgensen et al and future C# language features.

Korporal commented 6 years ago

@shiftkey - Thanks for the additional info on IObservable, much appreciated. Also just to be very clear I was not suggesting any change to GetAllonly the method signature in IApiPagination, no other existing public interface or method signature would change.

I created a system last year that used T4 templates along with a method defintion XML file to generate a client class around a REST service (which has hundreds of JSON classes and hundreds of endpoints).

In that design some methods in the XML get generated as TWO methods in the generated client class(es), these were paginated methods similar what I see in Octokit. When the T4 translated such a method is created a paged based conventional method where caller can enumerate the pages easily and another method (name had 'Seq' appended, an F# convention) which did something like this, internally enumerates pages and yield returns each item.

It's particularly useful for scenarios where the caller is doing a.First() or .Take() operation because the caller is unaware of the pagination and the system pulls only as many pages as are required to satisfy the .First() or .Take() if the caller really does want all pages "at once" then .ToList() gives them that.

So if I did spend any time here on this (and I'm more academically interested in how I'd do it than anything else) the only existing public interface that I think I'd need to change is the method signature inIApiPaginationwhich I suspects is of no interest to any Octokit consumers (I may be wrong of course).

On a very separate topic I'm developing a system to pull data in various ways from GitHub and I think you'd urge me to leverage the reactive version of Octokit here, is that true? If so is there anything missing from the current reactive version that's present in the ordinary version? The consumer in this case will be a .Net Core Web App by the way.

Thanks

StefH commented 5 years ago

Another question:

It seems that it's not possible to get all releases from an organization?

var releases = client.Repository.Release.GetAll("WireMock-Net", "WireMock.Net").Result;
// releases has no values ?

I'm using oktokit version 0.32.0

shiftkey commented 5 years ago

@StefH that isn't supported by the GitHub API, which is a constraint on what can be supported in Octokit.

shiftkey commented 4 years ago

So this is a windows forms app? It's definitely not right that a call should take 10 minutes. If you are able to post more of your code we could try running it on another system.

We didn't hear back from @Dorky106 here about the context for where the sample code is running, and maybe it does just need a Windows Form app to see the issue on our side.

Updated title to focus on the original bug rather than the side discussion about pagination.

Dorky106 commented 4 years ago

ya, sorry I stopped getting any notices about this. Can't even remember what the issue was fully now or what I was doing XD

shiftkey commented 4 years ago

The repository which reproduced the issue is still present, and we have a sample code handy, so I'm going to mark this as up-for-grabs to see if someone can reproduce it.

Dorky106 commented 4 years ago

alright

github-actions[bot] commented 2 years ago

👋 Hey Friends, this issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Please add the pinned label if you feel that this issue needs to remain open/active. Thank you for your contributions and help in keeping things tidy!