shinyorg / shiny

.NET Framework for Backgrounding & Device Hardware Services (iOS, Android, & Catalyst)
https://shinylib.net
MIT License
1.45k stars 227 forks source link

[Bug]: Crash During Shiny.Jobs initialization (Needs reproducible sample) #879

Closed appimpact closed 2 years ago

appimpact commented 2 years ago

Component/Nuget

Jobs (Shiny.Jobs)

What platform(s) are effected?

Steps To Reproduce

Nothing in particular, just added Shiny.Jobs in accordance with instructions and registered few basic jobs.

Expected Behavior

App should not report crashes.

Actual Behavior

We have noticed unusually high number of Crashes and ANRs reported on Google Play Console. After investigation, we have managed to isolate the exception that is causing an issue and it is somehow related to start of Shiny.Jobs. We didn't notice this on any our devices, and this seems to happen randomly, but we can't find what could be the cause of error. It seems that it is related to internal serialization process of Shiny.Jobs.

Exception or Log output

android.runtime.JavaProxyThrowable: at Newtonsoft.Json.Serialization.DefaultContractResolver.CreateObjectContract (System.Type objectType) [0x00007] in :0 at Newtonsoft.Json.Serialization.DefaultContractResolver.CreateContract (System.Type objectType) [0x0010f] in :0 at System.Collections.Concurrent.ConcurrentDictionary2[TKey,TValue].GetOrAdd (TKey key, System.Func2[T,TResult] valueFactory) [0x00034] in <6bae24849a744d84854913ecd8d9d089>:0 at Newtonsoft.Json.Utilities.ThreadSafeStore`2[TKey,TValue].Get (TKey key) [0x00000] in :0 at Newtonsoft.Json.Serialization.DefaultContractResolver.ResolveContract (System.Type type) [0x0000b] in :0 at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.GetContract (System.Type type) [0x00000] in :0 at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.GetContractSafe (System.Type type) [0x0000b] in :0 at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.Deserialize (Newtonsoft.Json.JsonReader reader, System.Type objectType, System.Boolean checkAdditionalContent) [0x0000e] in :0 at Newtonsoft.Json.JsonSerializer.DeserializeInternal (Newtonsoft.Json.JsonReader reader, System.Type objectType) [0x00054] in :0 at Newtonsoft.Json.JsonSerializer.Deserialize (Newtonsoft.Json.JsonReader reader, System.Type objectType) [0x00000] in :0 at Newtonsoft.Json.JsonConvert.DeserializeObject (System.String value, System.Type type, Newtonsoft.Json.JsonSerializerSettings settings) [0x0002d] in :0 at Newtonsoft.Json.JsonConvert.DeserializeObject (System.String value, System.Type type) [0x00000] in :0 at Shiny.Infrastructure.ShinySerializer.Deserialize (System.Type objectType, System.String value) [0x00000] in <5369411ebbfc4ef49536271960f48b2f>:0 at Shiny.Infrastructure.FileSystemRepositoryImpl+<>cDisplayClass14_0.b0 () [0x0005d] in <5369411ebbfc4ef49536271960f48b2f>:0 at System.Threading.Tasks.Task.InnerInvoke () [0x0000f] in <6bae24849a744d84854913ecd8d9d089>:0 at System.Threading.Tasks.Task.Execute () [0x00000] in <6bae24849a744d84854913ecd8d9d089>:0 at Shiny.Infrastructure.FileSystemRepositoryImpl.Set (System.String key, System.Object entity) [0x000d0] in <5369411ebbfc4ef49536271960f48b2f> at Shiny.Jobs.AbstractJobManager.Register (Shiny.Jobs.JobInfo jobInfo) [0x00096] in :0 at Shiny.Jobs.Infrastructure.JobsStartup.Start () [0x001c3] in :0 at System.Runtime.CompilerServices.AsyncMethodBuilderCore+<>c.b7_0 (System.Object state) [0x00000] in <6bae24849a744d84854913ecd8d9d089>:0 at Android.App.SyncContext+<>cDisplayClass2_0.b__0 () [0x00000] in :0 at Java.Lang.Thread+RunnableImplementor.Run () [0x00008] in :0 at Java.Lang.IRunnableInvoker.n_Run (System.IntPtr jnienv, System.IntPtr native__this) [0x00008] in :0 at (wrapper dynamic-method) Android.Runtime.DynamicMethodNameCounter.30(intptr,intptr) at mono.java.lang.RunnableImplementor.n_run (Native Method) at mono.java.lang.RunnableImplementor.run (RunnableImplementor.java:30) at android.os.Handler.handleCallback (Handler.java:938) at android.os.Handler.dispatchMessage (Handler.java:99) at android.os.Looper.loop (Looper.java:246) at android.app.ActivityThread.main (ActivityThread.java:8653) at java.lang.reflect.Method.invoke (Native Method) at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run (RuntimeInit.java:602) at com.android.internal.os.ZygoteInit.main (ZygoteInit.java:1130)

Code Sample

No response

Code of Conduct

aritchie commented 2 years ago

You have to provide a reproducible sample or this will be closed

appimpact commented 2 years ago

Unfortunately it is not possible as this is not related to our code and these logs are available only on Google Play Console. We are using App Center to track crashes but we can't find logs of this exception there as this happens very early in the app lifecycle so App Center is probably not even initialized yet.

This is production build issue, and reproducing this would mean deploying blank app to Google Play Console as we have no other way to track it. We also don't know how to reproduce the error as it originates from Shiny.Jobs.Infrastructure.JobsStartup.Start () method.

We can give you any other diagnostics information related to project setup if that would be helpful.

appimpact commented 2 years ago

Following libraries are added to both shared and Android projects:

AndroidManifest.xml has the following permissions:

<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
<uses-permission android:name="android.permission.ACCESS_WIFI_STATE" />
<uses-permission android:name="android.permission.BATTERY_STATS" />
<uses-permission android:name="android.permission.BIND_JOB_SERVICE" />
<uses-permission android:name="android.permission.CAMERA" />
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.RECEIVE_BOOT_COMPLETED" />
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.ACCESS_COARSE_LOCATION" />
<uses-permission android:name="android.permission.ACCESS_FINE_LOCATION" />
<uses-permission android:name="android.permission.ACCESS_LOCATION_EXTRA_COMMANDS" />
<uses-permission android:name="android.permission.ACCESS_MOCK_LOCATION" />

In MainActivity.cs, we are calling this method at the very start of overridden OnCreate method: this.ShinyOnCreate();

Additionally we are using the following code in MainActivity.cs:

        public override void OnRequestPermissionsResult(int requestCode, string[] permissions, [GeneratedEnum] Permission[] grantResults)
        {
            Xamarin.Essentials.Platform.OnRequestPermissionsResult(requestCode, permissions, grantResults);
            base.OnRequestPermissionsResult(requestCode, permissions, grantResults);
            this.ShinyOnRequestPermissionsResult(requestCode, permissions, grantResults);
        }

        protected override void OnActivityResult(int requestCode, Result resultCode, Intent data)
        {
            base.OnActivityResult(requestCode, resultCode, data);
            this.ShinyOnActivityResult(requestCode, resultCode, data);
        }

        protected override void OnNewIntent(Intent intent)
        {
            base.OnNewIntent(intent);
            this.ShinyOnNewIntent(intent);
        }

MainApplication.cs looks like this:

    [Application]
    public class MainApplication : Application
    {
        public MainApplication(IntPtr javaReference, JniHandleOwnership transfer)
          : base(javaReference, transfer)
        {
        }

        public override void OnCreate()
        {
            this.ShinyOnCreate(new Startup());
            Xamarin.Essentials.Platform.Init(this);
            base.OnCreate();
        }
    }

Startup.cs looks like this:

    public class Startup : ShinyStartup
    {
        public override void ConfigureServices(IServiceCollection services, IPlatform platform)
        {
            services.UseJobs(true);
            services.UseFirebaseMessaging<DefaultPushDelegate>();
        }

        public override IServiceProvider CreateServiceProvider(IServiceCollection services)
        {
            ContainerLocator.SetContainerExtension(() => new DryIocContainerExtension());
            IContainer container = ContainerLocator.Container.GetContainer();
            DryIocAdapter.Populate(container, services);
            return container.GetServiceProvider();
        }
    }

Jobs are registered like this by using IJobManager: await _jobManager.Register(new ActivityFeedEntryUnseenCountUpdateJobInfo());

This is job info class:

    public class ActivityFeedEntryUnseenCountUpdateJobInfo : JobInfo
    {
        public ActivityFeedEntryUnseenCountUpdateJobInfo()
            : base(typeof(ActivityFeedEntryUnseenCountUpdateJob), nameof(ActivityFeedEntryUnseenCountUpdateJob), true)
        {
            BatteryNotLow = true;
            DeviceCharging = false;
            RequiredInternetAccess = InternetAccess.Any;
            Repeat = true;
            PeriodicTime = TimeSpan.FromSeconds(60);
        }
    }

This is Job class:

public class ActivityFeedEntryUnseenCountUpdateJob : IJob
    {
        #region Services

        private readonly IActivityFeedEntryApi _activityFeedEntryApi;
        private readonly IMenuElementCollection _menuElementCollection;
        private readonly IAuthorizer _authorizer;

        #endregion

        public ActivityFeedEntryUnseenCountUpdateJob(IActivityFeedEntryApi activityFeedEntryApi,
            IMenuElementCollection menuElementCollection,
            IAuthorizer authorizer)
        {
            _activityFeedEntryApi = activityFeedEntryApi;
            _menuElementCollection = menuElementCollection;
            _authorizer = authorizer;
        }

        public async Task Run(JobInfo jobInfo, CancellationToken cancelToken)
        {
            if (await _authorizer.IsSatisfiedAsync(ActivityFeedEntryPolicies.READ))
            {
                ApiResWrapper<ActivityFeedEntryMobileGetUnseenCountRes> res = await _activityFeedEntryApi.GetUnseenCountAsync();

                if (res.Success)
                {
                    _menuElementCollection.UpdateCounterValue(nameof(ActivityFeedEntryListPage), res.Data.Count);
                }
            }
        }
    }

Code is very simple - it creates an API call to fetch information about unread activity feed entries.

Again, the exception that we have posted has never happened on any of our devices and we don't have bugs reported by app users.

However, on 5.000+ users of one of the apps, we have about 200-300 crashes per week so it's entirely random and very hard to reproduce.

appimpact commented 2 years ago

image

It happens on all Android versions, and on various device models (list is too long, pretty much all smartphone brands are listed).

aritchie commented 2 years ago

I get you're having an issue, but without the necessary test case, I'm going to close this. I'm not here to guess.

Things you should check to determine your root cause:

appimpact commented 2 years ago

We don't know if exceptions are occurring on iOS. This was invisible for us as well on Android until we've checked Google Play Console.

As we have mentioned, this was not reported by anyone - we did regular scans on performance of our apps and noticed unusually high spike in number of crashes which has lead us to investigate.

We've checked the code, and what you're saying is right - nevertheless this still keeps happening.

We do understand that you want to keep the library free of bugs, but this really is a bug that we can't fix / can't reproduce and it happens on all our apps since we've introduced Shiny.Jobs for background tasks.

Please consider not closing this right away and instead requesting more diagnostics from us.

Something is off, that's for sure, but problem is that it is random.

Jobs are registered long after application is started, so why is it showing an error during Start is what we can't understand and why is not shown always.

Can you at least guess what might cause the serialization to fail? Is there something to be careful of?

aritchie commented 2 years ago

Please consider not closing this right away and instead requesting more diagnostics from us.

As with all issues, if there isn't a reproducible case within 3 days, I close it. This isn't a support forum, nor is this a product. The library is working fine for me.

We do understand that you want to keep the library free of bugs, but this really is a bug that we can't fix / can't reproduce and it happens on all our apps since we've introduced Shiny.Jobs for background tasks.

This really is a bug you CAN fix. You can debug your issue, add logging, and provide a patch if necessary to the library if you feel so inclined. I'm happy to help, but you have to play by the terms of this project. You can always remove Shiny if you are convinced it is bugged.

Jobs are registered long after application is started, so why is it showing an error during Start is what we can't understand and why is not shown always. Can you at least guess what might cause the serialization to fail? Is there something to be careful of?

I've given you the list of things to check which give suggestions to all the questions you've asked that includes why serialization may fail. That's as far as I'm willing to go without a test case.

appimpact commented 2 years ago

1) We are using latest releases of all your libraries that are referenced by the projects 2) We have posted how is JobInfo created - we are using typeof and nameof, and couple of predefined parameters - we don't see how any of those can affect serialization 3) We are not registering any jobs on startup - jobs are registered in a view model that happens long after the startup logic has been executed

Check list has been cleared before we even opened this issue.

Now, on the side-note, please check our GitHub profile and you will see that we always either provide repository with example, or provide code snippets or even contribute with code suggestions.

As it was written previously, we can't provide any reproducible sample because this is not happening always, it never happens in the debug mode and we can't reproduce it on any of our devices. All we have are reports from Google Play Console.

All code related to the Shiny.Jobs library in our projects has been posted here. This is how it looks on Google Play Console after we've placed the code in production on one of our apps:

image

We are here because we are enthusiastic about your library - not because we really need it or depend on it.

We watched your videos and decided to give it a shot with Shiny even though it is a new project and despite the fact that a lot of OSS projects have lifespan of 1-2 years before author gets bored / busy with new project.

Everything we wrote here is done with one goal in mind - to try to find out the cause of error and make the library even better, but it seems that we are the only ones interested in looking more into the problem.

Comments such as "works fine for me" and "you can always remove Shiny" are really not a good sign for us that this library is worth the effort in the long run. You are encouraging us to drop enthusiasm about the library.

Again, we came here with best intentions and willingness to investigate the issue on our side. All we need are hints from you since we really can't understand entirely your workflow with sparse documentation and lack of comments in code.

If you are interested in working with us to try to resolve this "ghost" bug, we are willing to continue to put more effort on our side and provide more diagnostics information that we can collect based on your hints.

If not, then thank you for your assistance so far and since no one will be working anymore on this issue it can be closed right away.

appimpact commented 2 years ago

Of all the properties that get serialized (correct us if we are looking into wrong model), this one could cause issues:

image

string, bool, Dictionary, enums and even double should not be affected by cultures.

However, when it comes to DateTime, culture of serialization must be the same as culture of deserialization.

Majority of our apps are multilingual with possibility for users to switch culture at any point of time.

Furthermore, we have default culture for anonymous users that can switch to different culture shortly after startup if user was previously authenticated.

Since DateTime formats are substantially different (order of month and day is reversed) in used cultures, that could be the cause of deserialization error and also the reason why we don't see frequently as those changes don't happen all the time.

Would implementation of culture invariant serialization of PersistJobInfo be acceptable change?

appimpact commented 2 years ago

Another proposal is to allow better diagnostics on deserialization process.

You could introduce custom exception:

    public class ShinySerializerException : Exception
    {
        public ShinySerializerException(string value, Type type)
            : base($"Unable to deserialize to contract type {type} => {value}")
        {
        }
    }

And then throw that one instead:

        public object Deserialize(Type objectType, string value)
        {
            try 
            {
                return JsonConvert.DeserializeObject(value, objectType);
            }
            catch
            {
                throw new ShinySerializerException(value, objectType);
            }
        }

This would give more information on what was the error during the deserialization in production.

Newtonsoft.Json is not really giving much of an information on why did deserialization fail. Custom exception would give better diagnostics information.

appimpact commented 2 years ago

One more thing.

As you could see from the exception stack, this code is hit:

image

However, as we've mentioned previously, we did not register any jobs at the startup.

We've investigated further and found that NotificationModule is registering a job internally:

image

We are using two of your libraries:

From what we could understand so far, FirebaseMessaging is using Shiny.Notifications, which in turn uses Shiny.Jobs to schedule this job.

This would explain why we hit if(jobs.Count > 0) statement, even though we haven't scheduled anything.

aritchie commented 2 years ago

Everything we wrote here is done with one goal in mind - to try to find out the cause of error and make the library even better, but it seems that we are the only ones interested in looking more into the problem.

I woke to 4 insanely long posts where you keep reiterating the same non-helpful information in hopes that I magically fix your issue. None of the posts contain what I've requested - something feasible that I can work with. You seem to think that your issue is my issue. If you provided a reproducible case, I would have fixed it without anything further. You can't provide it because you don't know the cause, that's fine, but don't continue to bombard me with messages that aren't what's been requested. You are not sponsors, you are not customers, you are consumers.

Check list has been cleared before we even opened this issue.

But you didn't offer this info to me to assist, nor did your point form list above even answer the questions. What's the crash in iOS then? You told me you didn't know, so that checklist item is still open. Did you update Shiny across versions or is this the first one. If you did, did the previous version work?

Comments such as "works fine for me" and "you can always remove Shiny" are really not a good sign for us that this library is worth the effort in the long run. You are encouraging us to drop enthusiasm about the library.

You do you. You haven't really applied any effort into the bug until the last couple of posts. You seem to still have the attitude that this is a product.... it is NOT. I want my libraries to help people, I want to fix issues, but not at the cost of my time hunting for issues where I don't have enough info. I am, however, encouraging you to go elsewhere if this is going to be the push tactics you use.

All we need are hints from you since we really can't understand entirely your workflow with sparse documentation and lack of comments in code.

I'm sorry this isn't up to your standards. I also wasn't aware you were paying me for my time in developing this code that you are using for your 5000+ users ;P I gave you hints to start looking at.

If you are interested in working with us to try to resolve this "ghost" bug, we are willing to continue to put more effort on our side and provide more diagnostics information that we can collect based on your hints.

After all the crap I've had to read above, why do you think I'll care more now? When you haven't given me the thing I've requested 3 times now? Again... I get you don't know what it is, but as professional developers, ask yourself - where would I start if it isn't crashing for me without any sort of input? The code is open, so you can also fork and try putting in some debug statements. You see the same stack trace, use it - test your theories and let me know.

If you figure out a test case or a reproducible sample, I'm happy to fix the issue. You had some good ideas, but did you try them? Did they fix the issue? You can also change the serializer by registering your own in startup (services.AddSingleton<ISerializer, YourSerializer>(). I'll think of some ways to improve this in the future regardless of the outcome, but without the necessary info, I'm just wandering in the dark which isn't something I'm willing to do.

I get you don't like the policy of "reproducible issue or the issue is closed", but that's my policy because it's my project and this is the way I choose to run it. You aren't likely to find many other open source repos that don't follow this policy themselves to some degree.

Now if I get anymore long winded flack like the shit I just read above, I'm just going to remove you from the repository going forward. Learn the ropes here and respect them or go elsewhere - it is that simple.