neuecc / Utf8Json

Definitely Fastest and Zero Allocation JSON Serializer for C#(NET, .NET Core, Unity, Xamarin).
MIT License
2.35k stars 266 forks source link

Started a fork/v2 at https://github.com/vilinet/Utf8Json-v2 #223

Open vilinet opened 4 years ago

vilinet commented 4 years ago

I know it is not the nicest way to introduce a fork of a repo but i had no other choice. I asked the author but my messages were ignored.

The v2 can be found at https://github.com/vilinet/Utf8Json-v2/ I cleaned up the code a bit, added new frameworks. And already fixex some issues but there are many left.

I hope we can make this stunning library even better :) Its alive!

@vilinski @RamType-0 @st-gwerner @SaAnnka @neon-izm @Kiryushin-Andrey @AndrewGumenyuk @ pCYSl5EDgo

vilinski commented 4 years ago

do you really have a time and energy to maintain the fork? are you sure @neuecc has really stepped down? I mean I would appreciate to see this lib evolving and bugfixed. But... not if it results in another "padleft" hell.

RamType0 commented 4 years ago

@neuecc said that he gonna work with this repo after he release UniTask v2

vilinet commented 4 years ago

@neuecc said that he gonna work with this repo after he release UniTask v2

That is great news!

It does not matter to me which version just need to keep this stuff alive.

vilinet commented 4 years ago

I made my repo private :) Any idea how long it will take?

I have seen he is is active on many other projects.

st-gwerner commented 4 years ago

I'm relieved to hear he may start giving this attention, but I would certainly be happier if there was more than one maintainer. Even if it means that it requires unanimous consensus on all pull requests, that'd be better than the feeling that things were abandoned for a couple years. There are a few fixes that we all can look at and agree would be good, but without him personally approving them, the issues linger, and that results in way too many forks with the same fix.

jasond-s commented 4 years ago

I would welcome a well maintained fork. The goal here is to have a low allocation UTF8 aware serializer for JSON in .NET.

The point in the MIT license in to foster innovation. If @neuecc (who has done a great job here) does get the time to come back to maintain this library, then all we will have is 2 great libraries to choose from. Maybe there will be some differences that will make each more suitable for different applications, but that also sounds great to me.

I think calling it the same thing is probably unfair though and could create confusion, you should probably come up with a new name, and then maintain recognition for the original base.

vilinet commented 4 years ago

Would be much better if everyone could concentrate on one lib instead decentralizing manpower into 2 separated libs.

Best option really would be to share ownership with someone who could review and approve PR. Or to have a "bugfix branch" that we can take care of. We could also release some bugfix packages in prelease status or just simply increasing the build version.. Whatever.

There are fundamental very basic and simple bugs and they have not been fixed in the last 2 years. And it blocks many people from using it. (like its not supporting json escape sequences).

I know many people who look at the repo and they just dont use it as they dont thrust in an abandoned codebase that have these basic issues untouched.

And the best is how the owner does not even write a single comment ...

vilinski commented 4 years ago

I don't see it such essential to justify a fork. Json serializing is a not complex operation. You can always take another format if it is not fast enough. And there is already decent System.Text.Json. And if you not aware there is also https://github.com/Dogwei/Swifter.Json it claims to be even better than anything else, but you probably need google translate to read the docs ))

markmadlangbayan commented 4 years ago

@vilinski, hello do you think you can share your private link? I'd like to see if we can use it while waiting for the author.

vilinski commented 4 years ago

@vilinski, hello do you think you can share your private link?

@markmadlangbayan you surely mixed me up with @vilinet ? Don't know why he even has my nick :D

penguinawesome commented 4 years ago

@vilinet hi we are encountering a critical & security issue in this library: https://github.com/neuecc/Utf8Json/issues/224 - this \u0003 doesn't serialeze properly causing the front end to crash. I hope you could help us and for the community as well, is there a fix or workaround for this?

vilinet commented 4 years ago

@firephantomassasin Lets have a look at it :)

penguinawesome commented 4 years ago

@vilinet Thank you so much.

vilinet commented 4 years ago

Drop me a mail to the vilinet9@ the mail domain that google uses dottt com :)

st-gwerner commented 4 years ago

Issue #224 sounds a lot like something several of us have fixed, including a PR I submitted #217.

penguinawesome commented 4 years ago

@st-gwerner it works! Thank you so much.

Hey guys @vilinet, i think while the main author is not yet around and looks like this repo is not active anymore, it would be better to create a V2 Fork of this repo and include all the current pending Pull Requests especially those who are bug fixes.

vilinet commented 4 years ago

@st-gwerner Yeah definitely a fix. But also would make string writing much slower because of that condition check on every single character and also copying. Also that array access is not the fastest either.

I would do something like this: https://pastebin.com/xfYC7f3z No additional if, no array access, conversion to hex is faster than the convert one. sure the common stuff could be "aggressivly" inlined though c# compiler most of the times does not care about that either :D

vilinet commented 4 years ago

@firephantomassasin Yeah but also there are many PR that would slow it down serialization. Of course sometimes there is now way to avoid it but many of them can be improved.

st-gwerner commented 4 years ago

@vilinet All the more reason that your idea of a single repo would be amazing, managed by multiple heads who can discuss and evaluate the best ways to do it. My focus was getting correct behavior with the same relative performance (in my testing of 40-50MB files, the check didn't slow performance much at all), but I would love it if we had a central repo where we can bat things around, do additional testing, etc.

I also agree with you also that not all "fixes" are good.

vilinet commented 4 years ago

@st-gwerner Yeah :) Also if the original author decides to continue he can use the actual up-to-date code as a base. Maybe it should not be called v2 but something like utf8json-patches / fixes. Dont know.

Also i checked that Swifter.Json that was linked above and i need to stay that is much more faster then this lib(by 35% with my test data), but also has less feature. But this code base is much more "contributor" friendly and there are things we could use from that lib as well.

vilinski commented 4 years ago

@vilinet just wonder, which features you miss in Swifter.Json compared to Utf8Json? For me no support for F# types makes it currently useles, but author already works on it.

vilinet commented 4 years ago

@vilinski No simple way for camel case output for example. You can use the callback but perfermance drops heavily. Or you rename your properties to use camel case.

No attribute support to ignore/rename properties.

jasond-s commented 4 years ago

@vilinet I opened an issue #209 a short while ago. Being able to easily define your own naming formats would be very handy in any fork, and not hard to expose without slowing anything down. We have to consume a lot of weird APIs at our shop, and having to put attributes on everything is annoying.

vilinski commented 4 years ago

@vilinet not yet tried myself, but there are RWField and RWFielAccess attributes and also other JsonFormatterOptions

vilinet commented 4 years ago

@vilinski you are right! I will have a try!

penguinawesome commented 4 years ago

@st-gwerner @vilinet have you checked this another security issue in this library? https://github.com/neuecc/Utf8Json/issues/127

ramon-garcia commented 3 years ago

@vilinet If you think that you have time to continue this project, please go on with the fork. It is badly needed.

vilinet commented 3 years ago

Okay guys. So i made this https://github.com/vilinet/Utf8Json-v2/ public again. I already released a 1.5 nuget package with some crucial issue fix but raised a ticket also to have a discussion how things should go on(like versioning).

So please have a look :) We can also decide to stop it and we will switch to another library. (There are some nice candidates)

buybackoff commented 3 years ago

@vilinet

Do you really think someone will work on this new repo if you deleted the history and just copied all the source code in the initial commit!? https://github.com/vilinet/Utf8Json-v2/commit/fdf4515d7cac2bef8e4b700706ca98aad0a08301

Not related to this library, but recently I forked a 5-years-old near 1k-stars library and I spent almost a day just to clean and keep Git history, otherwise I could hardly navigate and understand what was going on in this 3rd party codebase. This is important.

In my fork of this library I try to force-push to have as few commits as possible and view them as patches rather than full-blown continuation of upstream development (I do not have skills and resources for that, do you?). There are multiple "natural" forks of this library and it's easy to cherry-pick bug-fixes from there (or from PRs) and keep the changes to upstream visible as isolated patches. It would be great if there is one place that accumulates all useful PRs and fork commits in one place, but keeps the development anchored to the upstream.

My fork is incompatible with the upstream (no explicit Unity support and I do not know if it works there, changed internals so it works with native memory and not only with byte[], serializes tuples as JS arrays and other changes; but faster), so it's not even a candidate for such a place. But I do periodically review if there are worthy changes and cherry-pick them. Your "fork" will be completely invisible to me.

So, ideally and in my opinion, a new centralized fork should have all worthy PRs and commits from other forks that are cherry-picked from their branches (or merged with their authors kept in git history). Anyone should be able to cherry-pick from this fork or rebase their fork on the new centralized fork. This also makes it possible to backport all community contributions to the upstream in a couple of clicks whenever it is active again. Without this any "artificial"/"forced" fork adds more hassle than value.

penguinawesome commented 3 years ago

@vilinet what are those another library (There are some nice candidates) ? So far i've tried other libraries and this one gives the fastest performance.

vilinet commented 3 years ago

@buybackoff I do agree. The creation of that "fork" is crap, not the way as it should have been done.

So the question is again: Who wants to fork it? or Do we choose an existing fork that has been already worked on? And so likely there should be more owner on that project to avoid it getting killed again.

If you say okay, do it properly i am willing to fix my errors and create a new valid fork, clean it up, upgrade to latest libraries, etc. Not quite sure about the history cleanup but maybe that can play too.

vilinski commented 3 years ago

@vilinet from the "right" fork your are just one click on "Fork" button away. ) Wrong way is just mkdir and copy sources. You can also create an organization for the owner(s) and move the repo into it. This way the changes can be backported to the parent repo or cherry picked in other forks. Even better were if @neuecc would give the ownership

@firephantomassasin some nice candidates are mentioned above in this thread, if you not tried them already

vilinet commented 3 years ago

Yeah. There was a guy above how knows him and told he will work on it later maybe. I also tried to email him the one email address i found, but no answer.

buybackoff commented 3 years ago

Aren't you concerned that MSFT will eventually make their new JSON serializer almost as good? And given that MessagePack serializer v2 was integrated with SignalR by @AArnott with @neuecc help, I won't be surprised if the same machinery that makes this library fast will be employed by System.Text.Json and @neuecc may also also participate in that. Personally, I stopped touching/improving my fork after the initial version of System.Text.Json. From one side, it's already good and complete enough; from another, maintaining something for which MSFT has it's own competing solution is not a great idea.

This library is not friendly to Span & Co and is based on byte[], which requires allocations and copying and byte array pooling if original data is in native memory or Span/Memory. MessagePack v2 works with IBufferWriter<byte> and ReadOnlySequence. Maybe a similar work is planned for this library. A fork will inevitably diverge here.

The worst thing in this story is the silence and absence of a roadmap from the author.

vilinski commented 3 years ago

if it were my commercial project maybe I would close the sources exactly for this reason. MSFT has multiple times proven they understand nothing from doing open source, or supporting communities. This alone, and looking around, renders .NET as an almost obsolete plattform for me, even with all improvements last years.

But otherwise I would only welcome any improvements in BCL, cause I can only profit from it. System.Text.Json has a long way to the feature parity to Newtonsoft and in performance to this and some other libraries.

This doesn't stops me in any way to contribute to the libs I use, even it it is difficult to contribute. Short answer - I don't care ))

AArnott commented 3 years ago

I can't speak for @neuecc, but I'd guess this repo works for him as-is so his priority is low to improve it. MessagePack v2 happened because I wanted to pick it up for Visual Studio but couldn't in its v1 form. I started that with a fork and tried to keep communication open as much as possible with the original author to increase the odds of the fork eventually merging back into the main repo, which turned out to be a success story. So far I haven't had a need for a super-fast JSON parser, so my attention hasn't turned to this repo (I'm only here because @buybackoff mentioned me). But if you really like it and intend to use it in an app such that you'll keep caring for it, and if you can't get the author to respond, if I were you I would proceed with the fork and improve it. For the first while I'd try to maintain it in the spirit of the original author in case a merge in the future is possible. But after some period of time (e.g. when my fork becomes more popular than the original) and the author still hasn't responded, I would start taking it in my own preferred direction.

buybackoff commented 3 years ago

Thanks @AArnott for replying and MsgPack v2 background! I mentioned you in case you know something more by being in contact with the author and from MSFT :)

buybackoff commented 3 years ago

BTW, interesting tweet from today showing difference of @giraffe-fsharp in Techempower benchmarks when using Utf8Json. https://twitter.com/dustinmoris/status/1309133995796463623

Maybe @dustinmoris could add something from F# side and join the discussion if Utf8Json is important for Giraffe.

vilinet commented 3 years ago

@buybackoff Good stuff! And would worth considering switching/supporting Span.

dustinmoris commented 3 years ago

Hey, thanks for tagging me. Interesting thread.

It's a pitty that @neuecc hasn't responded here or to any other issue yet. I totally understand that there are many good reasons why someone doesn't have the time/passion/motivation to work on an OSS project any more, but at least responding to these threads would have been helpful :(

I just hope that it's only a complete lack of interest and not something else which prevents @neuecc from responding here. Wishing him the best of luck!

Regarding an official fork I agree with the advice given by @AArnott.

In order to support an official fork I would like to see the following things:

EDIT:

BTW I really like Utf8Json and I'm happy to assist with anything which needs to be done in order to get this thing properly off the ground. I didn't look into the ins and outs of this project enough to feel comfortable to maintain it myself, but I'd happily help in shaping the initial OSS organisation, writing some docs and helping with other things like setting up a GitHub Actions pipeline to deploy directly to NuGet on official releases and doing other "maintenance" work to help @vilinet or whoever wants to take the leadership of this project.

penguinawesome commented 3 years ago

i think @vilinet can initiate a clean fork from this repo and re-apply some of your changes again and let's start from there the official fork version of this repo? This library should have a new official fork version, this is a nice library for .net.

vilinet commented 3 years ago

@dustinmoris Very great comment!

So we could create this free organization and having this as one of its projects. We just need a 2 good name :) github organization name and a nice project name. I hope someone will come up with some good name.

dustinmoris commented 3 years ago

Re name, I think a new name might have more longevity than trying to version Utf8Json. The reason why because if in a year's time the fork has become the new official repo because @neuecc hasn't responded or stepped in since then, then the project will likely start taking complete new shapes which will over time have less and less resemblance with Utf8Json and in 24 months you'll regret that the library is called Utf8Json.V2 and has a NuGet version of v5.1.2 or something like that.

I would suggest something like ZeroJson because for three reasons:

vilinet commented 3 years ago

Sounds cool definitly. There is only a zeroJson user on github but it may not be that confusing. Also the organization can be ZeroJson?

buybackoff commented 3 years ago

i think @vilinet can initiate a clean fork from this repo and re-apply some of your changes again and let's start from there the official fork version of this repo? This library should have a new official fork version, this is a nice library for .net.

I think a lot of things in this discussion are for creating a fork for the sake of creating a fork. There should be some plan/roadmap for what the new fork should include and wants to achieve.

There are several existing forks:

'+ PR bug fixes.

One thing that would turn me off from the new fork is if there will be changes for the sake of changes, e.g. no improvements in performance or functionality, but beautification and/or refactoring, moving code around for "better organization". It's so easy to do that with VS/R#/Rider in one click, but quite soon merging would be a mess. That should be done only after the fork is proven to be independent and maintained by multiple people.

E.g. even if the project is renamed to XxxJson, it should be the very last step to change the namespace everywhere. That change is almost as bad as copying the source, despite all the modern merge/diff tools. For that reason, I think Utf8Json2 is not a bad name.

AArnott commented 3 years ago

I just hope that it's only a complete lack of interest and not something else which prevents @neuecc from responding here. Wishing him the best of luck!

That's a very kind thought. @neuecc is fine. He's active on MessagePack still, at least. I'm sure it's just a priority thing.

Keep the original README with a notice at the top that it is an official fork

I'm not sure what constitutes an "official fork" if you haven't got the blessing of the original repo author. Perhaps "active fork with several contributors" would be more accurate and still very encouraging to newcomers.

Start the official fork as part of a new OSS organisation on GitHub so that it can become a community project rather than a single person's project. It's ok if just one person champions it in the beginning, but it would make it easier to share the maintenance burden over time with more people and allow power transfers more easily in case we run into the same problem again. We don't want to have Utf8Json.V3 creeping up one day :)

Orgs can be abandoned if all the committers lose interest just as well, so I don't personally see that as a mitigation. Repos on personal accounts can have many collaborators as well, and if the sponsoring account's user loses interest, they can always transfer the repo to another user or org at that point, and github performs all the 301 redirects automatically so you don't lose traction. GitHub is all about open source, yet we don't see an org behind every repo. So I'm not against the idea, but it feels overkill to jump into it too fast.

I would also defer the rename of anything (nuget package, namespaces, etc.) till later. Invest in the fork, make it a worthwhile repo to maintain and tempting for the original author to merge back in. In the meantime, push with the original nuget package ID to some alternate feed somewhere (Azure Pipelines offers free NuGet feeds and I think GitHub does too) so you all can consume it. I would suggest you keep it with unstable versions (rather obviously forked ones, like 1.2.3-fork.5) until the ID changes or it merges back in.

AArnott commented 3 years ago

One thing that would turn me off from the new fork is if there will be changes for the sake of changes, e.g. no improvements in performance or functionality, but beautification and/or refactoring, moving code around for "better organization". It's so easy to do that with VS/R#/Rider in one click, but quite soon merging would be a mess. That should be done only after the fork is proven to be independent and maintained by multiple people.

+10 to this, @buybackoff

russcam commented 3 years ago

Thanks for the ping @buybackoff.

Elasticsearch-net: their work on the fork was visible, they were adding a lot of usability things, such as member attributes a la JSON.NET. Unfortunately, they copied the source from the fork to their main repo. Their fork with modifications is here, it has multiple PRs merged and some additional stuff. // cc: @russcam

The Utf8Json fork that the client is using is in

https://github.com/elastic/elasticsearch-net/tree/7.x/src/Elasticsearch.Net/Utf8Json

The separate nullean/Utf8Json repository was originally used and IL-merged into the client, but then it became easier to manage by including the source in Elasticsearch.Net project. The fork in the project is quite different in places to fit the client's needs, and contains additional fixes such as handling all allowed offset formats in ISO8601 Date string formats (i.e. [+-]hh, [+-]hhmm in addition to [+-]hh:mm), and removes components that are not relevant to the client's usage.

DamianSuess commented 3 years ago

So, I've been following this thread since the get-go. Is Utf8Json-v2 a go or not? The reason I'm asking if this v2-repo is still happening because it was created, then hidden, then public again, now it's gone again. Would love to see this project kept going strong 👍

gordrs commented 3 years ago

Since nothing has happened on this for a while and I have a need for it for my company and its projects, I've forked it.

Right now, I've favoured the use of netstandard and dropped the old frameworks. Executables and tests are targeting .NET 5.

I've already pulled in a few PRs such as #217 #129 #172 #193 and will be looking at bringing more in soon. Not sure when a nuget package will get built, what its version number will be, its name etc, but it will be soon. As such I welcome conversations from those with an interest in this project and need a working library. If you want to help maintain it, get in contact with me. Would like to build a consensus on what is the most pressing issues, get it all tested and working for everyone's platforms and sort those issues as they come in and then take it from there.