nitely / nim-regex

Pure Nim regex engine. Guarantees linear time matching
https://nitely.github.io/nim-regex/
MIT License

Your regex package is referenced in the Nim book now #112

Open StefanSalewski opened 2 years ago

StefanSalewski commented 2 years ago

For the regex section we have decided to base the introductory explanations on your regex module instead of the re and nre modules of Nim's standard library.

Please let us know if you consider some of the explanations wrong or unclear. Please note that the section has not yet been checked by one of our proofreaders, so it may contain grammar or spelling errors.

Maybe you can comment on this passage:

For a successful match we can access the capture with the group() function, where we have to specify the index number of the capture and the actual text string that was used for the match. The fact that we have to specify the initial text may indeed look a bit strange.

Is there a good reason that group() gets passed the whole string again as its last parameter? It seems inconvenient and error-prone. Could a reference to that string not be saved in the RegexMatch instance when it is passed to match()? I think some other regex engines do not use that passed string for the captured groups.

Best regards, Stefan Salewski

nitely commented 2 years ago

I'd say, don't use that API. Instead of m.group(0, text), write text[m.group(0)[0]], which you can at least explain as "m.group(0)[0] returns the boundaries (a slice) of the first repetition of the first capture group". m.group(0) returns a seq of boundaries for every matched repetition of the group (as in (...)+), not just the last one as most regex engines do.
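Here is a minimal stdlib-only sketch of how the old, deprecated API returned a seq of boundaries. It does not import nim-regex. The `groupBounds` value is hypothetical and hand-written. It assumes that `(\w)+` matched against "abc" yields one `Slice[int]` per repetition of group 0, as described in the comment above. Slicing the original text with any of those boundaries gives the substring for that repetition.

```nim
# Stdlib-only sketch: nim-regex is NOT imported here.
# `groupBounds` is a hypothetical, hand-written stand-in for what the
# old (deprecated) m.group(0) returns when (\w)+ matches "abc":
# one Slice[int] per matched repetition of capture group 0.
let text = "abc"
let groupBounds: seq[Slice[int]] = @[0 .. 0, 1 .. 1, 2 .. 2]

# Old-style access: pick a repetition, then slice the original text.
let firstRep = text[groupBounds[0]]   # first repetition
let lastRep = text[groupBounds[^1]]   # last repetition, what most engines report

echo firstRep, " ", lastRep
```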

This feature will be removed in v2, and so will that API; see #111.

I think some other regex engines do not use that passed string for the captured groups.

Yeah, most regex engines return a substring. I'm debating with myself whether to do that or not. I wish Nim string views were finished, but the last time I checked they were experimental and had a bunch of bugs/edge cases.

StefanSalewski commented 2 years ago

Thanks for your kind reply. Actually, I forgot to link the book's content yesterday, but I think it was not that difficult for you to find it, if you did not already know it.

https://github.com/StefanSalewski/NimProgrammingBook http://ssalewski.de/nimprogramming.html

nitely commented 11 months ago

JFYI I've implemented the new API. The old one is deprecated and prints deprecation warnings. The changes are basically:

re"regex" -> re2"regex"
Regex -> Regex2
RegexMatch -> RegexMatch2

Since only the last repetition submatch is returned, text[m.group(0)[0]] becomes text[m.group(0)]. There is no group(RegexMatch2, int, string), because I think text[m.group(0)] is more obvious.
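The following stdlib-only sketch shows the single-slice form. It does not import nim-regex. The `bounds` value is hypothetical and hand-written. It assumes that, under the new API, re2"(\w)+" matched against "abc" makes m.group(0) return one `Slice[int]` covering only the last repetition. The `groupText` helper is also hypothetical and is not part of nim-regex. It only shows that the old three-argument style is a one-liner on the user's side.

```nim
# Stdlib-only sketch: nim-regex is NOT imported here.
# `bounds` is a hypothetical, hand-written stand-in for what the new
# m.group(0) returns when re2"(\w)+" matches "abc":
# a single Slice[int] covering only the last repetition.
let text = "abc"
let bounds: Slice[int] = 2 .. 2

# New-style access: slice the original text directly.
echo text[bounds]   # prints "c"

# Hypothetical convenience helper (NOT part of nim-regex) for readers
# who prefer the old group(m, i, text) calling style.
proc groupText(s: string; b: Slice[int]): string =
  s[b]

doAssert groupText(text, bounds) == text[bounds]
```

Returning boundaries rather than substrings means no string is copied at match time. The caller decides if and when to slice.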

StefanSalewski commented 11 months ago

Thank you for the hint. As I have to add a few more remarks about Nim 2.0 to the book anyway, I will fix the section about your regex module at the same time. The book already has its own website, and PDF versions are available too: https://nimprogrammingbook.com/. The English grammar should be mostly fine by now, so I could have it printed at Amazon or elsewhere.

StefanSalewski commented 10 months ago

Have you considered, instead of using all these symbols like re2, Regex2, and RegexMatch2, creating a new package called regex2 and then avoiding all the symbols with an appended "2"? It is not only the ugly appended 2 on all the symbols, but also the many deprecated functions listed in the API docs, which make the docs look verbose and polluted. The problem I can see with a regex2 package is that in a very large software project, some packages may still use the old regex while others already use the new regex2 package. But that is very unlikely. And with your current solution I can see no upgrade path: will the appended 2 stay forever? Well, it would make sense if you are planning many more API changes, so that we will have RegexMatch3, RegexMatch4, RegexMatch5.

For my Gintro bindings, I decided to have a gtk and a new gtk4 package. The user can import one of them. Then in each case, there are types like Button, but not Button3 and Button4. For my CDT package, I did it similarly: First there was the CDT package, a textbook implementation. Then, when using it, I discovered that a more OOP API would be useful, and I created a modified CDT2 package.

Well, all that is difficult. And I have to admit that I have used Nim only rarely in the last two years and no longer have direct contact with THEM, so my feelings may be wrong.

nitely commented 10 months ago

I didn't give it serious thought. I guess I just don't find re2 and RegexMatch2 ugly. If there were a lot more symbols then yeah, a new package would definitely be better. Also, there's already re in the stdlib, so re2 may be less confusing. There is also Google's RE2 library, so I didn't find the name alien.

I may break the API again once string views are ready (maybe never, who knows), if they turn out to be better than slices. In about a year or two I'm gonna remove all the deprecated APIs anyway, so I can probably go back to using re, RegexMatch by then.

I agree about the polluted docs. I wish there was a way to just not gen docs for deprecated stuff. But it's only temporary.

StefanSalewski commented 10 months ago

I agree, it is not a serious problem; two or three symbols with an appended 2 are OK. Actually, creating a new package has the disadvantage that when fixing a bug or adding new functionality, we have to do it for two packages. That is the issue with my CDT and CDT2 Delaunay triangulation packages.

In about a year or two I'm gonna remove all the deprecated APIs anyway, so I can probably go back to using re, RegexMatch by then.

But then all users of your package will have to modify their code also? And I will have to fix my book again? Maybe then it is already printed at Amazon? So I would vote to keep the appended 2 forever.

I will try to update the section about your package in the book soon. Perhaps you have seen, that the book has now also a section titled "Parsing data files (in parallel)" where your regex package is compared to many other ways to parse text files, including a table comparing the performance.

nitely commented 10 months ago

But then all users of your package will have to modify their code also?

They already have to. Deprecation gives them time to do it before those APIs get removed at some point. I don't think Nim 2 keeps stuff that was deprecated in Nim 1, right?

And I will have to fix my book again? Maybe then it is already printed at Amazon? So I would vote to keep the appended 2 forever.

Is your book never gonna get a second edition? At some point, code written for one Nim version (1, 2, whatever) is not going to compile with another. Same with regex. It may happen in 2 years, or 4, or 5, that's fine, but it's going to happen eventually. Your book will be outdated at some point, I imagine. Granted, if you want your book to always work for the specified Nim version, you can do the same for regex, or add a note about the regex version used for the book. FWIW, when the deprecated stuff gets removed I'll bump the regex version to 1.0. That could help if you want to specify the version in the install command, like nimble install regex@"<2" (v2 will be the string view version, if that ever happens).

nitely commented 10 months ago

Perhaps you have seen, that the book has now also a section titled "Parsing data files (in parallel)" where your regex package is compared to many other ways to parse text files, including a table comparing the performance.

Yeah, I need to investigate why regex is so slow for that code sample.

StefanSalewski commented 10 months ago

Is your book never gonna get a second edition?

No, never. Comments like

https://forum.nim-lang.org/t/10101#66729

from "THEM" have killed all the fun of Nim for me, and for many others. There has been so much unfriendly behaviour from them over all the years. Can you remember how Picheta banned Disruptek permanently two years ago? Or how they fired Timothee and Krux? And it has not become any better; see https://forum.nim-lang.org/t/10312#68553 for a recent example. I came to Nim in 2014, but now most of the bright and friendly people have left. I still like the language, but I have not found the motivation to use it in the last two years. As the grammar of the book is mostly fine now, I think I will do a final proofread and then have it printed. But it is obvious that nearly no one will buy it. They have always talked badly about the book and refused to mention it on their page. Well, people may find it via a Google search, and once it is printed at Amazon. But my feeling is that only very few people have the motivation and energy to learn Nim these days.

nitely commented 10 months ago

That's sad. I left the Nim community around that time as well, for the same reasons. But I still like the language too, and I'm fine with not being part of that. Language adoption may still grow because of non-English communities and companies picking it up, or it may remain niche; who knows.

StefanSalewski commented 10 months ago

You are right, the language is fine, and we can use it while staying away from their forum and IRC. Some people do that. Some may use the skull-nim fork instead. But my feeling is that the number of Nim users has drastically decreased in the last three years. The Nim 2.0 release was barely noticed by the public; no one really cares. Incremental compilation and CPS support are delayed again, and they are starting IC from scratch once more. My personal feeling is that Nim started in 2005 as a one-person hobby project, and is exactly that again now. People know about Rumpf's bad behaviour, as in https://www.reddit.com/r/nim/comments/ywxsbz/this_is_disappointing_to_read_coming_from_the_nim/. Also see https://lobste.rs/s/t7oyxa/nim_v2_0_released

So I don't have much hope for Nim any longer. There is no more excitement about a 15 (18) year old language. Nim is still in the Tiobe top 100, but my personal feeling is that Nim may still have about 100 active users, and maybe a few hundred like me who still like the language but rarely use it. And in the last three years, I have seen nearly no new users who have really started learning and using Nim. For serious new GitHub projects the situation is also sad. I just had a conversation with someone from India; Nim is not popular there. For China I am still unsure; some years ago there were rumors of some Nim activity in China. And now there is the new Mojo language, intended as a very fast Python for AI applications.

StefanSalewski commented 10 months ago

The IRC users just had a longer discussion about the current state of Nim:

https://irclogs.nim-lang.org/15-08-2023.html#21:56:19

21:56:19 | FromDiscord | <_gumbercules> Nim has a history of driving users and contributors away,

This seems to confirm my own observations.

SolitudeSF commented 10 months ago

Don't worry, guys. Nim won't die no matter how much Araq tries to kill it.