terralang / std

A repository to store code implementing common features, algorithms, and datastructures that are generally useful to terra code
MIT License
21 stars 7 forks source link

What would you include in a tier-0 lib? #3

Open capr opened 5 years ago

capr commented 5 years ago

I'll list a few things roughly in the order of how much I use them in my code. I've only written about 7K of Terra code total so far, so someone with more LOC written would probably have a better list and a different order, so FWIW:

These are all in my low module, which begs the question -- what should we name this module? :)

aiverson commented 5 years ago

What exactly do you mean by tire-0? Did you mean to write tier-0? And what exactly does that mean to you?

capr commented 5 years ago

Yes, bad spelling.

capr commented 5 years ago

It means the most basic tools that could almost be considered language extensions and which should really be built-ins but aren't, like the ones listed above, but without which the code becomes tedious and redundant i.e. writing if then else instead of an iif expression or a.b.c = a.b.c + 1 instead of inc(a.b.c) or memset(t, 0, sizeof(T)) instead of fill(t), etc.

aiverson commented 5 years ago

We need better debugging tools. The ability to do breakpoint debugging is really useful, but I don't see anything that currently gives us that ability. Building better debugging tools is a longer term project for later that probably requires more discussion. For a tier zero standard library, I have a few things that I definitely want in it.

IO facilities

printf and scanf from the C stdlib are workable, but they aren't great. They have security problems, the need to parse the format string at runtime is inefficient, they can't handle anything very sophisticated, and they aren't easy to metaprogram. I think writing a set of formatting and parsing IO primitives that have their format in a structure that can be efficiently metaprogrammed and resolved at compile time, that are more flexible for simple but nontrivial formats. For example CSV has quoted strings with escape characters. CSV should be dead simple to parse with stdlib tools, but C scanf can't handle that at all, which requires a manual FSA parser to deal with CSV. Making a parser do that should be simple and fast. And in Terra, it should be metaprogrammable. Same for formatting it out. I have some sample code relevant to this.

Iterators, transducers, and streams

Flexible and powerful filter-map-reduce operators should be available. I have workable implementations of these. Iterators over the basic collections should be available and tools to build new iterators should as well. I have a basic implementation of this. Streams for potentially asynchronous data and events are also useful and should be usable with the same API. I do not have code for this. These parts may not be appropriate for tier 0.

Memory management

Using malloc and free is acceptable if we want to only run code in environments with libc. WASM, which I am interested in using, does not have malloc and free. Microcontrollers running baremetal would not always have malloc and free. It may also be useful to have support for alternative allocators to do things like FIFO allocation, fixed size allocation, or other optimization scenarios. This needs to be architected in to the standard library so that it becomes the default way of doing things and because the standard library needs to support pluggable allocators for places where it allocates memory. Not all of this is in scope for tier zero, and all the alternative allocators are certainly not.

aiverson commented 5 years ago

I would rather use cond than iif as the name for the ternary operator. We could actually borrow Lisp's semantics there a bit more and let it have any odd number of arguments to handle the equivalent of if elseif elseif elseif else chains.

aiverson commented 5 years ago

A function to synthesize comparison metamethods out of some minimum spanning set of comparison metamethods would be useful too. If we could come up with a reasonably general comparator synthesis tool, that would be useful, and I think I have an idea for making that.

capr commented 5 years ago

@aiverson can you post here some links with the stuff that you already have implementations for so we can take a look? thanks!

capr commented 5 years ago

@aiverson can you make a top 5 or a top 10 list of your most used functions across a variety of problem domains? That's what I would start with for a tier-0 lib.

aiverson commented 5 years ago

Off the top of my head, some of my most commonly used functions and datastructures are, in no particular order:

Repetitive code snippets that are amenable to metaprogramming

aiverson commented 5 years ago

Here is an implementation of some basic filter-map-reduce capabilities. https://github.com/aiverson/constellation/blob/master/src/fmr.t

It would need to be cleaned up a bit and extended to integrate with the rest of the standard library better before being integrated into the standardlib, but it should be a good start.

There are also some performance optimizations that I have thought of since writing that that I can implement.

capr commented 5 years ago

That's great for a functional module, I'm just not sure we're on the same page here with regards to what tier-0 means and why it's needed. Also, IMHO a functional library would be quite limited in a language that doesn't support closures. For functional programming maybe adding closures to Terra would be a good research project (even in limited form like GCC's nested functions).

aiverson commented 5 years ago

Something that optimizes and simplifies loops, one the most commonly written parts of any program, seems like something that should be part of the very first bits of the standard library. Recursions schemes can wait because most people (including myself) don't have a good idea of how do design with them. Filter-map-reduce and code that can be replaced by them though is something that seems about as commonly used as the inc, dec, or cond macros you mentioned and code that can be replaced by them. Loops that can be made out of an FMR construct are common, and my library synthesizes efficient code for them while separating concerns clearly and making the programming less error prone. They certainly aren't that much less common than incrementing a variable, and are much more common than even a priority queue. This sounds like what you were describing as a tier zero library.

capr commented 5 years ago

Well, we'll just have to agree to disagree then :) Like I said, the probability of two people to see eye-to-eye on such matters is pretty low. That being said, I'm willing to be proven wrong so here's my null-hypothesis: if you have terra code where you solved an actual problem and used this style a lot, I'd be very glad to learn something and upgrade my style.

aiverson commented 5 years ago

Good point. I'll work on this in the background and bring it up again when I have a nice compact example of doing something practical with it that I think demonstrates the value well. In the mean time, we should talk about the rest of the first section of the standard library to be implemented.

I think the first thing should be a collection of basic container types. Ideally, we would have some idea of allocators from the word go, but that is an architectural consideration that may need to be done wrong before we can figure out how to do it right.

Parsing and formatting is something we should work on as well. My desiderata for parsing and formatting are: metaprogrammable easily (it should be clean and reasonably legible to define a function that takes a struct definition and synthesizes parsers or formatters for a specific representation), secure (no (accidental?) pointer overrun or format string injection vulnerabilities like in C), efficient (the runtime shouldn't need to parse the format string in addition to parsing or serializing the actual target. Ideally there would be an option to choose between optimizing for code size and code speed), flexible (the formatter and parser should allow reading and writing fron memory buffers, streams, files, sockets, strings, etc. in a smart and extensible manner), and expressive (It should be possible to parse a lot of things without too much difficulty.) The obvious solution to straightforward parsing is a Parsing Expression Grammar. LPEG provides a pretty good example for this. If we implement something very similar to LPEG that compiles into terra native code and include some convenience functions that produce patterns like the C scanf specifiers for convenience, and maybe some terra specific capture operations and semantics, that should be a suitable solution for the parsing half of that problem. The C printf/scanf pair has an appealing symmetry between the format strings for reading and writing, and ideally our solution would preserve that. I have some ideas, but I would like to hear other ideas to brainstorm.

TWarszawski commented 5 years ago

I just wanted to add a comment that as a user (through Regent), I would have found a vector/arraylist and hashmap useful. I ended up implementing my own.

aiverson commented 5 years ago

There is already a basic vector/arraylist included in the core library. We are definitely going to have an improved arraylist and a hashmap in the library among other things. A nice variety of container types seems like a very important feature of a standard library, I've got some plans to make it as good as I possibly can. If there is anything special about your implementation or usecase, please let us know about that so we can use it to inform the design.