ooc-lang / rock

:ocean: self-hosted ooc compiler that generates c99
http://ooc-lang.org/
MIT License

String "notations" (CString, UTF-8 String) #377

Open showstopper opened 12 years ago

showstopper commented 12 years ago

We're dealing with multiple types of strings in ooc code besides our default, the "ooc string". Let's introduce a special syntax to declare raw CStrings or UTF-8 strings, e.g.:

Raw CString: c"Hey, I'm a CString!"
UTF-8: u"Ohai, I häve funny symböls."

These letters are only suggestions. Furthermore, it might be interesting to discuss whether UTF-8 should be the default, which would make the u"Unicode" prefix obsolete.

nddrylliog commented 12 years ago

utf-8 should definitely be default everywhere, so u is obsolete indeed.

rofl0r commented 12 years ago

"utf-8 should definitely be default everywhere" not on windoze :)

fredreichbier commented 12 years ago

+1 for making UTF-8 the default. Raw CStrings would mean zero-terminated char*s?
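To make the difference concrete, here is a minimal C sketch, assuming a raw CString is indeed a zero-terminated char* and that the UTF-8 input is valid. The function names are made up for illustration and are not part of rock or the ooc SDK. It shows that the byte length of a CString and the number of code points in the same UTF-8 data are different quantities.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes before the terminating zero: what "length" means for a raw CString. */
size_t cstring_bytes(const char *s) {
    assert(s != NULL);
    return strlen(s);
}

/* Count code points in a zero-terminated UTF-8 string by skipping
 * continuation bytes (those of the form 10xxxxxx). Assumes valid UTF-8. */
size_t utf8_codepoints(const char *s) {
    assert(s != NULL);
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

For the word "häve", where "ä" is encoded as the two bytes 0xC3 0xA4, the byte length is 5 while the code point count is 4.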

wandernauta commented 12 years ago

@rofl0r Indeed Windows has different standards internally (UTF-16, IIRC), but it should be possible to work around those and just pretend everything is moonshine and roses (i.e. UNIX).

duckinator commented 12 years ago

iirc I saw commits about these. What's the status on this?

nddrylliog commented 12 years ago

The conspiracy branch has c"Raw strings ftw!"

alexnask commented 12 years ago

@nddrylliog Do you think I should implement those to master?

nddrylliog commented 12 years ago

I think conspiracy is the new master, it's simply not ready yet: people should patch against conspiracy, not master, and master should be kept (as 0.9.4) for stable projects for now.

conspiracy will be ready when CHeaderParser is more stable and outputs enough information for coke to function (and exist, ha). For now it just spits out a bunch of .c/.h files and a Makefile, which isn't really acceptable for the end-user. Yet, it's much cleaner and I think we can remove a few features still (and rewrite others).

alexnask commented 12 years ago

Ok, so for now all I will do on the master branch is fix bugs. I trust you will port some of these changes to conspiracy (if you want, I could do that too)?

alexnask commented 12 years ago

@nddrylliog By the way, what do you have in mind for coke? I mean, what kind of information should rock generate for coke input?

nddrylliog commented 12 years ago

@shamanas If the fixes apply to conspiracy as well, it would be nice to backport them there, yep.

As for coke, mostly we'll need info for recompilation. The goal is that you should be able to edit the .c files and recompile pretty easily. Also, rock should be able to use info from the previous compilation round, only regen what's needed and tell coke about it.

coke and rock would work hand-in-hand, with rock handling all the parsing+tinkering+generation knowledge, and coke handling the 'running external compilers' knowledge.

So actually it would go like this:

With a Cokefile you could define more things: e.g. how to build different libs, which deps to get (although that could simply be part of the .use file), which executables to produce, workarounds for compiler settings / platform-dependent stuff: all kinds of things rock itself will remain blissfully unaware of.

In fact, there's no reason why coke should have to wait for rock to have entirely finished to start compiling C files (multi-core / parallel ftw): we could start thinking about a way for rock and coke to communicate, and we should think about which features of ooc make it hard to really establish a 'dependency graph' between modules (I'm thinking maybe stuff like 'extends'). Once we have a clear dependency graph, it's really easy for coke to know what to build in what order and what to rebuild when a .c file has changed (again, something rock could forget everything about).
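As a rough illustration of the "what to rebuild" half, here is a small C sketch. The DepGraph type, the mark_dirty function and the fixed module limit are all invented for this example; nothing like this exists in rock or coke yet. Given 'A depends on B' edges, it marks a changed module and everything that transitively depends on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_MODULES 8

/* deps[a][b] == true means "module a depends on module b". */
typedef struct {
    size_t count;
    bool deps[MAX_MODULES][MAX_MODULES];
} DepGraph;

/* Mark `changed` and every module that transitively depends on it.
 * The early return on already-dirty modules also makes cycles safe. */
void mark_dirty(const DepGraph *g, size_t changed, bool dirty[MAX_MODULES]) {
    assert(g != NULL && g->count <= MAX_MODULES && changed < g->count);
    if (dirty[changed]) {
        return;
    }
    dirty[changed] = true;
    for (size_t m = 0; m < g->count; m++) {
        if (g->deps[m][changed]) {
            mark_dirty(g, m, dirty);
        }
    }
}
```

With modules 0..3 where 1 depends on 0 and 2 depends on 1, a change to module 1 marks 1 and 2 as dirty and leaves 0 and 3 alone. A real graph would also need the extra edges mentioned above (e.g. for 'extends').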

That's why rock should probably output a bit more than just 'B depends on A'. Perhaps it could go further and output, in an easily readable format (JSON? YAML?), info about the memory structure of the various types we create: this would make the creation of Python/Ruby/etc. bindings trivial as well (by third-party tools).
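To give an idea of what such output might contain, here is a hedged C sketch. The Point struct, the describe_point function and the JSON shape are all assumptions made up for this example, not a format rock emits. It uses sizeof and offsetof to describe one struct's layout as JSON.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical example type; stands in for a struct rock might generate. */
typedef struct {
    int x;
    double weight;
} Point;

/* Write a JSON description of Point's layout into buf.
 * Returns what snprintf returns: the number of characters the full
 * description needs, so callers can detect truncation. */
int describe_point(char *buf, size_t size) {
    assert(buf != NULL && size > 0);
    return snprintf(buf, size,
        "{\"type\":\"Point\",\"size\":%zu,\"fields\":["
        "{\"name\":\"x\",\"offset\":%zu},"
        "{\"name\":\"weight\",\"offset\":%zu}]}",
        sizeof(Point), offsetof(Point, x), offsetof(Point, weight));
}
```

The actual numbers depend on the platform's alignment rules, which is exactly why a bindings generator would want them from the compiler side rather than guessing.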

I hope you see where I'm going here, if not, keep asking and I'll keep rambling until it becomes clear.

wandernauta commented 12 years ago

(Minor note: GNU make doesn't by default run the 'all' target, but the first target, which is named 'all' by convention.)

alexnask commented 12 years ago

I see what your goal is. So basically a Cokefile will be a usefile on steroids, with more info than simply package info and C compiler options, and coke will basically be what links rock and the C compiler.

I like the idea of compiling the C files as rock is generating them.

The idea of a 'dependency graph' seems nice but difficult to implement, as you said we have to plan this out perfectly.

As for coke communicating with rock, I don't think it would be much of a challenge. As you said, the ideal solution would be for rock to output information that can easily be parsed (JSON sounds fine). I would add a new option for it rather than reuse -v (something like --info=JSON), then launch a rock process from coke and parse its output.

When you talk about the 'memory structure of the various types we create', what exactly do you mean, apart from information on their fields, methods and generic types? Or do you mean just those?

nddrylliog commented 12 years ago

@wandernauta Ooh, correct. Thanks for the note.

nddrylliog commented 12 years ago

@shamanas Well, I think we can draw the line between a .use file and a Cokefile.

A .use file will definitely be parsed by rock itself, to figure out the dependencies between different libraries/projects. Perhaps it should be separately parsed by coke to know which C flags to use? After all it doesn't make sense to make rock aware of C compiler flags at all, and it makes sense to be able to tweak C flags in .use files even after rock is done generating .c files of its own.

.use files will also be used by package management programs (maybe coke should play a role there? like 'prepare' being a default target that checks everything is in place, from the info in the .use files?), like nirvana/reincarnate used to do (and do again, apparently?)

As for Cokefile it's mostly just for specifying 'targets', and more high-level stuff: if we need to download assets from elsewhere and uncompress them somewhere, that's what you specify in a Cokefile. If we need to create an archive and upload it to some FTP server, it goes in the Cokefile as well.

If your single source repository contains 4 different libs (each with their own usefile), it's your job to create 4 tasks specifying which .use file to use to generate a library, and what name the library should have, in which directory it should be generated, etc.

nddrylliog commented 12 years ago

(If you receive an incomplete e-mail notification for the above, go read it on GitHub, I screwed up on C-Enter and had to edit)

nddrylliog commented 12 years ago

For the dependency graph, as a matter of fact rock contains most of that info: the sequence driver uses that info to know what to recompile between runs (the infamous libcache-ing that caused so many problems). It's just that it's hackish for now, and there's a little bit more to it than just 'importing a module means you should be recompiled if it changes'.

In fact, the only reason to recompile a .c file that hasn't changed is the fragile base class problem, due to the use of vtables.
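A minimal C sketch of that problem, with made-up vtable types (ShapeVtableV1/V2 are illustrative only and are not how rock lays out its classes): adding a method to a base class moves the slot of an existing method, so code compiled against the old layout would read the wrong function pointer unless it is recompiled.

```c
#include <assert.h>
#include <stddef.h>

/* Version 1 of a hypothetical base class vtable. */
typedef struct {
    void (*destroy)(void *self);
    int (*area)(void *self);
} ShapeVtableV1;

/* Version 2 inserts `name` before `area`, which shifts area's slot. */
typedef struct {
    void (*destroy)(void *self);
    const char *(*name)(void *self);
    int (*area)(void *self);
} ShapeVtableV2;

/* Byte offset of the `area` slot in each version of the vtable. */
size_t area_offset_v1(void) { return offsetof(ShapeVtableV1, area); }
size_t area_offset_v2(void) { return offsetof(ShapeVtableV2, area); }

/* Dispatch through the current (V2) layout. A caller still compiled
 * against V1 would fetch the slot that now holds `name` instead. */
int shape_area(const ShapeVtableV2 *vt, void *self) {
    assert(vt != NULL && vt->area != NULL);
    return vt->area(self);
}
```

The subclass's .c file has not changed at all here, yet its compiled code is stale because the offsets it baked in no longer match.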

One interesting alternative, which nobody does as far as I know, is to completely trash C's "include" mechanism and only include headers from C libs: as for ooc libs, we would simply re-output the struct definitions wherever needed. I wonder if this would make recompilation faster (right now it's my impression that C compilers re-parse a lot of stuff they don't actually use).

Plus the only times where we need to know the object structure is when we directly access a field, or when we inherit the class (and thus define a sub-structure).

This would have the added advantage to completely do away with the -fwd.h / .h duality, and the general worry with forward declarations.

alexnask commented 12 years ago

@nddrylliog Ok I get it a bit better now ;) Yes, I definitely think coke should read use files as well, as you said, for basic information. I also think coke should use reincarnate if Cokefiles can be used for package management too. Will a Cokefile be similar to a Makefile, in that you can execute any command, or will it be composed of a fixed set of instructions? I guess in the first case there is no need to talk about such things anyway, as you could access external ooc tools like reincarnate that way, without coke being specifically "linked" to them in any way.

alexnask commented 12 years ago

That is a really interesting idea (scrapping the include mechanism). I hadn't even thought of such a thing, to be honest. I guess this would work fine and could be considered a form of "dead code elimination", in the sense that no code would be generated for any un-needed type. I think we could experiment with that :)

nddrylliog commented 12 years ago

Note that the rationale for removing (most of) include use in C code generated by ooc is mostly that it's so terrible. First off, I wouldn't even consider using include without header guards (otherwise once your code gets a bit complex it becomes utterly worthless). And even then, we always run into weird bugs.. stuff about order, double declarations and so on. We already know everything we need to know about dependencies from within rock, so we know exactly what a given .c file needs.

Another thought: I don't see why a single ooc module would map to a single .c file. If we define 12 different classes in an .ooc module, chances are if we recompile we won't have changed all 12 classes. And having 12 different files for small classes seems a bit silly. So why couldn't we generate mymodule_ClassA.c etc., which would themselves have dependencies on each other (as generated by rock), and which coke would know exactly when to recompile?

The last issue with incremental recompilation is with randomly generated names, such as tmpclosure__23498: right now the seed is shared for the whole compilation process IIRC, so changing a detail in file 'C' could very well change the variable names in files 'D', 'E', 'F' and so on.
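One possible way around that, sketched in C under my own assumptions (this is not what rock does, and the tmp_name function and its naming format are invented for the example): derive the suffix from the module path plus a counter local to that module, instead of from a seed shared by the whole compilation. FNV-1a is used here only because it is short; any stable hash would do.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* 32-bit FNV-1a hash of a zero-terminated string. */
static uint32_t fnv1a(const char *s) {
    uint32_t h = 2166136261u;
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

/* Build a temporary name from the module path and a per-module counter.
 * The result depends only on these two inputs, so editing another file
 * cannot change it. Returns snprintf's result (length needed). */
int tmp_name(char *buf, size_t size, const char *module, unsigned counter) {
    assert(buf != NULL && module != NULL && size > 0);
    return snprintf(buf, size, "tmpclosure__%08lx_%u",
                    (unsigned long)fnv1a(module), counter);
}
```

Calling tmp_name twice with the same module path and counter always yields the same string, whatever happened in files 'D', 'E' or 'F'.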

Imho, this should be fixed by separating as well as possible the ooc AST and the C AST, as is done in oc, but not yet in rock unfortunately. That's quite a bit of work though, hence my desire to reduce the number of features first :)

Re Cokefiles: I think reincarnate should use .use files only. However, uploading to nirvana could be a coke task. See? .use files should contain enough info for reincarnate to know the deps, the name, the description etc.

As for allowed commands, I'm still not sure what the format should be exactly. See Rakefile for examples, they're actually just a Ruby DSL: but it doesn't really make sense for a Cokefile to be ooc code that is compiled.. it could be ooc syntax that is interpreted but what then: interpreter backend for rock? subset implemented in coke itself? Looks messy.

Maybe it could have a few built-ins, and then if you use external commands and the task is being run as root, coke will warn you (like: You'll be running the task 'rm -rf /*' as root. Are you okay with that? [Y/n]).

wandernauta commented 12 years ago

Compiling a Cokefile to a (hidden) binary would make the most sense for me, I think. Combined with a DSL-like something it could be pretty sweet.

all: func {
    widget()
}

widget: func {
    /* Build foobar widget */
}

clean: func {
    file("widget") delete()
    files("*.o") delete()
}

Running coke would then be like running rock -run +-include coke.h -entrypoint=coke_main -q -outpath=./.coketmp/ or something.

alexnask commented 12 years ago

Well, coke will have to interpret some kind of code from a Cokefile anyway; a subset of ooc or something similar sounds fine to me. As in, a "Target" singleton class with do methods, for example something like:

Target add("all", |task|
    task compile => "file.ooc"
    task execute => Target get("clean")
)

Target add("clean", |task|
    task delete => "directory"
)

/*
TaskType: enum {
    Compile,
    Execute,
    Delete
}

Target: class {
    task := static Task new() // Task to be passed around to functions

    add: static func(String, Func(Task)) {
        // Do stuff
    }

    get: static func(String) -> Func(Task) {
        // Do stuff
    }
}

Task: class {
    compile := TaskType Compile
    execute := TaskType Execute
    delete := TaskType Delete
}

operator => <T> (left: TaskType, right: T) {
    // Do stuff
}
*/

Edit: I guess compiling to a hidden executable would be fine too, but we would lose a good bit of speed, and it would turn coke into "just another ooc library, but to manage rock compiling".

alexnask commented 12 years ago

One cool "side-effect" of using hidden executables to do this stuff would be the ability to perform platform-related tasks using version blocks.