Decide on who's the target audience of the project

I post this as the first issue to decide in regard to https://github.com/windelbouwman/ppci-mirror/issues/10 . What is an "improvement" (and what's not) depends largely on who the target audience is.

For example, it's possible to decide that end users are the target audience. But then the README should not contain "frightening" stuff like:

Warning

This project is in alpha state and not ready for production use!

And instead it should contain stuff along the lines of "Drop GCC, drop LLVM, start using PPCI now, we have cookies!"

Or it can be decided that WebAssembly is a selling point, piggybacking on which PPCI could launch to the masses. Then the README should provide instructions on how to build some cute display hack and open it in a popular browser.

The examples could continue, but let me come up with a specific proposal: the target audience should be Python developers. "Developers" definitely, as the README itself says that the project is in alpha state. "Python", because I doubt that many non-Python developers would jump to see yet another compiler out there. The project should be of most interest to folks who know Python and are curious what can be done using their favorite language.

However, any instructions in the README should also be friendly to folks who aren't (much) familiar with Python. This goal does not contradict the previous paragraph. For example, I'm familiar enough with Python, but I still don't like to "install" something right away (indeed, that's not related to Python and applies to any software). Fortunately, it seems that PPCI is usable from just a git clone. But the instructions should, e.g., refer to python3 -m ppci cc instead of ppci-cc, because the latter appears only after "installing" it.
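To illustrate why the module form can work from a bare checkout: a console command such as ppci-cc is typically generated by the installer from packaging metadata, while python3 -m <package> only needs the package directory to be importable. Below is a minimal sketch of a subcommand dispatcher of the kind a package's __main__ module could contain. All names in it (run_cc, run_ld, COMMANDS, main) are hypothetical and chosen for illustration only; this is not PPCI's actual code.

```python
import sys


def run_cc(args):
    """Hypothetical stand-in for a C compiler front end."""
    return "cc called with {}".format(args)


def run_ld(args):
    """Hypothetical stand-in for a linker front end."""
    return "ld called with {}".format(args)


# Maps the first command-line word to the function that handles it.
COMMANDS = {"cc": run_cc, "ld": run_ld}


def main(argv):
    """Dispatch 'python3 -m <package> <command> [args...]' style invocations."""
    if not argv or argv[0] not in COMMANDS:
        names = ",".join(sorted(COMMANDS))
        return "usage: python3 -m <package> {" + names + "} [args...]"
    return COMMANDS[argv[0]](argv[1:])


if __name__ == "__main__":
    print(main(sys.argv[1:]))
```

With a layout like this, nothing has to be generated at install time: running the interpreter with -m from the checkout directory is enough to reach main().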

Criticism/other ideas are welcome. I wanted to post this anyway, to give context to other sub-tickets I may post for https://github.com/windelbouwman/ppci-mirror/issues/10. (Again, based on my own experience - I get various suggestions for improvements to my projects' READMEs, and half of the time I wonder why they think it would be an improvement.)

Currently I think the target audience should be somewhat embedded software engineers who are playing with Python and feel brave enough to try this stuff, or want to script / automate / parse some C code as part of their build process.

Another group might be compiler developers who would like to try out a new idea quickly.

In fact, I'm unsure who would really use this project for something serious, it's mostly out of curiosity of what can be done with python.

Any other thoughts on this?

Thanks for the reply here. IMHO, this is one of the most important questions right now, as it sets the further direction of the project, and would make it clear to potential contributors what would constitute a useful contribution to the project and what would not.

Let me start by commenting on your replies.

Currently I think the target audience should be somewhat embedded software engineers

That's peculiarly formulated ;-). I guess the key words there are "somewhat embedded". Because one should picture a typical embedded software engineer (as in "engineer of software for deeply embedded systems, i.e. microcontrollers") as a guy sitting on Windows 7 (because Windows 10, Linux, Mac don't support drivers for his hardware probe), and holding on to his IAR C. In no way will this guy be interested in Python, PPCI, etc. OK, maybe for "somewhat embedded engineers" it would be different, but there would be very few such "somewhat embedded engineers", which makes for an almost zero target audience.

Another group might be compiler developers who would like to try out a new idea quickly.

Definitely +1, but these are largely served well by LLVM. So we would need to define which subgroup may still be interested in PPCI, and how to make it more interesting to a larger number of these folks beyond that.

In fact, I'm unsure who would really use this project for something serious, it's mostly out of curiosity of what can be done with python.

I kinda suspected it might be like that, and thanks for spelling it out. And there are many projects like that: I dumped https://github.com/windelbouwman/ppci-mirror/issues/33 in preparation for this reply, and that lists only "hobby" compilers in Python, and for C. There are many more compilers in Python for other languages, and more compilers for something written in other languages.

But PPCI has a "problem" - its scope is too wide - multiple source languages, multiple output (byte)codes, SSA, linker, build system, etc. - all packed together. And it's not only wide, it's also deep: while one needs to do some fiddling, one can see all those pieces actually work (not completely and not fully, but work). And the final "terrible news" is that all this work was done largely by one man - yourself, @windelbouwman. Which again shows the power Python brings to mere people. So, what we have is compiler infra on the brink of being useful, done by mostly one man. Just imagine where that can go if more people get involved, and how it can affect other people. Some people just can't sleep well thinking about all the potential, hence such tickets ;-).

In the next comment, let me elaborate a bit on that.

So, let's think about who can be the target audience of an open-source project. While discussing that, I will give valuations from the point of view of "community" and from the point of view of "overall progress". Obviously, I'm just a single person, and probably can't represent a "community" (which may not even exist, as in "nobody gives a damn about all this stuff"). The notion of "progress" isn't exactly objective either. With that disclaimer out of the way, let me still try to define:

The mission statement: The state of the art in advanced compiler hacking is an object-oriented API in C++ with LLVM, hopefully being able to structure that as a "plugin". That's a noticeable improvement over the previous situation of GCC with a much less clean C API and explicit political prohibition of plugins. But the current state of affairs still represents an "extraordinary effort" for many people who are interested in compiler hacking, but are short of time to learn LLVM and maintain C++ stuff. We'd like to improve on that considerably with compiler infrastructure implemented in Python.

Ok, types of projects by intended target audience.

1. The target audience is the (usually sole) developer of the project. This is the worst case from the community perspective. The interest here is not to improve the state of the art, but merely to learn things which aren't known/familiar to the developer of the project. Again, this is the worst case from the perspective of community contributors, because any contributions would be ignored or rejected, because the author isn't really interested in somebody else's ideas, only his own. And these ideas are usually in his own head, and he simply doesn't have enough time and interest to discuss them or other ideas with other people. @windelbouwman, I hope you never wanted to make a project like that. Because otherwise, you failed miserably. Failed by overdoing it - again, your project is on the brink of being useful to a wide community of people.
2. With the second type of project, believe it or not, the crux of the project is still the main author's own ideas, but the target audience is explicitly the community, theatrical style. This is the latest fashion, and is a great new type of geeky entertainment. One common trait is that these try to be "a whole world from scratch", based on some NIH approach - because of course what we already have sucks, and they can do it better. https://github.com/pervognsen/bitwise is the proverbial example of such a project. From the README: "I've always been obsessed with how things work under the hood", "Here are some examples of what you will learn to build:". So, the guy is obsessed, but we will learn. Sounds good, bring me popcorn. Turns out, we start learning by inventing our own language, Ion, because it ain't cool otherwise. Another example is https://github.com/akkartik/mu . Did you know that you can do structured programming in machine code, and that it's the future of programming? That guy does it. Again, such projects provide great value for the community as entertainment (and education, of course). Beyond that, the prospects are mixed. What can we learn from it, that it's great to make NIH tools? The community never had problems with that. @windelbouwman, I hope you sense my fears - I suspect that the current PPCI project may fall into this category :-(. Because its own build system, XML-based in the late 2010s, really? C3-the-your-own-language, really? Then it would be interesting to know whether being in this category was a conscious or unconscious "decision". In the first case, everything is doomed. And in the latter case, I do all this writing to sway the project into the next category.
3. Projects in this category ration missionary activity. They usually have a specific aim (which may not be explicitly stated or conscious either), and don't try to "fix the whole world" (only by implication, if at all, and not by rebuilding it from scratch). Quite often the case is "let's do a thing like X, but with a), b), c) (a finite list) changed". Many projects are like that, e.g., I'd say RustPython is "let's do the same bloat as CPython, just in Rust", MicroPython is "Python is good, but too much bloat, let's pluck it", my own Pycopy is "Hey MicroPython, stop betraying "micro", that's not minimal enough". One good thing about such projects is that their target audience is potentially the users of the entire "original" project, and there's good cross-pollination among them. We could also put LLVM into this category, but the truth is that it hardly ever was in it, and always was in the next category.
4. Corporate open-source projects. These are oftentimes rooted in politics and corporate interests. Let's not skip GCC, which is a compiler of GNU-the-corporation with an agenda of precluding commercial entities from benefiting from community work without giving back to the community, or just violating community interests. The next example in the queue is LLVM, which is a response from a commercial entity to GNU concerns. Etc., etc. These projects are definitely in the interest of the (loosely defined) community, as they're well organized and oftentimes funded, so they work on "boring" matters like testing, thorough compatibility, horizontal scalability (e.g. many architectures), etc. The community of the projects themselves may, however, be "astroturfed", and it may be too hard for a "mere human" to contribute to such projects, or make them work in their own (vs corporations') interest.

That's it. Don't get me wrong - there are no "bad" types of projects in that list, and no "best". For different projects, one or another model may be suitable, and that may change over time. A professional software developer definitely should try all of them. And again, type 4 is not the "topmost" one. https://github.com/pervognsen of Bitwise mentioned above is a good example: by his account, he had so much fun with corporate programming that he took a leave on his own to recover a bit of his life and convey a message to the community with a project of type 2.

Anyway, back to PPCI. I would humbly suggest that for PPCI, project type 3 is the best. @windelbouwman, would you agree?

I think this project has moved through type 1 and type 2. At first, for me this was just a learning experience, and at some point I put some effort into cleaning the code and creating some documentation, since I realized it might become useful to other people. The way you outlined it, type 3 sounds reasonable, but I would like to keep the scope of the project wide. Not fix the whole world, but have a consistent library which can be used to deal with compilation related problems.

I fully understand that the project is wide, and not a classical unix tool ("do one thing, and do it well"). I want this to be a broad project, since then all parts can work nicely together. Of course, this might result in a big ball of mud in which there is a lot of stuff which all almost works, but I'm willing to bet on this. I've seen too many projects split up into several repos and subprojects, and then configuration management takes over. I would like to think of this project along the lines of sox and netcat, tools which are a sort of Swiss army knife for a specific topic.

Btw other target audience I thought of (to prevent the target audience from becoming equal to the empty set :)):

Software archaeologists, people dealing with really old software for which compilers cannot be used anymore
As a bootstrapping tool for either a very old language / compiler or for a new language for which a first bootstrap compiler must be developed.

Btw, I'm glad this is not a type 4 project!

@windelbouwman, thanks for the reply!

The way you outlined it, type 3 sounds reasonable, but I would like to keep the scope of the project wide.

Thanks for acking that a "type 3 project" sounds good; at least I finally explicated the reasoning for the changes I already proposed, and hope to propose even more. And it's absolutely fine for the project to be wide - indeed, compilation itself is a very wide subject, so a project dealing with it won't be exactly as simple as unix "cat". But I already mentioned my concern in https://github.com/windelbouwman/ppci-mirror/issues/29 - you never do "something", you always do "something instead of something else". And that's exactly my idea - to call for down-prioritizing work in some areas and prioritizing work in other areas, often last-mile ones (where simple enough changes can lead to vast overall progress, at least re: community adoptability).

Not fix the whole world, but have a consistent library which can be used to deal with compilation related problems.

Again, good. But then it's a matter of where to set the fence posts. For example, for me, a build system isn't really related to "compilation problems". It's the completely different area of generic dependency tracking and task scheduling. And ppci-build is the part I'm most skeptical about (as in: I'm not interested in using it), and would like to propose to downplay it, and up-play PPCI use with other build systems. Anyway, that's just a specific example.

Btw other target audience Software archaeologists, As a bootstrapping tool for either a very old

That's still too niche, still not what I have in mind. Let me finally formulate my ideas in the next comment.

OK, as we agreed that a "type 3" project makes sense, the main matter is of course the implications that has for the project's approach. Let me summarize what I have in mind:

What's one of the best-known/most-used languages? C? Does PPCI support C? Yes? Then let any C programmer be in our potential target audience. (That's mostly a "statement of intent", but even that has its implications: we should make things familiar to C programmers, and not make things unfamiliar. E.g., C programmers usually don't see error messages like in #9, so we shouldn't have them either.)
What's the most widely used C compiler? GCC? Then let any GCC user be the actual target audience of the project. Like, any GCC user should be whole-heartedly welcome to try PPCI, and by "whole-heartedly" I mean "we should make their experience positive, and motivating to try further, rather than run away in awe". There are concrete and far-reaching implications of this clause, in the following points.
E.g. all tools provided by PPCI should be structured following GCC, and have compatible arguments, command-line options, etc. Clang learnt the hard way that few people took them seriously until they acquired most of the GCC-compatible command-line options, inline asm quirks, etc. (And they're still not there, and that limits their adoption.)
On the other hand, the project should rather concentrate on the scope corresponding to GCC and Binutils to avoid stretching too thin and going into no-man's land. One example is the build system - it's good that PPCI provides its own, but it shouldn't be enforced (as in: there should be enough examples of using ppci without ppci-build, and actually such examples should be promoted, to not scare people with NIH things).
In general, interoperability is important. One particular area is interoperability with tools similar in scope. E.g., I believe there are some beginnings of dealing with LLVM IR, and that's one important area which fully makes sense to elaborate further.
Similarly, using existing good libraries which adhere to project constraints (like being pure-Python) may be a good thing, instead of trying to develop everything from scratch (especially in "boring" areas like lexing, parsing...).
With all the above in mind - that the project should cater to a wide audience of C/GCC users - the project is also in a unique position to target other audiences, so they shouldn't be overlooked.
Python is well-known for its beginner-friendliness, so if someone wants to hack on compilers without knowledge of Python, let's make PPCI a non-sucky platform to start with. That means things like https://github.com/windelbouwman/ppci-mirror/commit/726dea002d8d54afcf2aa966a130e55fbaac1294 , and just writing docs (at least intros/basic usage) in a way that doesn't rely on intimate Python knowledge.
One interesting question re: beginners is terminology and state-of-the-art quirks inherited from big projects like GCC (which in turn inherited them from decades of computing industry history). My IMHO is that yes, we should teach them the real terminology, which they can reuse outside the PPCI sandbox. So, the linker should be called "ld" and not something else, and the uninitialized data segment - BSS. (But having a glossary may be helpful ;-) ).
Then all other niche target audiences mentioned above - embedded engineers with non-standard hardware, computer archeologists, etc. They should be covered by: a) being compatible with and interoperable with existing tools, and b) good documentation.
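To make the point about GCC-compatible command-line options a bit more concrete, here is a rough sketch of a Python front end that accepts a handful of the most common GCC-style flags (-c, -o, -I, -D, -O) using the standard argparse module. It is only an illustration of the idea: the function name parse_gcc_style_args and the attribute names are made up for this example, it covers a tiny subset of what GCC accepts, and it says nothing about how PPCI's own tools actually parse their arguments.

```python
import argparse


def parse_gcc_style_args(argv):
    """Parse a small, illustrative subset of GCC-style compiler options."""
    parser = argparse.ArgumentParser(prog="cc-sketch")
    # -c: compile only, do not link.
    parser.add_argument("-c", dest="compile_only", action="store_true")
    # -o FILE: output file name.
    parser.add_argument("-o", dest="output", default=None)
    # -IDIR / -I DIR: add an include directory (repeatable).
    parser.add_argument("-I", dest="include_dirs", action="append", default=[])
    # -DNAME / -DNAME=VALUE: define a preprocessor macro (repeatable).
    parser.add_argument("-D", dest="defines", action="append", default=[])
    # -O0, -O1, -O2, ...: optimization level, kept as a string.
    parser.add_argument("-O", dest="opt_level", default="0")
    # One or more source files.
    parser.add_argument("sources", nargs="+")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_gcc_style_args(
        ["-c", "-O2", "-Iinclude", "-DNDEBUG", "-o", "hello.o", "hello.c"]
    )
    print(args)
```

The benefit of mirroring the familiar spelling is that an existing Makefile or build script written for GCC would need fewer changes to try a different compiler, which is the kind of "positive first experience" argued for above.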
