pawn-lang / compiler

Pawn compiler for SA-MP with bug fixes and new features - runs on Windows, Linux, macOS

Structures #382

Open Sasino97 opened 6 years ago

Sasino97 commented 6 years ago

Issue description:

My suggestion is to implement compiler support for structs. I am aware that any new code using structs would not compile with the default PAWN compiler included in the SA-MP Server package, but I believe it is worth it.

Suggested PAWN syntax:

I suggest making its declaration syntax similar to the PAWN enum syntax:

struct User
{
  Id,
  Name[24], // comma
  Score,
  Float:Health
}

This would be equivalent to this C code:

typedef struct sUser
{
  int Id;
  char Name[24];
  int Score;
  float Health;
} User;

As for variable declaration, PAWN does not actually have types, but it does have tags, so this could fit well with the existing tag syntax:

// example 1
new User: user;

// example 2
new User: users[1000];

// example 3
DeleteUser(User: usr) { ... }

A viable syntax for accessing a struct's fields could be the C-like dot syntax:

// example 1
GetPlayerName(playerid, user.Name, sizeof(user.Name));
user.Score = GetPlayerScore(playerid);
GetPlayerHealth(playerid, user.Health);

// example 2
users[0].Health = 15.0;
new id = users[0].Id;

// example 3
DeleteUser(User: usr)
{
  usr.Id = 0;
  format(usr.Name, sizeof(usr.Name), "");
}

Finally, a very useful feature would be an array-initializer-like syntax for quicker struct initialization:

new User: user = { 0, "SaSiNO97", 420, 97.0 };

It could also work well with nested array initialization:

struct Example
{
  A,
  B[3],
  C[4][4]
}

new Example: e =
{
  15,
  { 1, 2, 3 },
  {
    { 15, 3, 24, 27 },
    { 11, 69, 74, 86 },
    { 31, 45, 8, 21 },
    { 76, 53, 94, 35 }
  }
};

Advantages

The usual PAWN way of storing structured data is a 2D array indexed by an enum:

enum User
{
  Id,
  Name[24], // comma
  Score,
  Float:Health
}
new Users[1000][User];

This way:

The struct instead:

new Example: e = { 10 };

printf("%d", Variable); // 15
printf("%d", e.Variable); // 10


- It could be made to support 2-D arrays
Southclaws commented 6 years ago

Similar to #234, though this one is more about the data-structure side of things.

I like the use of tags to indicate type; however, I feel this could get confusing. Differentiating between whether User: is just a tag on a cell or whether the cell is actually a multi-cell data structure (or a pointer?) is problematic. If you're going to introduce new syntax, I believe it's worth also considering alternatives to overloading the semantics of tags. I also think there could be useful applications of tagging instances of structs, which would not be possible if the tag syntax were reused.

One aspect I think has a lot of merit in this suggestion is the proper support for arrays within structures. Enumerations that contain array-like syntax have confused people for years, me included!

Y-Less commented 6 years ago

Some thoughts:

1) Enums can already be used as tags, not just as structures, which is why their symbols are global. Reusing them here would also be an extremely confusing overloading of syntax.

2) The declaration syntax hides the fact that you are declaring more than one cell, which raises questions about the semantics of passing them as parameters.

3) The new access syntax only saves a single character: x.y instead of x[y]. Is completely destroying backwards compatibility and introducing a whole new coding style worth it for one keypress?

4) The main advantage being advocated seems to be working 2D arrays. Why can't you just fix them in enums?

5) Should they even be fixed? High-dimensional arrays are almost always a code smell, indicating that someone hasn't thought enough about their algorithm or memory layout. People seem to love arrays in enums in arrays, and I've no doubt this would be used for arrays in arrays in structs in arrays. I've also no doubt that whatever that code is, it would be more efficient with a different algorithm or more split-up data, i.e. multiple variables. People in SA-MP especially seem to have an obsession with monolithic variables and a belief that using two is worse; the opposite is generally true, and this will only encourage that practice.

6) Pawn 4 has a similar syntax for const; why branch away entirely instead of using that?

thecodeah commented 6 years ago

I've thought about this before. I don't think it's really necessary, as you can replicate this with arrays and enums.

I don't think it's a bad idea though. It could make stuff less confusing and more straight-forward.

AGraber commented 6 years ago

Methodmap syntax from SourcePawn 1.7:

https://wiki.alliedmods.net/SourcePawn_Transitional_Syntax#Methodmaps

Maybe we can take some inspiration from this?

Y-Less commented 6 years ago

That's more about calling functions than about data storage, which is closer to what the issue @Southclaws linked to is about than to this one.

Sasino97 commented 6 years ago

Nice to hear your thoughts. I forgot to explain my idea about what a variable of a struct type actually means. Basically, I think it should behave like a pointer, not like many cells packed into one cell, as that would be confusing. I think the best way to keep Pawn scripters from messing with memory is to give the variable an integer value.

// Allocates the struct info on the heap
struct Vehicle
{
    Id,
    Name[32]
}

// Allocates sizeof(Vehicle) cells on the heap, stores the pointer to this instance in a vector and assigns its index to the Pawn cell
new Vehicle: veh;

// The value of the veh variable is actually an index into that table/vector, whose entry is the pointer to the memory allocated for this struct instance
printf("%d", veh); // 0
new Vehicle: veh2;
printf("%d", veh2); // 1

// But when the dot operator is used, the compiler automatically knows that the programmer wants the value pointed to by the pointer variable
printf("%d - %s", veh.Id, veh.Name); // 400 - Landstalker 

About the declaration syntax, I agree that it might be difficult to implement without breaking something in its existing usage. An alternative could allow something like this:

struct Color
{
  R, 
  G, 
  B, 
  A
} 
new Color* color;

About the overall usefulness, I think this is much more than just replacing [ ] with a dot; it is progress in the life of Pawn. Microsoft created C#, then version after version it introduced new features to the language to give programmers more power, reduce boilerplate, and make code more straightforward. This particular feature would simplify things for scripters, and enums would be used for their original purpose.

Y-Less commented 6 years ago

Pawn resets the heap after every call to amx_Exec, i.e. every public function. I've tried very, very hard to find a way around this, even resorting to exploiting VM bugs present only in the exact version used in SA:MP (and thus not even possible in newer versions), but it didn't work. Basically, heap allocations are ephemeral.

Y-Less commented 6 years ago

Check y_malloc; that's where I was playing with this. If you find a way to overcome this issue, PLEASE do tell me. Even compiler support isn't really going to add anything, since I was already using assembly. The only other solution I could think of was augmenting every single public function call with code to save and restore the heap pointer if it wasn't in the right place. I didn't, because that is silly overhead for something that might not even be used (determining whether it is required is, however, one place where compiler support would be useful).

But what you are suggesting essentially amounts to manual memory management, which is the one thing Pawn has never had, as it complicates code massively.

Y-Less commented 6 years ago

I'm also wondering how your indexing system would even work, assuming the heap allocation/free issues could be solved. If there are two structs of different sizes, and 10 of each are allocated in random order, do the indexes continue across both? Does each type get its own indexing? How are those indexes mapped to real memory? Why not just use real memory locations in the first place (as with references and arrays)?

Sasino97 commented 6 years ago

My knowledge of how the AMX abstract machine is implemented is extremely limited, so I didn't know about this issue with heap memory. Some years ago I tried to implement a Pawn wrapper for some WinAPI functions (https://forum.sa-mp.com/showthread.php?t=286543), mainly for fun because I was still young and learning, but I never got into understanding the abstract machine's source code.

As for the indexing system, I think giving each struct type its own indexing is better than exposing the actual pointer, because the average Pawn scripter usually knows little about memory management, and it is more intuitive to treat the elements as if they were in an array.

I am thinking about a solution for allocation, but at the moment I still don't have one.

Y-Less commented 6 years ago

My point with the indexes was that even if they are addresses, that is just an implementation detail: it doesn't matter to end users what the value is. Reference parameters (F(&a)) are already memory addresses, and that has never affected anyone, since coders can't actually view the value directly. The compiler knows that this is a reference parameter and always dereferences it (without #emit, but that's irrelevant), just as it would always know that a struct variable was a reference and dereference it, thus always hiding the value from the user.

Sasino97 commented 6 years ago

Yes, that is true; I hadn't thought it through. I think a solution (maybe the only one) for the memory issue would be to cooperate with Kalcor to also update the AMX code used by the SA-MP Server in its next major release (and to include the new compiler as well); this would also allow the compiler team to implement many new features.

AGraber commented 6 years ago

If enum arrays can already store a kind of emulated structure, then I don't think struct memory allocation would be that different. I think the problem arose from your table/vector-and-index proposal, which doesn't seem very convenient, since real addresses and existing memory allocation methods could be used instead.