sprhawk / python-on-a-chip

Automatically exported from code.google.com/p/python-on-a-chip
Other
0 stars 0 forks source link

Fix error in GC where object pointer is invalid #108

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Running the PIC code, either in simulation or on a physical chip, produces a 
trap (in simulation, due to an unimplemented RAM memory access; in hardware, 
this isn't cause, but it throws a misaligned address trap as a result).

Test t999 compiles the same code for the desktop, but runs without a problem. 
The code was modified to produce a minimal set for testing and all PIC-specific 
hardware references were eliminated to make it less platform-specific.

Using the simulator trace, the error occurs in heap.c:566. (I modified heap.c 
to split the pobj == NULL test from the second test -- C is allowed to compile 
these in any order, which makes the original code wrong to the best of my 
knowledge). However, the failure occurs on the second test, when  pobj is 
dereferenced via the OBJ_GET_GCVAL macro.

Further backtrace shows that this occurs from a call in heap.c:675, where the 
closure tuple is marked.

I'm not sure how to debug this, since I can't reproduce it on the desktop. I've 
started a branch to investigate a bit more.

Original issue reported on code.google.com by bjones460@gmail.com on 31 Jul 2010 at 4:36

GoogleCodeExporter commented 9 years ago
r532:
- Added modified files as described above.
- Bug now manifests itself on the desktop platform (cygwin, Windows Vista). Not 
sure what I did differently...

Original comment by bjones460@gmail.com on 31 Jul 2010 at 4:49

GoogleCodeExporter commented 9 years ago
Test t999 is not in the branch, so I can't recreate the defect.  Please add 
t999.c and t999.py.

However, I looked at func.c and found a place that could lead to an error (when 
a func's closure tuple is marked).  Attached is a patch for that situation.  
Please give it a try.  The problem is that the func object is created, but the 
closure field in the C struct is not set to C_NULL before the next heap alloc, 
so the closure field is undefined when a potential GC occurs.

Original comment by dwhall...@gmail.com on 31 Jul 2010 at 11:16

Attachments:

GoogleCodeExporter commented 9 years ago
r533:
- Added missing t999.c (thanks, Dean)

Original comment by bjones460@gmail.com on 31 Jul 2010 at 12:53

GoogleCodeExporter commented 9 years ago
Applied patch, but still get a seg fault on the desktop target.

Original comment by bjones460@gmail.com on 31 Jul 2010 at 1:03

GoogleCodeExporter commented 9 years ago
Problem: PmFunc_t fields were not being set to C_NULL when initialized.  A GC 
would occur and attempt to follow invalid pointers.

Fix: Set the C struct's fields to C_NULL immediately after the struct is 
alloc'd and before a GC occurs.

The following patch combines the patch above with a similar fix for 
src/vm/module.c
This patch should be applied at the root of the branch.

After applying this patch, t999.out ran indefinitely because its .py source was 
in a "while True" loop.

PS: I hope you don't mind that I changed the summary to better reflect the 
nature of the defect.

Original comment by dwhall...@gmail.com on 31 Jul 2010 at 7:16

Attachments:

GoogleCodeExporter commented 9 years ago
r535:
- Renamed tests to issue number
- Included patches from Dean,
- Modified heap.c to set the heap value to 0xAA in order to bring out bugs

Original comment by bjones460@gmail.com on 2 Aug 2010 at 5:32

GoogleCodeExporter commented 9 years ago
r536:
- Restored the earlier makefile which now makes the new test correctly

Original comment by bjones460@gmail.com on 2 Aug 2010 at 5:47

GoogleCodeExporter commented 9 years ago
r539:
- Modified heap.c per Dean:

1) please use the sli_memset() API that is part of p14p, only needs the pm.h 
header.  Remove the statement: '#include "string"' at the top of heap.c.  
(Aside: I made sli.c because I have known some MCU compilers to not have a 
"string.h" and other basic libs)

Original comment by bjones460@gmail.com on 2 Aug 2010 at 10:30

GoogleCodeExporter commented 9 years ago
r540:
Fixes per Dean:

1) Convert t108a.py to Unix EOLs.
2) Change t108b.py:16 to "def get(self,):"  (note the new comma after self)

I also noticed that t108a.py tries to "import t108a as pic"

All tests now pass.

Original comment by bjones460@gmail.com on 3 Aug 2010 at 3:46

GoogleCodeExporter commented 9 years ago
r541:
- Forgot to fix comma in def get(self,):

Original comment by bjones460@gmail.com on 3 Aug 2010 at 4:42

GoogleCodeExporter commented 9 years ago
r544:
- Merged in trunk. All tests still pass.

Original comment by bjones460@gmail.com on 3 Aug 2010 at 4:52

GoogleCodeExporter commented 9 years ago

Original comment by bjones460@gmail.com on 4 Aug 2010 at 2:43