mmtk / mmtk-core

Memory Management ToolKit
https://www.mmtk.io
Other
375 stars 69 forks source link

SharedQueue correctness bugs #40

Closed qinsoon closed 3 years ago

qinsoon commented 4 years ago

In GitLab by @caizixian on Jul 5, 2019, 19:10

Reported by @wenyuzhao

A queue should be initialized as locally done https://gitlab.anu.edu.au/mmtk/mmtk/blob/9224c23f73b4713fd972395d7a7d0e1f4f3a1987/src/util/queue/shared_queue.rs#L66

Should continue the outer loop https://gitlab.anu.edu.au/mmtk/mmtk/blob/9224c23f73b4713fd972395d7a7d0e1f4f3a1987/src/util/queue/shared_queue.rs#L47

qinsoon commented 4 years ago

In GitLab by @caizixian on Jul 5, 2019, 19:11

changed title from SharedQueue to SharedQueue{+ correctness bugs+}

qinsoon commented 4 years ago

In GitLab by @qinsoon on Feb 13, 2020, 18:50

mentioned in merge request !30

qinsoon commented 4 years ago

In GitLab by @qinsoon on Feb 13, 2020, 19:00

https://gitlab.anu.edu.au/mmtk/mmtk/merge_requests/30 Should have fixed the continue issue.

However, I think it is correct not correct that a queue is initialized as 'locally done' (starved). Assume the case where we have two local queues, when the first queue runs out of its local buffer ('locally done'), it should spin and wait as the other local queue may push more work. If the other queue is initialized as 'locally done', the first queue will just return None immediately rather than spinning and waiting.

I would suggest that a queue is initialized as 'locally not done', and each queue should call spin() and should not quit until they get None on the spin() call.

qinsoon commented 4 years ago

In GitLab by @caizixian on Feb 13, 2020, 19:03

However, I think it is correct that a queue is initialized as 'locally done' (starved)

You mean "it is not correct"?

qinsoon commented 4 years ago

In GitLab by @caizixian on Feb 13, 2020, 19:05

@wenyuzhao do you remember why you think that a queue should be initialized as locally done? Can you give an example

qinsoon commented 4 years ago

In GitLab by @qinsoon on Feb 13, 2020, 19:06

Yeah. Fixed.

qinsoon commented 4 years ago

In GitLab by @caizixian on Feb 13, 2020, 19:07

@qinsoon I think the example above that you gave makes sense

qinsoon commented 4 years ago

In GitLab by @caizixian on Feb 13, 2020, 19:08

However, I can't remember why Wenyu thought it's more appropriate to do it otherwise.

In any case, we should put your example into the code as comment, so that we can remind ourselves all possible cases, and why we do things in certain ways

qinsoon commented 4 years ago

In GitLab by @qinsoon on Feb 13, 2020, 19:15

I think the test spin_return() covered the case in https://gitlab.anu.edu.au/mmtk/mmtk/merge_requests/30 (which needs your review). If there is any missing test for cases you can think up, please let me know.

qinsoon commented 4 years ago

In GitLab by @caizixian on Feb 13, 2020, 19:21

Thanks. I will need to think through this carefully, as concurrent data structure is tricky

qinsoon commented 4 years ago

In GitLab by @wenyuzhao on Feb 13, 2020, 19:50

For the locally done initialization:

There can be a case that some local queue are producer-only (e.g. the mutator local remset buffer) and some local queue are consumers (e.g. collector local queues that consumes the remset). In this case consumer-local-queue may keeps waiting for producer queue, as the producer queue is initialized as locally not done and never dequeue/spin to mark itself as locally done.

However, this is just a hypothesis. When I was building the generational-copy GC that uses such remsets, I never encountered these problems.

qinsoon commented 4 years ago

In GitLab by @caizixian on Feb 13, 2020, 19:57

Ha interesting. Well, the flush mechanism can potentially be used to set the locally done state

qinsoon commented 4 years ago

In GitLab by @qinsoon on Feb 13, 2020, 20:05

It seems we just need a function on SharedQueue so the local queue can tell the shared queue that it is done without attempting to pull any new data.

qinsoon commented 4 years ago

In GitLab by @wenyuzhao on Feb 13, 2020, 20:11

For this case maybe we only needs to mark the local queue as done when they flush data to shared queue. I am not sure if adding functions like mark_as_done is a good choice... Users can easily use this function incorrectly.

qinsoon commented 4 years ago

In GitLab by @caizixian on Feb 13, 2020, 20:14

But when new data is generated by a producer, it needs to change the state again.

qinsoon commented 4 years ago

In GitLab by @qinsoon on Feb 13, 2020, 20:16

Does local queue get dropped when the work is done? Otherwise, we should have a way to reset queues' states for the next GC cycle.

qinsoon commented 4 years ago

In GitLab by @wenyuzhao on Feb 13, 2020, 20:17

They are not dropped. They are fields in the collector/mutator sturcts. So they are kind of global variables that never drop...

qinsoon commented 4 years ago

In GitLab by @qinsoon on Feb 13, 2020, 20:22

But we are not resetting its states before each GC (which means next time when we use the queue, all local queues are considered as 'locally done' already). That does not sound correct.

qinsoon commented 4 years ago

https://github.com/mmtk/mmtk-core/pull/3 revealed this issue. We should fix this: https://github.com/mmtk/mmtk-core/blob/521b9e40e47c0570e3d11ed0f6ed522b7ebd7cc0/src/plan/trace.rs#L22