nisargshah95 closed this issue 4 years ago.
Are you sure that /mnt/ext4-pmem1 is mounted with DAX? If it's not, pmemobj will use msync calls to persist the data, and that takes a long time. You can set the PMEM_IS_PMEM_FORCE=1 environment variable to skip the msync calls, like this:
PMEM_IS_PMEM_FORCE=1 ./examples/example-concurrent_hash_map /mnt/ext4-pmem1/myfile
Yes, it's mounted with DAX. I mount it as:
mount -o dax /dev/pmem1 /mnt/ext4-pmem1
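To double-check that the DAX option actually took effect (rather than relying on the mount command line), one can look at the options the kernel reports for the mount point. Below is a minimal sketch, assuming the /proc/mounts format "device mountpoint fstype options dump pass" and that DAX shows up as `dax` or `dax=always` in the options field. The function name `mounted_with_dax` is hypothetical, not part of PMDK.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: returns true if the entry for `mountpoint` in a
// /proc/mounts-style stream lists a DAX option ("dax" or "dax=always").
// Assumes whitespace-separated fields and a mountpoint without escaped
// characters such as "\040" for a space.
bool mounted_with_dax(std::istream &mounts, const std::string &mountpoint)
{
    bool dax = false;
    std::string line;
    while (std::getline(mounts, line)) {
        std::istringstream fields(line);
        std::string device, mnt, fstype, options;
        if (!(fields >> device >> mnt >> fstype >> options))
            continue; // skip malformed lines
        if (mnt != mountpoint)
            continue;
        // A later mount on the same path hides earlier ones,
        // so the last matching entry decides the result.
        dax = false;
        std::istringstream opts(options);
        std::string opt;
        while (std::getline(opts, opt, ','))
            if (opt == "dax" || opt == "dax=always")
                dax = true;
    }
    return dax;
}

// Convenience overload that reads the live mount table (Linux only).
bool mounted_with_dax(const std::string &mountpoint)
{
    std::ifstream mounts("/proc/mounts");
    return mounted_with_dax(mounts, mountpoint);
}
```

For example, mounted_with_dax("/mnt/ext4-pmem1") should return true for the mount above if the kernel accepted the dax option.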
I am using hashmap_tx from an older version of PMDK (1.4 or 1.5, I think), and it has worked so far without any issues. I will check whether any other example code also shows this behavior with PMDK 1.9.
There was a bug a few kernel versions back that could cause a deadlock in the page fault handler logic. Try upgrading your OS to see if that helps. Typically, user-space code such as libpmemobj-cpp should not be able to hang a system.
It doesn't hang the whole system, just the pmem partition. Any operation on it (ls, etc.) stops working until I reboot the system. I'll try your advice and see if it helps.
Still, that means the file system/kernel is in a soft lockup, which indicates either a kernel problem or, in rare cases, a hardware problem.
Can you kill the process using kill command? E.g.
kill `pidof ./examples/example-concurrent_hash_map`
If so, you could try running the example under perf, then kill the process and see if there are any anomalies.
I tried killing it, but it doesn't help. I once tried to run it under GDB but couldn't get the stack trace, because the program just hangs and killing it didn't do anything. I currently don't have access to the Optane machine, but I can try running it under perf when I get access again.
So I tried running the example with perf but did not see anything different. I waited for about 5 minutes and tried to kill the program. I think the original process was killed, but I could still see a process named "[concurrent_hash_map]" in the ps output. I couldn't kill it with the kill command.
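One common reason a process ignores kill is that it is blocked inside the kernel in uninterruptible sleep (state D in ps). Unless it is in a killable wait, such a task won't handle signals, SIGKILL included, until the kernel call returns, which fits a deadlock in the page fault path. The sketch below reads the state letter from /proc/<pid>/stat; the helper names are hypothetical and the PID is assumed to come from something like pidof.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: returns the raw contents of /proc/<pid>/stat,
// or an empty string if the file cannot be opened (Linux only).
std::string read_stat(int pid)
{
    std::ifstream f("/proc/" + std::to_string(pid) + "/stat");
    if (!f)
        return {};
    std::ostringstream ss;
    ss << f.rdbuf();
    return ss.str();
}

// Extracts the one-letter state from a stat line "pid (comm) S ...".
// The comm field may itself contain spaces or ')', so locate the
// *last* ')' and read the character two positions after it.
// Returns '?' if the text does not look like a stat line.
char proc_state(const std::string &stat)
{
    const auto close = stat.rfind(')');
    if (close == std::string::npos || close + 2 >= stat.size())
        return '?';
    return stat[close + 2];
}

// 'D' means the task is sleeping in the kernel and, outside of killable
// waits, will not act on signals until that sleep ends.
bool in_uninterruptible_sleep(const std::string &stat)
{
    return proc_state(stat) == 'D';
}
```

For example, in_uninterruptible_sleep(read_stat(pid)) returning true for the stuck process would point at the kernel rather than at the example itself.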
Even with the program hung, ls /mnt/ext4-pmem1/myfile was working. As soon as I tried removing the file using rm -rf /mnt/ext4-pmem1/myfile, the rm command also hung, and now even the ls command started hanging.
EDIT: I think it works now after upgrading the kernel from 5.0.9 to 5.6.13!
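If you want a setup script or test to warn about an older kernel, a sketch like the one below compares the running release against a minimum version. Note that 5.6.13 is only the version that worked for the reporter; the thread does not identify which kernel release fixed the deadlock, so the threshold here is illustrative. The function names are hypothetical; uname(2) is the standard POSIX call behind `uname -r`.

```cpp
#include <array>
#include <cstdio>
#include <string>
#include <sys/utsname.h>

// Hypothetical helper: parses the leading "major.minor.patch" of a release
// string such as "5.0.9-301.fc30.x86_64". Missing components stay 0.
std::array<int, 3> kernel_version(const std::string &release)
{
    std::array<int, 3> v{0, 0, 0};
    std::sscanf(release.c_str(), "%d.%d.%d", &v[0], &v[1], &v[2]);
    return v;
}

// std::array comparison is lexicographic, i.e. major, then minor, then patch.
bool kernel_at_least(const std::string &release, const std::array<int, 3> &min)
{
    return kernel_version(release) >= min;
}

// Release string of the running kernel, or empty on failure.
std::string running_kernel_release()
{
    struct utsname u;
    if (uname(&u) != 0)
        return {};
    return u.release;
}
```

For example, kernel_at_least(running_kernel_release(), {5, 6, 13}) is false on the original 5.0.9 Fedora 30 kernel and true after the upgrade.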
Thanks for the update and glad it finally worked out.
QUESTION: concurrent_hash_map example hangs
Details
When I try to run the concurrent_hash_map example at https://github.com/pmem/libpmemobj-cpp/blob/master/examples/concurrent_hash_map/concurrent_hash_map.cpp after creating a pmem pool using
pmempool create obj --layout="concurrent_hash_map" --size 1G --mode 0666 /mnt/ext4-pmem1/myfile
where pmem is mounted on /mnt/ext4-pmem1 in 100% app-direct mode, it simply hangs and I have to reboot the system to get it working again. Even when I run it under gdb, I cannot see the line where it hangs. Any ideas what could cause this?
I'm using PMDK 1.9 with latest master of libpmemobj-cpp on Fedora 30.