nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

nextflow run hello: ERROR ~ a fault occurred in an unsafe memory access operation #4942

Open · fbnrst opened this issue 2 months ago

fbnrst commented 2 months ago

Bug report

Expected behavior and actual behavior

Expected behavior: nextflow run hello should work.

Actual behavior: About a week ago, I installed Nextflow using mamba (i.e. conda) on our cluster, and it worked just fine; I had also tested the hello world example. Now I wanted to start a new pipeline and got an error, and I get the same error when I try to run the hello world example, see below under Program output.

Steps to reproduce the problem

mamba create -n nextflow23.10 -c conda-forge -c bioconda -c defaults nextflow=23.10 openjdk=20 -y
mamba activate nextflow23.10
nextflow run hello

Reinstalling this way temporarily solved the problem, but then the error came back. It even worked once more after that, but the error returned again.

Program output

$ nextflow run hello
N E X T F L O W  ~  version 23.10.1
Launching `https://github.com/nextflow-io/hello` [gigantic_wozniak] DSL2 - revision: 7588c46ffe [master]
ERROR ~ a fault occurred in an unsafe memory access operation

 -- Check '.nextflow.log' file for details

Environment

Additional context

It is probably hard to tell what is going on with the limited information I can provide at the moment. If anyone has ideas about what I should try or which information I should provide, I would be very grateful.

nextflow.log

pditommaso commented 2 months ago

This is commonly related to a lack of temporary disk storage. There are at least a few similar issues: https://github.com/nextflow-io/nextflow/issues?q=is%3Aissue+unsafe+memory+is%3Aclosed
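
For what it's worth, since Nextflow runs on the JVM, the temp directory that matters is java.io.tmpdir. A quick way to check which directory that is and how much usable space the JVM sees there is a one-off check like this minimal sketch (the class name is just for illustration):

import java.io.File;

public class TmpSpaceCheck {
    public static void main(String[] args) {
        // Nextflow runs on the JVM, so the temp dir that matters is java.io.tmpdir
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        System.out.println("tmpdir: " + tmp);
        System.out.println("usable space (MB): " + tmp.getUsableSpace() / (1024 * 1024));
    }
}

If the reported space is low, pointing TMPDIR at a larger volume before launching Nextflow may be worth a try.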

fbnrst commented 2 months ago

I do not see an issue with disk storage; everywhere I look, there seems to be plenty of space. However, I did realise the following: it seems to work when I run the pipeline in my home directory, but if I run it from a directory on our Lustre filesystem, Nextflow seems to crash:

$ cd ~/temp/
$ nextflow run hello
N E X T F L O W  ~  version 23.10.1
Launching `https://github.com/nextflow-io/hello` [scruffy_chandrasekhar] DSL2 - revision: 7588c46ffe [master]
executor >  local (4)
[e0/3d25bb] process > sayHello (4) [100%] 4 of 4 ✔
Ciao world!

Bonjour world!

Hello world!

Hola world!
$ cd /path/on/lustre
$ nextflow run hello
N E X T F L O W  ~  version 23.10.1
Launching `https://github.com/nextflow-io/hello` [evil_baekeland] DSL2 - revision: 7588c46ffe [master]
ERROR ~ a fault occurred in an unsafe memory access operation

 -- Check '.nextflow.log' file for details

Not yet sure how this might help, but at least I can now reproduce it more systematically.
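
(Side note, in case it helps anyone reproducing this: a quick way to confirm which filesystem the JVM actually sees for the launch directory, e.g. in case of symlinks or automounts, is a check like this sketch; the class name is illustrative.)

import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Paths;

public class FsTypeCheck {
    public static void main(String[] args) throws Exception {
        // Report the filesystem type (e.g. ext4, nfs, lustre) of the launch directory
        FileStore store = Files.getFileStore(Paths.get("").toAbsolutePath());
        System.out.println("filesystem type: " + store.type());
    }
}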

fbnrst commented 2 months ago

Just wanted to add that running things in my home directory is not an option, because I do not have enough space there. I also talked to our admin, and he cannot see why disk space should be an issue on our Lustre filesystem. Any other ideas for how I can track down this issue? I am kind of stuck and at the moment cannot work with Nextflow at all.

bentsherman commented 2 months ago

The error happens specifically with the LevelDB cache:

java.lang.InternalError: a fault occurred in an unsafe memory access operation
    at java.base/jdk.internal.misc.Unsafe.copyMemory0(Native Method)
    at java.base/jdk.internal.misc.Unsafe.copyMemory(Unsafe.java:806)
    at java.base/jdk.internal.misc.ScopedMemoryAccess.copyMemoryInternal(ScopedMemoryAccess.java:147)
    at java.base/jdk.internal.misc.ScopedMemoryAccess.copyMemory(ScopedMemoryAccess.java:129)
    at java.base/java.nio.ByteBuffer.putArray(ByteBuffer.java:1333)
    at java.base/java.nio.ByteBuffer.put(ByteBuffer.java:1192)
    at org.iq80.leveldb.util.Slice.getBytes(Slice.java:246)
    at org.iq80.leveldb.impl.MMapLogWriter.writeChunk(MMapLogWriter.java:208)
    at org.iq80.leveldb.impl.MMapLogWriter.addRecord(MMapLogWriter.java:186)
    at org.iq80.leveldb.impl.VersionSet.writeSnapshot(VersionSet.java:329)
    at org.iq80.leveldb.impl.VersionSet.logAndApply(VersionSet.java:284)
    at org.iq80.leveldb.impl.DbImpl.<init>(DbImpl.java:223)
    at org.iq80.leveldb.impl.Iq80DBFactory.open(Iq80DBFactory.java:83)
    at nextflow.cache.DefaultCacheStore.openDb(DefaultCacheStore.groovy:78)
    at nextflow.cache.DefaultCacheStore.open(DefaultCacheStore.groovy:106)
    at nextflow.cache.DefaultCacheStore.open(DefaultCacheStore.groovy)
    at nextflow.cache.CacheDB.open(CacheDB.groovy:59)
    at nextflow.Session.init(Session.groovy:420)
    at nextflow.script.ScriptRunner.execute(ScriptRunner.groovy:128)
    at nextflow.cli.CmdRun.run(CmdRun.groovy:372)
    at nextflow.cli.Launcher.run(Launcher.groovy:500)
    at nextflow.cli.Launcher.main(Launcher.groovy:672)

I guess LevelDB is memory-mapping the DB file; maybe that operation is not supported by Lustre, or by your particular implementation/configuration of Lustre. I would ask your sysadmin about this.
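
To isolate this from Nextflow, you could try a minimal sketch that mimics what MMapLogWriter does, i.e. map a file and write through the mapping. The file path below is a placeholder for a location on your Lustre mount, and the class name is just for illustration:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

public class MmapTest {
    public static void main(String[] args) throws Exception {
        // Point this at a file on the Lustre mount, e.g. /path/on/lustre/mmap-test.bin
        String path = args.length > 0 ? args[0] : "mmap-test.bin";
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw");
             FileChannel channel = raf.getChannel()) {
            // Map a 4 KB region read-write and write through the mapping,
            // which is essentially what LevelDB's MMapLogWriter does
            MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            buf.put("hello lustre".getBytes(StandardCharsets.UTF_8));
            buf.force(); // flush the mapped pages back to the file
            System.out.println("mmap write OK: " + path);
        }
    }
}

If this throws the same InternalError on the Lustre path but succeeds in your home directory, that would point at the filesystem's mmap support rather than at Nextflow itself.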