nanoporetech / scrappie

Scrappie is a technology demonstrator for the Oxford Nanopore Research Algorithms group
Mozilla Public License 2.0

Processing of a whole folder #18

Closed: phpeters closed this issue 6 years ago

phpeters commented 6 years ago

Hi,

Scrappie works fine for me when I run it on a single fast5 file, both on one from the program's reads folder and on a real data file. But when I run it on a list of files or on a folder of fast5 files, it either produces a stack of error messages (example A) or processes just one file (example B). So I tried looping over a list of the files, but then I sometimes get a sequence and sometimes a different error message (example C).

I'm a bit puzzled, since it works on a single file. Any idea why I can't process more than one?

Thanks a lot and best regards! Philipp

example A

scrappie events folder/to/fast5Files/
HDF5-DIAG: Error detected in HDF5 (1.8.5-patch1) HDF5-DIAG: Error detected in HDF5 (1.8.5-patch1) HDF5-DIAG: Error detected in HDF5 (1.8.5-patch1) thread 0: thread 0 #000: ../../src/H5F.c line 1509 in H5Fopen(): unable to open file thread 0HDF5-DIAG: Error detected in HDF5 (1.8.5-patch1) HDF5-DIAG: Error detected in HDF5 (1.8.5-patch1) thread 0: : major: File accessability minor: Unable to open file

example B

scrappie events file1.fast5 file2.fast5
HDF5-DIAG: Error detected in HDF5 (1.8.5-patch1) thread 0:
  #000: ../../src/H5F.c line 1509 in H5Fopen(): unable to open file
    major: File accessability
    minor: Unable to open file
  #001: ../../src/H5F.c line 1175 in H5F_open(): unable to retrieve VFL class
    major: File accessability
    minor: Can't get value
  #002: ../../src/H5I.c line 1592 in H5I_inc_ref(): invalid type number
    major: Invalid arguments to routine
    minor: Out of range
scrappie: Failed to open
readFromFile2 AAATATATA....

example C

while read line; do scrappie events $line; done < listOfFiles
scrappie: Failed to create dataset for event table /software/scrappie/src/fast5_interface.c:201.

tmassingham-ont commented 6 years ago

Hello. A lot of problems like these seem to come down to the HDF5 library: many of the versions that are distributed do not support access from multiple threads. Do you still see problems when running Scrappie with GNU parallel, as in the example in the documentation?
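A minimal sketch of that approach, assuming GNU parallel is installed: every input file gets its own single-threaded scrappie process, so no single process touches HDF5 from more than one thread. The helper name basecall_folder and the SCRAPPIE and NCPU variables are hypothetical; "--threads 1" is the flag used later in this thread.

```shell
# Hypothetical helper: basecall every .fast5 file under a folder with GNU parallel,
# running one single-threaded scrappie process per file.
# Assumptions: SCRAPPIE names the scrappie binary (default "scrappie"),
# NCPU is the number of concurrent jobs (default: the output of nproc).
basecall_folder() {
  local reads_dir="$1"
  local scrappie="${SCRAPPIE:-scrappie}"
  local ncpu="${NCPU:-$(nproc)}"

  # find -print0 together with parallel -0 keeps filenames containing spaces intact;
  # parallel replaces {} with one file path per job.
  find "$reads_dir" -name '*.fast5' -print0 |
    parallel -0 -P "$ncpu" "$scrappie" events --threads 1 {}
}
```

For example, NCPU=8 basecall_folder path/to/reads > basecalls.txt. GNU parallel groups each job's output by default, so the results for different files are not interleaved.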

phpeters commented 6 years ago

Great! The parallel single-thread mode works perfectly! Thanks a lot! Philipp

phpeters commented 6 years ago

Hi Tim, one more question: is it still possible to use the "--dump" flag in parallel mode? When I tried, I got a mixture of the error messages above. Thanks and best regards! Philipp

tmassingham-ont commented 6 years ago

Yes, after a little fiddling. Since your HDF5 library doesn't support multithreaded writing, you need to make sure that each parallel job writes to a separate file. I think the following achieves this:

find path/to/reads -name \*.fast5 | parallel -P ${NCPU} build/scrappie events --threads 1 --dump out{%}.hdf5 {}

There will be one error message per parallel job slot, when scrappie first tries to open that slot's dump file and fails because it doesn't exist yet. This message is harmless.
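The same command wrapped as a hypothetical dump_folder helper, as a sketch assuming GNU parallel. In GNU parallel, {%} expands to the job slot number (1 up to the -P value), and a slot runs only one job at a time, so each outN.hdf5 file has a single writer at any moment. The SCRAPPIE and NCPU defaults are assumptions added here, mainly so the command does not break when NCPU is unset.

```shell
# Hypothetical helper: basecall a folder with GNU parallel, giving every job slot its own dump file.
# {%} is GNU parallel's job-slot replacement string, so with -P 4 the dump files are out1.hdf5 .. out4.hdf5.
# Assumptions: SCRAPPIE defaults to build/scrappie (as in the command above), NCPU defaults to nproc.
dump_folder() {
  local reads_dir="$1"
  local scrappie="${SCRAPPIE:-build/scrappie}"
  local ncpu="${NCPU:-$(nproc)}"

  # Quoting 'out{%}.hdf5' is defensive: {%} is expanded by parallel, not by the shell.
  find "$reads_dir" -name '*.fast5' -print0 |
    parallel -0 -P "$ncpu" "$scrappie" events --threads 1 --dump 'out{%}.hdf5' {}
}
```

As with the original command, expect one harmless error message per job slot the first time its dump file is created.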