Closed pdimens closed 2 weeks ago
@pdimens I'd have to think about the chances of confusing some users or masking problems in other parts of people's workflows. I personally like the binaries I use in my workflows to be conservative, and clean things up myself, rather than risk hiding other issues in other parts of the workflow. I'm open to making changes, but I'd need more details. Is it the exit code of non-zero that causes a failure of a workflow step for you, or is it the absence of an output file?
It's technically both that are creating pain points. My current solution is to mimic a try/catch block in the shell:
( falco --quiet --threads 1 -skip-report -skip-summary -data-filename OUTFILE INFILE ) ||
cat <<EOF > {output}
##Falco 1.2.4
>>Basic Statistics fail
#Measure Value
Filename $FILENAME.fq.gz
File type Conventional base calls
Encoding Sanger / Illumina 1.9
Total Sequences 0
Sequences flagged as poor quality 0
Sequence length 0
%GC 0
>>END_MODULE
EOF
I understand what you mean about conservative behavior, which would be maintained with the addition of some kind of --allow-empty
flag that would both exit gracefully and create a basic output file like the one above (that would be properly formatted for MultiQC).
I think that should be fine.
@pdimens I made a branch with this feature. Most of the work was related to fixing potential divide-by-zero when generating summary files if the inputs were empty. The current behavior wasn't intended, as it would just crash. In the new branch add-option-for-empty-input the default behavior checks all inputs to ensure they are non-empty, and if the -allow-empty-input
is specified, it will run anyway, and generate outputs that have counts of 0 inside them. The output files themselves are not empty, but the information in them is. Let me know what you think.
Thanks for looking into this. I cloned the repo but the branch doesn't have the configure
file like the README specifies for manual compilation. It has a configre.ac
file though, however I am unfamiliar with how to use it.
If there's a ./autoconf.sh it will generate the configure. Else I'll add one.
Whether using the new flag or not, the program finishes successfully, which I expect is unintended behavior.
@pdimens I'll just paste my quick test which might be naive. Can you check that you get the same thing?
(base) [17:36][junior3 smithlabcode]$ git clone --recursive https://github.com/smithlabcode/falco.git
Cloning into 'falco'...
remote: Enumerating objects: 1351, done.
remote: Counting objects: 100% (310/310), done.
remote: Compressing objects: 100% (109/109), done.
remote: Total 1351 (delta 217), reused 248 (delta 192), pack-reused 1041 (from 1)
Receiving objects: 100% (1351/1351), 2.36 MiB | 1.50 MiB/s, done.
Resolving deltas: 100% (925/925), done.
(base) [17:36][junior3 smithlabcode]$ cd falco
(base) [17:36][junior3 falco (master)]$ git checkout add-option-for-empty-input
Branch 'add-option-for-empty-input' set up to track remote branch 'add-option-for-empty-input' from 'origin'.
Switched to a new branch 'add-option-for-empty-input'
(base) [17:36][junior3 falco (add-option-for..)]$ ./autogen.sh
configure.ac:22: installing './install-sh'
configure.ac:22: installing './missing'
Makefile.am: installing './depcomp'
parallel-tests: installing './test-driver'
(base) [17:36][junior3 falco (add-option-for..)]$ mkdir build && cd build
(base) [17:36][junior3 build (add-option-for..)]$ ../configure --enable-hts --enable-silent-rules
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a race-free mkdir -p... /usr/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking for g++... g++
checking whether the C++ compiler works... yes
checking for C++ compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether the compiler supports GNU C++... yes
checking whether g++ accepts -g... yes
checking for g++ option to enable C++11 features... none needed
checking whether make supports the include directive... yes (GNU style)
checking dependency style of g++... gcc3
checking whether g++ supports C++17 features with -std=c++17... yes
checking for g++ -std=c++17 option to support OpenMP... -fopenmp
checking for zlibVersion in -lz... yes
checking for hts_version in -lhts... yes
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating Makefile
config.status: creating config.h
config.status: linking ../test/md5sum.txt to test_build/md5sum.txt
config.status: linking ../test/test_data.tgz to test_build/test_data.tgz
config.status: executing depfiles commands
(base) [17:36][junior3 build (add-option-for..)]$ make -j8
make all-am
make[1]: Entering directory '/home/andrewds/code/smithlabcode/falco/build'
CXX src/falco-falco.o
CXX src/falco-FastqStats.o
CXX src/falco-HtmlMaker.o
CXX src/falco-Module.o
CXX src/falco-StreamReader.o
CXX src/falco-FalcoConfig.o
CXX src/falco-OptionParser.o
CXX src/falco-smithlab_utils.o
CXXLD falco
make[1]: Leaving directory '/home/andrewds/code/smithlabcode/falco/build'
(base) [17:36][junior3 build (add-option-for..)]$ make check
make check-TESTS
make[1]: Entering directory '/home/andrewds/code/smithlabcode/falco/build'
make[2]: Entering directory '/home/andrewds/code/smithlabcode/falco/build'
PASS: test/falco.test
============================================================================
Testsuite summary for falco 1.2.4
============================================================================
# TOTAL: 1
# PASS: 1
# SKIP: 0
# XFAIL: 0
# FAIL: 0
# XPASS: 0
# ERROR: 0
============================================================================
make[2]: Leaving directory '/home/andrewds/code/smithlabcode/falco/build'
make[1]: Leaving directory '/home/andrewds/code/smithlabcode/falco/build'
(base) [17:36][junior3 build (add-option-for..)]$ touch empty_file.fq
(base) [17:36][junior3 build (add-option-for..)]$ ./falco empty_file.fq
Input file is empty: empty_file.fq
(base) [17:36][junior3 build (add-option-for..)]$ ./falco -allow-empty-input empty_file.fq
[limits] using file /home/andrewds/code/smithlabcode/falco/Configuration/limits.txt
[adapters] using file /home/andrewds/code/smithlabcode/falco/Configuration/adapter_list.txt
[contaminants] using file /home/andrewds/code/smithlabcode/falco/Configuration/contaminant_list.txt
[Thu Oct 24 17:37:01 2024] Started reading file empty_file.fq
[Thu Oct 24 17:37:01 2024] reading file as uncompressed FASTQ format
[running falco|===================================================|9223372036854775808%]
[Thu Oct 24 17:37:01 2024] Finished reading file
[Thu Oct 24 17:37:01 2024] Writing summary to ./summary.txt
[Thu Oct 24 17:37:01 2024] Writing text report to ./fastqc_data.txt
[Thu Oct 24 17:37:01 2024] Writing HTML report to ./fastqc_report.html
Elapsed time for file empty_file.fq: 0s
(base) [17:37][junior3 build (add-option-for..)]$
Oh how strange, I ran it again and it worked as described.
@andrewdavidsmith would it be possible to push this to a new release anytime soon?
Sure. I can probably do it in the next couple days.
Wonderful, thank you!
As part of a larger workflow, if
falco
is run on an empty fastq(.gz) file, it fails with an error code. Is it possible, either via a flag or by default behavior, to instead have falco output a report file that is empty? Example: