AroneyS closed this issue 1 year ago.
Were the CheckM results empty?
No, they were as expected.
Oh, this looks like it is complaining about the `coverm_abundances.tsv` file. Was that empty?

Yes, `coverm_abundances.tsv` is indeed empty.
Also, `coverm.cov`, `coverm.filt.cov`, `long_abundances.tsv`, `long_cov.tsv` and `short_cov.tsv` are not empty, but `short_abundances.tsv` is empty.
Does `coverm.cov` have the short-read information? And can you find any error information for the `get_abundances` rule in the snakemake log?
Yes, `coverm.cov` does have short-read information. I can't see any error information for `get_abundances`:
```
[Fri Oct 14 06:31:13 2022]
rule get_abundances:
    input: bins/checkm.out
    output: data/coverm_abundances.tsv
    jobid: 25
    reason: Missing output files: data/coverm_abundances.tsv; Input files updated by another job: bins/checkm.out
    threads: 8
    resources: mem_mb=512000, disk_mb=1000, tmpdir=/data1/tmp

Activating conda environment: ../../../../../../../../../mnt/hpccs01/work/microbiome/conda/66a8b59755f121e40e3a82a9714b3ad5
[Fri Oct 14 06:50:20 2022]
Finished job 25.
25 of 29 steps (86%) done
Select jobs to execute...
```
Has it happened with any other samples? Nothing is jumping out at me that would cause it to fail here.
I've done 18 assemblies (6 each of long-only, long+short, short-only). All 10 that have finished recovery so far have this error.
Okay, this isn't reproducible with the test data that Ben generated. Is this only occurring when you have both long and short reads?
Could you also provide the complete list of rules that aviary is attempting to complete?
I haven't tried with only long or only short yet but I can give that a go.
```
job                      count    min threads    max threads
---------------------  -------  -------------  -------------
checkm2                      1              8              8
checkm_das_tool              1              8              8
checkm_metabat2              1              8              8
checkm_rosella               1              8              8
checkm_semibin               1              8              8
concoct                      1              8              8
das_tool                     1              8              8
finalize_stats               1              1              1
get_abundances               1              8              8
get_bam_indices              1              8              8
gtdbtk                       1              8              8
maxbin2                      1              8              8
metabat2                     1              8              8
metabat_sens                 1              8              8
metabat_spec                 1              8              8
metabat_ssens                1              8              8
metabat_sspec                1              8              8
prepare_binning_files        1              8              8
recover_mags                 1              8              8
refine_dastool               1              8              8
refine_metabat2              1              8              8
refine_rosella               1              8              8
refine_semibin               1              8              8
rosella                      1              8              8
semibin                      1              8              8
singlem_appraise             1              8              8
singlem_pipe_reads           1              1              1
vamb                         1              8              8
vamb_jgi_filter              1              8              8
total                       29              1              8
```
> I haven't tried with only long or only short yet but I can give that a go.

This doesn't make sense with my understanding of this:

> I've done 18 assemblies (6 each of long-only, long+short, short-only). All 10 that have finished recovery so far have this error.

Wouldn't some of the ones that have finished have to have been long- or short-only?
What you could try is deleting all the abundances files and seeing if you can target `finalize_stats` so that it only reruns the abundance rules. If it tries to run other rules, you can pass `--rerun-triggers mtime` to `--snakemake-cmds` to see if that prevents the rest of the pipeline from rerunning in case the code has been updated.
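The suggestion above could look something like the following sketch. The `recover` subcommand and the specific file paths are assumptions based on this thread, not verified against the aviary CLI, so adjust them to your actual run.

```shell
# Hypothetical sketch: remove the stale/empty abundance outputs, then
# rerun targeting finalize_stats with mtime-only rerun triggers so
# unrelated rules are not re-triggered by code changes.
rm -f data/coverm_abundances.tsv data/short_abundances.tsv data/long_abundances.tsv
aviary recover --snakemake-cmds "--rerun-triggers mtime" # ...plus your original arguments
```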
Oh, I mean that the assemblies were done with short, long, and short+long reads, but the recovery was done with the same samples (for comparison). So recovery was always done with short+long.
Ok thanks.
This happened again with only short reads. I noticed that the real error is:

```
ERROR coverm::bam_generator] Not continuing since when input file pairs have unequal numbers of reads this usually means incorrect / corrupt files were specified
```

It looks like the forward/reverse reads given to CoverM are mismatched (from different samples). I double-checked and they are specified correctly in the original command.
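As a quick sanity check for the error above, one can count the records in each forward/reverse FASTQ pair and confirm they match, which is the condition CoverM is refusing to proceed without. This is an illustrative helper, not part of aviary or CoverM; the function names are made up here.

```python
import gzip

def count_fastq_reads(path):
    """Count records in a (possibly gzipped) FASTQ file: 4 lines per read."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for _ in handle) // 4

def check_pairs(reads_1, reads_2):
    """Return (r1, n1, r2, n2) tuples for pairs with unequal read counts."""
    mismatches = []
    for r1, r2 in zip(reads_1, reads_2):
        n1, n2 = count_fastq_reads(r1), count_fastq_reads(r2)
        if n1 != n2:
            mismatches.append((r1, n1, r2, n2))
    return mismatches
```

Running `check_pairs(short_reads_1, short_reads_2)` on the lists exactly as they were handed to CoverM would show whether the pairing itself is broken, independent of file corruption.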
The order of `short_reads_2` in the config doesn't match that of `short_reads_1`, and neither matches the order in the initial command.

Might be due to the `set()` conversion from commit 4eaefb4b35faec0d77cfa3979f44212227cb7d40.
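The failure mode suspected above can be illustrated with a toy example (this is not aviary's actual code): de-duplicating each read list independently through `set()` discards insertion order, so the two lists can come back in different orders and the forward/reverse pairing is lost. An order-preserving de-duplication avoids this.

```python
# Paired lists: index i of short_reads_1 corresponds to index i of short_reads_2.
short_reads_1 = ["sampleB_R1.fq", "sampleA_R1.fq", "sampleC_R1.fq"]
short_reads_2 = ["sampleB_R2.fq", "sampleA_R2.fq", "sampleC_R2.fq"]

# set() iteration order is arbitrary, so each list may come back
# in a different order and positions no longer line up.
dedup_1 = list(set(short_reads_1))
dedup_2 = list(set(short_reads_2))

def dedup_keep_order(items):
    """Order-preserving de-duplication (dicts keep insertion order in Python 3.7+)."""
    return list(dict.fromkeys(items))

# With no duplicates present, order-preserving de-dup is a no-op,
# so the pairing between the two lists survives.
assert dedup_keep_order(short_reads_1) == short_reads_1
assert dedup_keep_order(short_reads_2) == short_reads_2
```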
Aviary v0.5.3 error in the `finalize_stats` rule. 27/29 steps done, so I guess this is the last job and the other results are fine to use?
Simplified command (recovery from long-read assembly using 20 short reads and 2 long reads):
Error: