Hello, I'm running splash2 with the following command:

/home/nilc/storage/splash2/tools/splash-2.1.4/splash input.txt --dump_sample_anchor_target_count_txt --n_threads_stage_1 10 --n_threads_stage_1_internal 8 --n_threads_stage_2 100 --n_bins 1 --kmc_max_mem_GB 64 2> std.err 1> std.out

It runs stage 1 correctly but consistently returns an error in stage 2. This is the main error:

And this is the end of the log file mentioned in the previous error message. As you can see, the issue is in satc_merge:

splash-tmp-3fbed511bafc419bba6c718365a22dde/sample15.0.bin
splash-tmp-3fbed511bafc419bba6c718365a22dde/sample16.0.bin
splash-tmp-3fbed511bafc419bba6c718365a22dde/sample17.0.bin
splash-tmp-3fbed511bafc419bba6c718365a22dde/sample18.0.bin
Warning: no anchors in splash-tmp-3fbed511bafc419bba6c718365a22dde/sample19.0.bin
Warning: no anchors in splash-tmp-3fbed511bafc419bba6c718365a22dde/sample20.0.bin
Error: cannot open file splash-tmp-3fbed511bafc419bba6c718365a22dde/sample21.0.bin to get file size
Command exited with non-zero status 1
Command being timed: "/home/nilc/storage/splash2/tools/splash-2.1.4/satc_merge --max_pval_opt_for_Cjs 0.1 --anchor_count_threshold 50 --anchor_samples_threshold 1 --anchor_unique_targets_threshold 1 --n_most_freq_targets 2 --n_most_freq_targets_for_stats 0 --opt_num_inits 10 --opt_num_iters 50 --num_rand_cf 50 --num_splits 1 --opt_train_fraction 0.25 --dump_sample_anchor_target_count_txt result_dumps/bin0.satc.dump --sample_names sample_name_to_id.mapping.txt --format satc splash-tmp-3fbed511bafc419bba6c718365a22dde/result.bin0.stats.tsv splash-tmp-3fbed511bafc419bba6c718365a22dde/files.bin0.lst"

I successfully ran the bundled example. I also ran a subset of the failing dataset with the same parameters, and that completed without errors. This makes me suspect a resource limitation, since my full dataset is quite large and only the subset runs through.

Do you know what the issue is and how to solve it?

Also, since my dataset is large and stage 1 has already completed, is there an easy way to resume a run from stage 2? I guess I could comment out the stage 1 commands in splash.py and manually set the tmp-dir, but I'm not sure whether that would work without problems, or whether there's an easier way to do it.

Thanks in advance!
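In case it helps with diagnosing the resource-limitation guess: since the error is "cannot open file ... to get file size" and satc_merge touches one .bin file per sample, one thing I can compare on my machine is the per-process open-file limit against the number of files in the tmp directory. This is only a sketch of my own checking, not a confirmed cause (the tmp directory name below is just the one from this run's log):

```shell
#!/bin/sh
# Soft and hard limits on open file descriptors for the current shell;
# child processes such as satc_merge inherit the soft limit.
ulimit -Sn
ulimit -Hn

# Count the sample .bin files stage 2 may need to open, if the tmp dir
# from the failing run is still present (directory name from the log above).
TMPDIR_RUN=splash-tmp-3fbed511bafc419bba6c718365a22dde
if [ -d "$TMPDIR_RUN" ]; then
    ls "$TMPDIR_RUN" | wc -l
fi
```

If the file count is near or above the soft limit, raising it for the session (e.g. `ulimit -n 65536`, up to the hard limit) before re-running would test this hypothesis.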