refresh-bio / SPLASH

Error in stage 2: Cannot open file [tmp-dir/samplename.0.bin] to get file size #7

Closed by nilcam 11 months ago

nilcam commented 11 months ago

Hello, I'm running splash2 with the following command:

    /home/nilc/storage/splash2/tools/splash-2.1.4/splash input.txt --dump_sample_anchor_target_count_txt --n_threads_stage_1 10 --n_threads_stage_1_internal 8 --n_threads_stage_2 100 --n_bins 1 --kmc_max_mem_GB 64 2> std.err 1> std.out

It runs stage 1 correctly but consistently returns an error in stage 2. This is the main error:

------------------------------------------------
Starting stage 2
Current time: 2023-09-18 00:23:43
Error running command: /usr/bin/time -v /home/nilc/storage/splash2/tools/splash-2.1.4/satc_merge                         --max_pval_opt_for_Cjs 0.1     --anchor_count_threshold 50     --anchor_samples_threshold 1     --anchor_unique_targets_threshold 1     --n_most_freq_targets 2     --n_most_freq_targets_for_stats 0     --opt_num_inits 10     --opt_num_iters 50     --num_rand_cf 50     --num_splits 1     --opt_train_fraction 0.25     --dump_sample_anchor_target_count_txt result_dumps/bin0.satc.dump          --sample_names sample_name_to_id.mapping.txt     --format satc          splash-tmp-3fbed511bafc419bba6c718365a22dde/result.bin0.stats.tsv splash-tmp-3fbed511bafc419bba6c718365a22dde/files.bin0.lst
For details check logs/stage_2_thread-0001.log

And this is the end of the log file mentioned in the previous error message. As you can see, the issue is in satc_merge:

                splash-tmp-3fbed511bafc419bba6c718365a22dde/sample15.0.bin
                splash-tmp-3fbed511bafc419bba6c718365a22dde/sample16.0.bin
                splash-tmp-3fbed511bafc419bba6c718365a22dde/sample17.0.bin
                splash-tmp-3fbed511bafc419bba6c718365a22dde/sample18.0.bin
Warning: no anchors in splash-tmp-3fbed511bafc419bba6c718365a22dde/sample19.0.bin
Warning: no anchors in splash-tmp-3fbed511bafc419bba6c718365a22dde/sample20.0.bin
Error: cannot open file splash-tmp-3fbed511bafc419bba6c718365a22dde/sample21.0.bin to get file size
Command exited with non-zero status 1
        Command being timed: "/home/nilc/storage/splash2/tools/splash-2.1.4/satc_merge --max_pval_opt_for_Cjs 0.1 --anchor_count_threshold 50 --anchor_samples_threshold 1 --anchor_unique_targets_threshold 1 --n_most_freq_targets 2 --n_most_freq_targets_for_stats 0 --opt_num_inits 10 --opt_num_iters 50 --num_rand_cf 50 --num_splits 1 --opt_train_fraction 0.25 --dump_sample_anchor_target_count_txt result_dumps/bin0.satc.dump --sample_names sample_name_to_id.mapping.txt --format satc splash-tmp-3fbed511bafc419bba6c718365a22dde/result.bin0.stats.tsv splash-tmp-3fbed511bafc419bba6c718365a22dde/files.bin0.lst"

I ran the example successfully. I also ran a subset of the failing dataset successfully with the same parameters. Since the full dataset is quite large and only the subset completes without errors, I suspect I'm hitting a resource limit.

  1. Do you know what the issue is and how to solve it?
  2. Since my dataset is large and stage 1 has already finished, is there an easy way to resume the run from stage 2? I guess I could comment out the stage 1 commands in splash.py and set the tmp-dir manually, but I'm not sure whether that would work without problems or whether there's an easier way.

Thanks in advance!

nilcam commented 11 months ago

Hi, I found the issue: I have too many samples, so satc_merge was hitting the per-process limit on open files. Raising the limit with ulimit -n solved it.
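
For anyone who launches the pipeline from a Python wrapper rather than an interactive shell, here is a minimal sketch of the same fix using the standard `resource` module (Unix only). It is not part of SPLASH: the helper name `raise_open_file_limit` is hypothetical, and the assumption that stage 2 needs roughly one file descriptor per sample file is my guess from the log above, not something confirmed in this thread.

```python
import resource


def raise_open_file_limit(needed: int) -> int:
    """Raise the soft open-file limit towards `needed`, capped at the hard limit.

    Hypothetical helper, not part of SPLASH. Returns the soft limit in
    effect after the call.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # Nothing to do if the soft limit is unlimited or already high enough.
    if soft == resource.RLIM_INFINITY or soft >= needed:
        return soft

    # An unprivileged process may raise its soft limit only up to the hard limit.
    target = needed if hard == resource.RLIM_INFINITY else min(needed, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    # Assumption: about one descriptor per sample plus some headroom.
    n_samples = 5000  # placeholder value, replace with your own sample count
    print("soft open-file limit is now", raise_open_file_limit(n_samples + 64))
```

Child processes inherit the limit, so calling this before starting the run has the same effect as `ulimit -n` in the launching shell. If the hard limit itself is too low, it has to be raised by an administrator.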