refresh-bio / SPLASH

Seg fault error at stage 2 when anchor_len and target_len options are passed #20

Open miramou opened 9 months ago

miramou commented 9 months ago

Hi - thank you for this tool. I'd downloaded the latest release (2.3.0) and wanted to use it to call anchor/target pairs of a different specified length.

There appears to be a segmentation fault at Stage 2 if --anchor_len and --target_len are modified from the default when running satc_merge.

I can reproduce this error by modifying the run_example code as follows. Note that the example runs just fine when passing --dump_sample_anchor_target_count_binary as an option by itself.

#!/bin/bash
./download.py
../splash --bin_path .. --anchor_len 10 --target_len 40 --dump_sample_anchor_target_count_binary input.txt

I've attached splash's output and an example log file (all threads look the same) with the error as well. How do you recommend proceeding?

I also tried running an older version of splash, and the error occurs there too, so I don't believe it's related to the update.

Attachments: stage_2_thread-0038.log, splash_output.log

Mira

marekkokot commented 9 months ago

Hi - thank you very much for reporting this! In general, splash supports anchor and target lengths up to 32, but I forgot to add a check for this. The check is now added in commit 0d974071028b37af3d66b11bc26af29729e538ef.

I have not yet created a new release or Docker image, but the above fix will be included in the next release. I just don't know yet when we will make it :).

I also found a bug that occurred when the anchor or target length was exactly 32. That is fixed as well, so thanks again :)

Regarding lengths above 32, this is probably doable but would require a lot of work, and as far as I know there are currently no plans to support it. Do you think targets longer than 32 would be useful?

Best, Marek

miramou commented 9 months ago

Got it - thanks Marek! It'd be helpful to include in the documentation that there is an upper bound on anchor or target length.

Regarding supporting longer lengths, I think it could be useful sometimes to have longer target lengths. For example, if someone is interested in shorter anchors and would also like the extendor sequence to map uniquely to a genome, then the target must be longer (and vice versa).

For now I'll make sure to avoid anchor or target lengths >= 32 until the new release.
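As a stopgap, the lengths can be checked before splash is invoked. The following is only a minimal sketch of that workaround: `check_len` and `MAX_LEN` are hypothetical names that are not part of SPLASH, and the limit of 31 simply reflects staying below 32 until a release containing the fix is available.

```shell
#!/bin/bash
# Hypothetical pre-flight guard; not part of SPLASH itself.
# Assumption: lengths of 32 or more are unsafe in release 2.3.0, so cap at 31.
MAX_LEN=31

# check_len NAME VALUE
# Returns 0 if VALUE is a positive integer no larger than MAX_LEN, 1 otherwise.
check_len() {
    local name="$1" value="$2"
    # Accept only positive integers without leading zeros.
    if ! [[ "$value" =~ ^[1-9][0-9]*$ ]]; then
        echo "error: $name must be a positive integer, got '$value'" >&2
        return 1
    fi
    if (( value > MAX_LEN )); then
        echo "error: $name=$value exceeds the supported maximum of $MAX_LEN" >&2
        return 1
    fi
    return 0
}

# Example usage (commented out so the sketch has no side effects):
# check_len anchor_len "$ANCHOR_LEN" && check_len target_len "$TARGET_LEN" &&
#     ../splash --bin_path .. --anchor_len "$ANCHOR_LEN" --target_len "$TARGET_LEN" input.txt
```

With this guard, the failing invocation above (--anchor_len 10 --target_len 40) would be rejected with an error message before Stage 2 is reached.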