mrvollger / SDA

Segmental Duplication Assembler (SDA).
MIT License

Custom repeat libraries for RepeatMasker #17

Closed: SergejN closed this issue 1 year ago

SergejN commented 3 years ago

Dear @mrvollger,

this is the last one for today, I promise :) Although RepeatMasker ships with a large collection of repeat libraries, sometimes a custom repeat library is needed, either because the organism is novel and no library exists yet, or because you deliberately want to focus on a subset of repeats (say, LTRs). RepeatMasker lets you specify a custom library with the -lib argument. Could you add that to SDA as well? For now, I solved it as follows:

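shell("""
    # If RM_DB is an existing path, treat it as a custom library (-lib);
    # otherwise treat it as a species name (-species).
    if [ -e "{RM_DB}" ]
    then
        RepeatMasker \
            -lib "{RM_DB}" \
            -e ncbi \
            -dir {rmdir} \
            -pa {threads} \
            {input.split}
    else
        RepeatMasker \
            -species "{RM_DB}" \
            -e ncbi \
            -dir {rmdir} \
            -pa {threads} \
            {input.split}
    fi
    """)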

at https://github.com/mrvollger/SDA/blob/1fbe948f3d8cde6ae6b8c49b33f4220053755718/denovo_SDA.smk#L284

I must admit that I have never worked with Snakemake before, so this may not be the best solution. In a regular shell script, I would rather do:

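RM_LIB=
# Quote the test operand: with an empty RM_DB, an unquoted [ -e ] is always true.
if [ -e "${RM_DB}" ]
then
   RM_LIB="-lib ${RM_DB}"
else
   RM_LIB="-species ${RM_DB}"
fi
# RM_LIB is left unquoted on purpose so it splits into flag and value.
RepeatMasker ${RM_LIB} ...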

But I didn't know how Snakemake treats those variables, and this seemed like a quick way to fix the issue without going into too much detail.
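
Since a Snakefile is Python, the same choice could probably be made in plain Python before the shell command runs, which would avoid duplicating the RepeatMasker call. Here is a minimal sketch of that idea. The helper name `repeatmasker_db_args` is my own invention and not part of SDA, and I am assuming that `RM_DB` is an ordinary Python string in the Snakefile:

```python
import os
import shlex


def repeatmasker_db_args(rm_db):
    """Return the RepeatMasker option that selects the repeat database.

    Hypothetical helper, not part of SDA: an existing path is treated as a
    custom library (-lib), anything else as a species name (-species).
    """
    flag = "-lib" if os.path.exists(rm_db) else "-species"
    # shlex.quote protects paths or species names that contain spaces.
    return f"{flag} {shlex.quote(rm_db)}"
```

If this works the way I expect, the rule could pass the result through `params` (for example `params: rm_lib=repeatmasker_db_args(RM_DB)`) and use `{params.rm_lib}` in a single RepeatMasker call, but I have not tested this inside SDA.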

Thanks! Sergej

mrvollger commented 3 years ago

This looks like another good fix; I will add it.