ngless-toolkit / ngless

NGLess: NGS with less work
https://ngless.embl.de

--subsample option does not work when writing out a FASTQ file. #85

Closed aniag closed 5 years ago

aniag commented 5 years ago

It seems that when subsampling, ngless automatically appends the suffix .subsampled to the output filename. While it is generally a good idea to make it obvious that the output is not a full result, this backfires when ngless tries to format the filename (inserting pair.1, pair.2 and singles):

_formatFQOname base insert
    | "{index}" `isInfixOf` base = return $ replace "{index}" insert base
    | endswith ".fq" base = return $ removeEnd base ".fq" ++ "." ++ insert ++ ".fq"
    | endswith ".fq.gz" base = return $ removeEnd base ".fq.gz" ++ "." ++ insert ++ ".fq.gz"
    | endswith ".fq.bz2" base = return $ removeEnd base ".fq.bz2" ++ "." ++ insert ++ ".fq.bz2"
    | otherwise = throwScriptError ("Cannot handle filename " ++ base ++ " (expected extension .fq/.fq.gz/.fq.bz2).")

This then results in the following error message:

Cannot handle filename sample_name.preprocessed.fq.gz.subsampled (expected extension .fq/.fq.gz/.fq.bz2).
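To make the failure easier to see in isolation, here is a minimal, self-contained sketch of the extension guards above, together with one possible workaround. It is not the actual ngless code or fix: `formatFQOname` is a simplified pure restatement (the `{index}` case is left out, and `Left` stands in for `throwScriptError`), and `formatTolerant` and `dropEnd` are hypothetical helpers introduced only for illustration. It assumes the suffix is always exactly `.subsampled` and is appended after the extension.

```haskell
import Data.List (isSuffixOf)

-- Simplified, pure restatement of the extension guards in _formatFQOname.
-- Assumption: the "{index}" case is omitted and errors are returned as Left.
formatFQOname :: FilePath -> String -> Either String FilePath
formatFQOname base insert =
    case filter (`isSuffixOf` base) extensions of
        (ext:_) -> Right (dropEnd (length ext) base ++ "." ++ insert ++ ext)
        []      -> Left ("Cannot handle filename " ++ base
                         ++ " (expected extension .fq/.fq.gz/.fq.bz2).")
  where
    extensions = [".fq", ".fq.gz", ".fq.bz2"]

-- Hypothetical helper: drop the last n elements of a list.
dropEnd :: Int -> [a] -> [a]
dropEnd n xs = take (length xs - n) xs

-- Hypothetical workaround: peel off a trailing ".subsampled", format the
-- remaining name as usual, then put the suffix back at the end.
formatTolerant :: FilePath -> String -> Either String FilePath
formatTolerant base insert
    | suffix `isSuffixOf` base =
        (++ suffix) <$> formatFQOname (dropEnd (length suffix) base) insert
    | otherwise = formatFQOname base insert
  where
    suffix = ".subsampled"

main :: IO ()
main = do
    let name = "sample_name.preprocessed.fq.gz.subsampled"
    -- The plain guards reject the name because it no longer ends in .fq.gz
    print (formatFQOname name "pair.1")
    -- The tolerant variant inserts "pair.1" before the real extension
    print (formatTolerant name "pair.1")
```

With the filename from this report, the first call yields a `Left` carrying the same message as above, while the second yields `sample_name.preprocessed.pair.1.fq.gz.subsampled`. Whether the suffix should instead go before the extension (or be dropped when an explicit output name is given) is a separate design question.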

Here is a small example script that results in the above error:

ngless "0.7"
import "mocat" version "0.0"

sample = ARGV[2]
input = load_mocat_sample(ARGV[1] + '/' + sample)

input = preprocess(input, keep_singles=True) using |read|:
    read = substrim(read, min_quality=25)
    if len(read) < 45:
        discard
write(input, ofile=sample+'.preprocessed.fq.gz')