katholt / srst2

Short Read Sequence Typing for Bacterial Pathogens

Cannot parse strain name when '.' in file path #99

Open Takadonet opened 6 years ago

Takadonet commented 6 years ago

The following line keeps failing when one or more '.' characters are found in the path to the pileup file.

Example failing file path is /Storage/galaxy-17.05/working/dapE_11.out__Strain1.dataset_357596.pileup

mbargull commented 6 years ago

Hi @Takadonet, I took a closer look at the patch and I'd assess (without looking at the entirety of the code) that it really only fixes shortcomings without other behavioral changes. Hence I'd be fine with including it in either version for https://github.com/bioconda/bioconda-recipes/pull/6957. Though, I can't really wrap my head around why the program works correctly anyway. Maybe you can help me understand: as I see it, the pileup_file string is set to

args.output + '__' + sample_name + '.' + db_name + '.pileup'

at https://github.com/katholt/srst2/blob/v0.2.0/scripts/srst2.py#L1381-L1382 and gets split with

pileup_file.split(".")[1].split("__")[1]

where the patch from #100 applies, at https://github.com/katholt/srst2/blob/v0.2.0/scripts/srst2.py#L453. Now if the output parameter doesn't include a ., the split(".")[1] would do the wrong thing, I suppose. In your example, dapE_11.out__Strain1.dataset_357596.pileup, you have the .out, so everything is fine; but wouldn't it fail if the output didn't include that dot? I'm confused.
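To make the question concrete, here is a minimal sketch that applies the split expression directly to a name built with the concatenation quoted above. The helper names build_pileup_name and parse_sample are hypothetical and exist only for this illustration; the sketch also assumes nothing else modifies the string between the two steps.

```python
def build_pileup_name(output, sample_name, db_name):
    # Hypothetical helper mirroring the concatenation quoted above.
    return output + '__' + sample_name + '.' + db_name + '.pileup'


def parse_sample(pileup_file):
    # Hypothetical helper wrapping the split expression quoted above.
    return pileup_file.split(".")[1].split("__")[1]


# Output prefix that contains a dot, as in the example from the issue.
with_dot = build_pileup_name("dapE_11.out", "Strain1", "dataset_357596")
# Output prefix without a dot (assumed input, for comparison).
without_dot = build_pileup_name("dapE_11", "Strain1", "dataset_357596")

print(parse_sample(with_dot))

try:
    parse_sample(without_dot)
except IndexError as err:
    # The second dot-separated field is the database name, which has no "__".
    print("failed:", err)
```

With the dotted prefix the second dot-separated field is out__Strain1, so the sample name is recovered; without the dot the second field is the database name and the inner index fails.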

Takadonet commented 6 years ago

The output always contains a . character, which is why it always works. It will not parse correctly when the file path has a . in a directory name, since the split will start on that instead of on the first . in the file name.
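The failure can be reproduced with the path from the report. The sketch below also shows one possible workaround, stripping the directory part with os.path.basename before splitting. That variant (parse_sample_basename) is a hypothetical illustration and not necessarily what the patch in #100 does; it would still misparse if the output prefix itself contained more than one dot.

```python
import os


def parse_sample(pileup_file):
    # The split expression as quoted in this thread.
    return pileup_file.split(".")[1].split("__")[1]


def parse_sample_basename(pileup_file):
    # Hypothetical variant: drop the directories first so that dots in
    # directory names (e.g. "galaxy-17.05") cannot shift the split.
    return os.path.basename(pileup_file).split(".")[1].split("__")[1]


path = "/Storage/galaxy-17.05/working/dapE_11.out__Strain1.dataset_357596.pileup"

try:
    parse_sample(path)
except IndexError as err:
    # The second dot-separated field is "05/working/dapE_11", which has no "__".
    print("failed:", err)

print(parse_sample_basename(path))
```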

mbargull commented 6 years ago

Ah, ok, I was only looking at the call at https://github.com/katholt/srst2/blob/v0.2.0/scripts/srst2.py#L1422, which can't work (I think). But for the other calls, create_allele_pileup is used beforehand, which adds the ., right.