katholt / srst2

Short Read Sequence Typing for Bacterial Pathogens

Error in output when a gene contains a parenthesis #142

Open nvlachos opened 6 months ago

nvlachos commented 6 months ago

I am reaching out following our recent communications with Alison Laufer Halpin (and Jill Hagey) about using srst2 in our pipeline. We have run into two errors that we have mitigated locally and would like to pass along for you to include, if desired. These fixes may be specific to our environment, but hopefully they will help others as well. Both occur in srst2.py on the 0.2.0 branch.

1) Output error if a '(' appears in the gene description in the database header (e.g. `>980aph(3')aph(3')-Id_NG_047445.106702aminoglycoside__NCBI [NCBI]aph(3')-Id:1:NG_047445.1:aminoglycoside_O-phosphotransferase_APH(3')-Id|WP_010891085.1|499193545:aminoglycoside;NG_047445.1;aminoglycoside;NCBI`). The source of the error is https://github.com/katholt/srst2/blob/73f885f55c748644412ccbaacecf12a771d0cae9/scripts/srst2.py#L1502. Our fix was to wrap the allele name in escaped double quotes, so the shell passes it to grep literally instead of trying to parse the parentheses: `command = "grep \""+allele+"\" "+fasta` followed by `header_string = os.popen(command)`.
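For comparison, here is a minimal Python 3 sketch of two more defensive variants, assuming the goal at that line is simply to find the FASTA header(s) mentioning a given allele name. `grep_command` and `find_header_lines` are hypothetical helpers, not functions in srst2. The first uses the standard library's `shlex.quote` (Python 3.3+), which also escapes characters that double quotes do not neutralise (such as `$` or a backtick), and adds `grep -F` so the name is matched as a fixed string rather than a regular expression. The second reads the file directly and never invokes a shell.

```python
import shlex


def grep_command(allele, fasta):
    """Build a shell-safe grep command line (hypothetical helper).

    shlex.quote wraps each argument in single quotes and escapes any
    embedded single quote, so names such as aph(3')-Id reach grep intact.
    -F makes grep treat the allele name as a fixed string, and -- stops a
    name that begins with '-' from being read as an option.
    """
    return "grep -F -- " + shlex.quote(allele) + " " + shlex.quote(fasta)


def find_header_lines(fasta, allele):
    """Return FASTA header lines containing allele (hypothetical helper).

    No shell is involved, so '(' or "'" in the allele name needs no
    escaping. Unlike the grep call, only header lines (those starting
    with '>') are returned; sequence lines are skipped.
    """
    matches = []
    with open(fasta) as handle:
        for line in handle:
            if line.startswith(">") and allele in line:
                matches.append(line.rstrip("\n"))
    return matches
```

For example, `grep_command("aph(3')-Id", "db.fasta")` yields `grep -F -- 'aph(3'"'"')-Id' db.fasta`, which the shell splits back into the original, unmodified allele name.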

2) A rounding error causes the script to fail completely with `RuntimeError: floating point number truncated to an integer`. This has been fixed in the repository since release 0.2.0 (https://github.com/katholt/srst2/issues/69), but the fix is not part of any official release or tag, which prevents us from properly documenting tool versions for validation purposes.

We would also like to bump issue #141 to see whether anyone has had a chance to look into it yet.

Thank you so much for your help!
Nick