phac-nml / biohansel

Rapidly subtype microbial genomes using single-nucleotide variant (SNV) subtyping schemes
Apache License 2.0
25 stars 7 forks source link

Allow levels past 9 #85

Closed DarianHole closed 5 years ago

DarianHole commented 5 years ago

Hey @peterk87, Genevieve asked me to make this change and the data I created to test this change works without error. I was just curious if this limitation was set for an issue I am unaware of that I can look at to fully correct.

peterk87 commented 5 years ago

Yeah that wasn't a very good check and it should probably be rethought. We should allow any integer values for subtypes. The check should be trying to do make sure that there are only digits maybe separated by periods so a regex like ^\d[^\d\.]*\d*$ might be enough with additional checks to make sure there are no empty strings when splitting on periods.

DarianHole commented 5 years ago

Sounds good. I'll look at making that change and then pushing it to this branch

DarianHole commented 5 years ago

The regex check you sent seems to work well and correctly on my test datasets.

I checked the period splitting (Like this correct: "2. ") and it gave me an error as expected.

peterk87 commented 5 years ago

Does it fail on "1." and "1..1"? Also can you add some unittests?

DarianHole commented 5 years ago

I just double checked and it does fail on "1." and "1..1" so I will add some unittests and then push it again.

DarianHole commented 5 years ago

I believe I've put the tests in correctly. Please correct me if I have done it wrong.

Regex was not actually checking for errors in the way I wrote the code before and another module was giving the errors. Now regex checks and correctly displays the ValueError along with being more specific