Closed christopherpickering closed 2 years ago
Looks like this change was also part of #8.
Hi Chris,
Thank you for the pull request! +1 for regex, this is a great idea. My only concern is that by splitting on all delimiters we might find ourselves in a situation where we are splitting a segment in a way that was unintended. For example, say UHC, in some cases, uses >
as their delimiter, what if they use ^
or :
as part of their notation for the contents given element? I think that is why I'd prefer to continue to implement the logic in such a way that only the maximally used delimiter character is used for the split. Does that make sense? I would love to know what you think.
Be that as it may, let me test your code against a repository of EDI 835 files I maintain and see if I run into any issues. Your solution probably still works just fine! Either way, definitely need to add >
as an accepted delimiter.
Best, Keiron
Thanks for the feedback! I'll see what can be done on finding the most common as you were doing 👍🏽
@keironstoddart I think the latest commit will work. Instead of adding more logic I did a for loop through the different delimiters. If 2 have the same count, the first in the list is picked by the max()
.
In any case, I ran on 170k files and have a 10% fallout for other issues I will dig into.
Hey, thanks for this :)
I was testing out on UHC and they are using
>
as a delimiter in the SVC code. I changed the split element function to use regex split instead of checking for the three elements - are you ok w/ this method?I also updated the tests with a UHC sample file.
Also, there seems to be a panda's issue with summing NAN's. The bcbs demo file has a NAN for one of the payments, and now when summing up all the payments, there is a really long trailing decimal coming from the pandas .sum().
If I do a print in the test, the values come out like this:
the match works out fine, but pandas .sum() is adding in a long decimal. For now I just added in a
.round(2)
since you are not intending to test pandas :)