keironstoddart / edi-835-parser

A simple EDI 835 file format parser.
MIT License
80 stars 40 forks source link

added supported for uhc ^ delimiter, updated tests #10

Closed christopherpickering closed 2 years ago

christopherpickering commented 2 years ago

Hey, thanks for this :)

I was testing out on UHC and they are using > as a delimiter in the SVC code. I changed the split element function to use regex split instead of checking for the three elements - are you ok w/ this method?

I also updated the tests with a UHC sample file.

Also, there seems to be a panda's issue with summing NAN's. The bcbs demo file has a NAN for one of the payments, and now when summing up all the payments, there is a really long trailing decimal coming from the pandas .sum().

If I do a print in the test, the values come out like this: image

the match works out fine, but pandas .sum() is adding in a long decimal. For now I just added in a .round(2) since you are not intending to test pandas :)

christopherpickering commented 2 years ago

Looks like this change was also part of #8.

keironstoddart commented 2 years ago

Hi Chris,

Thank you for the pull request! +1 for regex, this is a great idea. My only concern is that by splitting on all delimiters we might find ourselves in a situation where we are splitting a segment in a way that was unintended. For example, say UHC, in some cases, uses > as their delimiter, what if they use ^ or : as part of their notation for the contents given element? I think that is why I'd prefer to continue to implement the logic in such a way that only the maximally used delimiter character is used for the split. Does that make sense? I would love to know what you think.

Be that as it may, let me test your code against a repository of EDI 835 files I maintain and see if I run into any issues. Your solution probably still works just fine! Either way, definitely need to add > as an accepted delimiter.

Best, Keiron

christopherpickering commented 2 years ago

Thanks for the feedback! I'll see what can be done on finding the most common as you were doing 👍🏽

christopherpickering commented 2 years ago

@keironstoddart I think the latest commit will work. Instead of adding more logic I did a for loop through the different delimiters. If 2 have the same count, the first in the list is picked by the max().

In any case, I ran on 170k files and have a 10% fallout for other issues I will dig into.