Closed — nevrome closed this 1 year ago
Base: 64.35% // Head: 64.99% // Increases project coverage by +0.64% :tada:

Coverage data is based on head (a6dbfbf) compared to base (3ee14e9). Patch coverage: 50.87% of modified lines in pull request are covered.

:umbrella: View full report at Codecov.
Ok - so @stschiff and I decided to abandon the transformation to `Text` for now. We had a wrong understanding of what a `Char` actually is in Haskell. I will prepare another PR with the useful changes introduced here, including the fix for #213.

See PR #220
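For context (an illustration added here, not code from the PR): the distinction that matters is that a Haskell `Char` is a full Unicode code point, not a byte, while a `ByteString` holds raw bytes. The No-Break Space is one `Char`, but two bytes in UTF-8:

```haskell
module Main where

import Data.Char (isSpace)
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as E

main :: IO ()
main = do
  -- The No-Break Space U+00A0 is a single Char, and isSpace matches it
  print (isSpace '\xA0')                          -- True
  -- As Text it is a single code point ...
  print (T.length (T.pack "\xA0"))                -- 1
  -- ... but two bytes (0xC2 0xA0) once encoded as UTF-8
  print (B.length (E.encodeUtf8 (T.pack "\xA0"))) -- 2
```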
This PR replaces `String` in .janno files with `Text`. This has various consequences, among them the need for explicit conversions with `T.pack` and `T.unpack`.

When I went through the code for `list` and `summarize` I did some minor refactoring. In `list` I changed the error management for the `-j` option. In `summarize` I simplified the frequency reporting code. The output now has some ugly quotes, but the code is easier to read. A fair tradeoff, imho.

Finally I also wanted to address the issue that triggered this entire PR: #213. I introduced the new function `cleanInput` in `Janno.hs`, which trims whitespace before the actual CSV parsing starts. This is an expensive operation, as I decided to decode the raw input `ByteString`s to `Text`, manipulate them, and then encode them again, before they are yet again parsed depending on the output type. Removing UTF-8 (!) whitespace in `ByteString` directly was not reliable, because some whitespace characters are encoded as two bytes in UTF-8 (e.g. `\194\160` for the No-Break Space `\160`). ByteString's `dropWhile` and `dropWhileEnd` only operate on single bytes, so they cannot match such multi-byte characters as one unit. Data.Text's `strip` also only trims single `Char`s with `Data.Char.isSpace`, but it does so reliably, because the input has already been decoded and the No-Break Space is a single `Char` there. I did not have to consider `\194` as a separate case with this solution. That's why I accepted the performance cost of decoding and encoding. Feel free to suggest something else. But what I implemented now seems to fix #213, which should be the minimum goal.
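The decode–strip–encode approach described above can be sketched roughly like this (a minimal illustration; the function name `cleanInput` comes from the PR text, but the real signature and surrounding code in `Janno.hs` may differ):

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as E

-- Decode the raw input as UTF-8, trim Unicode whitespace on both ends
-- (Data.Char.isSpace matches the No-Break Space U+00A0, so the two-byte
-- sequence "\194\160" needs no special case), then re-encode so the
-- CSV parser sees clean bytes.
cleanInput :: B.ByteString -> B.ByteString
cleanInput = E.encodeUtf8 . T.strip . E.decodeUtf8

main :: IO ()
main = print (cleanInput "\194\160 hello \194\160")  -- "hello"
```

A byte-level `dropWhile`, by contrast, would have to treat the `\194` lead byte as an extra case, which is exactly the unreliability the PR works around.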