purescript / purescript-strings

String utility functions, Char type, regular expressions.
BSD 3-Clause "New" or "Revised" License
54 stars 71 forks source link

splitAt without Maybe? #78

Closed MonoidMusician closed 6 years ago

MonoidMusician commented 7 years ago

In my project I've had to define a splitAtTuple but even after #69 was fixed I still use it because I don't like the Maybe in the return type: just return an empty String on one side!

I would prefer it if basically any string index was considered "valid" insofaras it would return a record instead of a Maybe, with values ranging between { before: "", after: s } and { before: s, after: "" }.

Based on JavaScript String.prototype.substring behavior, it looks like this would amount to just removing the conditional (since substrings with negative indices and indices beyond the end of the string just return the whole string or empty as appropriate).

Is this something you are willing to change again, or would it be best to introduce like a splitAt' in this library with my suggested version?

davidchambers commented 7 years ago

I suggest a separate splitAt' function.

hdgarrood commented 7 years ago

So the way I see it there are 3 options:

  1. Leave the currently exported splitAt as is and export a new splitAt' which just returns a { before :: String, after :: String } (no Maybe)
  2. Change the currently exported splitAt to not use Maybe
  3. Don't change anything, let people define their own version if that's what they want.

Option 1 is actually my least favourite. The way I see it, the most important thing is that the API exported by this package is coherent and consistent. I don't think people wanting/needing to define helper functions locally is always a bad thing; in fact I think it often isn't.

Given that take, drop, takeWhile and dropWhile return just String rather than Maybe String, and splitAt seems the closest to these, I think I would prefer option 2, as I think it would make splitAt more consistent with these functions. But option 3 is fine by me too.

MonoidMusician commented 7 years ago

Okay ... that makes sense to me. I lean towards option 2 as well.

Is there a compelling use case for splitAt with Maybe?

hdgarrood commented 7 years ago

I'm not able to think of one off the top of my head but perhaps someone else will be able to.

davidchambers commented 7 years ago

Is there a compelling use case for splitAt with Maybe?

The current return type preserves more information about the input, avoiding code like this:

if idx is a valid index into s:
  split s at index idx and do something with the result
else:
  do something else

take, drop, takeWhile and dropWhile return just String rather than Maybe String

Since these functions are not designed to preserve information, splitAt is inconsistent.

MonoidMusician commented 7 years ago

I guess I'm still not sure what the usefulness of your above hypothetical example is ... But does it impede the working of the function to return a record regardless of indices passed?

Could the interest of be equally or better served with a separate function that checks the indices? I mean, currently most code will still have to unwrap from Maybe – it's just that the API made the decision about when it returns Just and Nothing.

Also, I would like to note that bounds checking for an index does depend on how the index is being used: splitAt (length s) s should be perfectly valid (i should be valid from 0 to length, inclusive), although it returns Nothing right now... Whereas trying to find a character at a position is obviously only valid from 0 to length-1, inclusive.

davidchambers commented 7 years ago

Could the interest of be equally or better served with a separate function that checks the indices?

Daniel Spiewak demonstrated problems with this approach in May Your Data Ever Be Coherent.

MonoidMusician commented 7 years ago

Yes I am perfectly aware of the issue presented within the first 4 minutes of that talk. I cringe just about every time I see code like that.

But I disagree that splitAt needs to be concerned with the indices. The way I see it, it is currently doing two entirely different things: checking the indices (and somewhat incorrectly, I believe), and doing the substring algorithm to obtain before/after. The substring does not care if the indices are valid according to the definition within the function.

Substring returns decent results for any integer passed in. So I think the splitAt function should too.

Does data coherence matter if a function is doing two things?

I still don't understand the need for index checking, particularly with splitAt.

Is it to check user input to see if they have entered a valid position? Is it to check code generating indices into splitAt? ... but wait, can we not trust the latter, is there not a better way to validate that, and (most importantly) shouldn't splitAt do it's best job regardless of whether the indices happen to fit within a bound?

Is coherence relevant when the algorithm gives just as good results without the conditional? In fact I would argue that better coherence dictates removing the conditional which has no bearing on the validity of the result, and, as I argue, is incorrect, partially since it was separately derived.

Unless you have a nice counterexample, I don't see the utility of index checking within splitAt.


To my eye (I can't say I fully comprehend it), this (one of the few github-searchable examples of using Data.String.splitAt) would be better served by removing the Maybe type (I hope it would also be better understandable then): Mesos.RecordIO.

Furthermore, in the other search results for splitAt, implementations of a corresponding function for sequence types seem to not have a Maybe type, so my proposal would seem to bring the String version in line with them, no?

davidchambers commented 7 years ago

Unless you have a nice counterexample, I don't see the utility of index checking within splitAt.

My concerns were philosophically rather than practically motivated, @NasalMusician. :)

Thank you for making a clear case for Int -> String -> { before :: String, after :: String }. If and when I add splitAt to Sanctuary I'll follow your lead.

garyb commented 6 years ago

Resolved in 0.12 branch

MonoidMusician commented 6 years ago

YAY! https://github.com/MonoidMusician/purescript-textcursor/commit/cb1af12d7393e5432f0e0d3cd6f5ecb77e028fc3#diff-facb2935a69e9cbd28f68551b9c34b6fL35 :smile: