j3-fortran / fortran_proposals

Proposals for the Fortran Standard Committee

Revise SPLIT #195

Closed: milancurcic closed this 3 years ago

milancurcic commented 3 years ago

I apologize for the delay on this. I lost track of how fast the J3 meeting was approaching.

Hopefully, it's a simple enough proposal to merge and upload.

certik commented 3 years ago

Which other languages or standard libraries have the third form of split?

milancurcic commented 3 years ago

The closest I could find are C's strtok and Python's str.find, and both differ from the 3rd form of split.

strtok modifies the string in place and returns a pointer to the next token.

str.find is more akin to the 3rd form of split because it doesn't modify the string and returns an integer index. Two key differences: str.find takes a start argument that, unlike pos, is not modified in place; and it searches for a substring rather than for any single character from a set of delimiters.
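To make the comparison concrete, here is a minimal sketch of the 3rd form's behavior as described in this thread: pos is updated in place to the position of the next character drawn from a set of single-character delimiters. The names `split_next_mod` and `split_next` are hypothetical, and the optional `back` argument and the end-of-string conventions (pos becomes LEN(STRING)+1 when searching forward, 0 when searching backward) are assumptions modeled on the SPLIT subroutine as it later appeared in Fortran 2023, not details taken from this PR.

```fortran
! Hypothetical emulation of the "3rd form" of split discussed in this thread.
module split_next_mod
  implicit none
  private
  public :: split_next

contains

  ! On entry, pos is the position to search from (exclusive).
  ! Forward (default): pos becomes the position of the leftmost character of
  ! string that is in set and lies after pos, or len(string) + 1 if none.
  ! Backward (back = .true.): pos becomes the position of the rightmost such
  ! character before pos, or 0 if none. (Assumed conventions, see above.)
  pure subroutine split_next(string, set, pos, back)
    character(*), intent(in) :: string, set
    integer, intent(inout) :: pos
    logical, intent(in), optional :: back
    logical :: backward
    integer :: i

    backward = .false.
    if (present(back)) backward = back

    if (backward) then
      ! SCAN with back=.true. returns the rightmost match in string(1:pos-1), or 0.
      pos = scan(string(1:pos-1), set, back=.true.)
    else
      ! SCAN returns the offset of the leftmost match in string(pos+1:), or 0.
      i = scan(string(pos+1:), set)
      if (i == 0) then
        pos = len(string) + 1
      else
        pos = pos + i
      end if
    end if
  end subroutine split_next

end module split_next_mod
```

A caller tokenizes by starting with pos = 0 and calling repeatedly: each token is string(prev+1:pos-1), where prev is the previous value of pos, until pos exceeds len(string). Unlike strtok, the string itself is never modified; only the integer position moves.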

certik commented 3 years ago

The arguments I saw on the J3 mailing list are good arguments for keeping the 3rd form. But it worries me that Fortran would be the only language to have it, which leaves me with a strong suspicion that it might not be as useful as people think.

milancurcic commented 3 years ago

Well, at least C, C++, and Python have something similar, but the APIs are different.

I think it would be useful, but it has a misleading name. I think Python got it right. str.find tells you exactly what it is.

And the Fortran intrinsics INDEX and SCAN are similar to the 3rd form of split.
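A small sketch of that similarity: INDEX searches for a substring, much like Python's str.find, while SCAN searches for any character from a set, much like the 3rd form of split. The helpers `find` and `find_any` in the hypothetical module `find_mod` take a start position that, unlike pos, is never modified. They use Fortran's conventions (1-based positions, 0 for "not found") rather than Python's (0-based, -1), and they simply return 0 for an out-of-range start instead of mimicking Python's negative indices.

```fortran
! Hypothetical helpers contrasting INDEX/SCAN with Python's str.find.
module find_mod
  implicit none
  private
  public :: find, find_any

contains

  ! Like str.find(sub, start), but 1-based: returns the position of the first
  ! occurrence of sub at or after start, or 0 if there is none.
  pure integer function find(string, sub, start)
    character(*), intent(in) :: string, sub
    integer, intent(in) :: start
    integer :: i

    find = 0
    if (start < 1 .or. start > len(string) + 1) return
    i = index(string(start:), sub)   ! offset within string(start:), or 0
    if (i > 0) find = start + i - 1
  end function find

  ! Same idea, but matches any single character from set, as SCAN does.
  pure integer function find_any(string, set, start)
    character(*), intent(in) :: string, set
    integer, intent(in) :: start
    integer :: i

    find_any = 0
    if (start < 1 .or. start > len(string) + 1) return
    i = scan(string(start:), set)    ! offset within string(start:), or 0
    if (i > 0) find_any = start + i - 1
  end function find_any

end module find_mod
```

Both helpers are thin wrappers: the only work beyond the intrinsic call is converting the offset within string(start:) back to a position in the full string.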

klausler commented 3 years ago

Can you articulate a compelling reason for why these new capabilities have to be intrinsic in the standard language rather than packaged in a standard library? Something like "the interface can't be specified in Fortran" or "this can't be implemented efficiently in Fortran" or "only a compiler can do this special processing" would be nice.

certik commented 3 years ago

@klausler exactly. I can't articulate one, and I would prefer that such functionality first appear in a standard library such as stdlib, or simply as a separate package, as Milan has implemented here:

https://github.com/milancurcic/fortran202x_split

and then see some real-world usage. That is how we discovered the issue with the 3rd form of split.

everythingfunctional commented 3 years ago

Since many on the mailing list seem to want to keep the third form, an option for a different name might be NEXT_SEPARATOR. I could also be convinced that the first two forms should be named TOKENIZE. But I'm also partly just trying to reserve the name SPLIT so I can write one with the interface that makes sense to me.

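```fortran
interface
  pure function split(string, separators) result(strings)
    ! varying_string is assumed to come from an ISO_VARYING_STRING module.
    use iso_varying_string, only: varying_string
    type(varying_string), intent(in) :: string
    type(varying_string), intent(in) :: separators(:)
    type(varying_string), allocatable :: strings(:)
  end function
end interface
```
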
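
The interface above returns every piece at once instead of one delimiter position at a time. Below is a minimal sketch of that style under loud assumptions: `string_t` is a hypothetical stand-in for varying_string, the input string is plain character(*), and `split_all` and `next_match` are hypothetical names chosen to avoid clashing with any intrinsic SPLIT. Separators may be multi-character; empty pieces are kept; when two separators match at the same position, the earlier entry in separators wins.

```fortran
! Sketch of an array-returning split in the style of the interface above.
module split_array_mod
  implicit none
  private
  public :: string_t, split_all

  ! Hypothetical minimal stand-in for varying_string.
  type :: string_t
    character(:), allocatable :: s
  end type string_t

contains

  ! Find the earliest match of any non-empty separator in string(start:).
  ! On return, pos is its absolute position (0 if none) and seplen its length.
  ! Ties at the same position go to the earlier entry in separators.
  pure subroutine next_match(string, start, separators, pos, seplen)
    character(*), intent(in) :: string
    integer, intent(in) :: start
    type(string_t), intent(in) :: separators(:)
    integer, intent(out) :: pos, seplen
    integer :: i, k

    pos = 0
    seplen = 0
    do k = 1, size(separators)
      if (.not. allocated(separators(k)%s)) cycle
      if (len(separators(k)%s) == 0) cycle
      i = index(string(start:), separators(k)%s)
      if (i > 0) then
        if (pos == 0 .or. start + i - 1 < pos) then
          pos = start + i - 1
          seplen = len(separators(k)%s)
        end if
      end if
    end do
  end subroutine next_match

  ! Split string at every separator match, keeping empty pieces.
  ! Pass 1 counts the pieces and allocates the result; pass 2 fills it in.
  pure function split_all(string, separators) result(strings)
    character(*), intent(in) :: string
    type(string_t), intent(in) :: separators(:)
    type(string_t), allocatable :: strings(:)
    integer :: start, pos, seplen, n, pass

    do pass = 1, 2
      n = 0
      start = 1
      do
        call next_match(string, start, separators, pos, seplen)
        n = n + 1
        if (pos == 0) then
          ! No more separators: the rest of string is the last piece.
          if (pass == 2) strings(n)%s = string(start:)
          exit
        end if
        if (pass == 2) strings(n)%s = string(start:pos-1)
        start = pos + seplen
      end do
      if (pass == 1) allocate(strings(n))
    end do
  end function split_all

end module split_array_mod
```

One design trade-off: this style builds every piece up front, whereas the 3rd form lets a caller walk the string one delimiter at a time without allocating anything.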
milancurcic commented 3 years ago

This paper was taken over by @richardbleikamp and discussed at meeting 223.

TL;DR:

Here's the paper that passed.

certik commented 3 years ago

@milancurcic awesome, thanks for pushing this. Do you want to update your "reference" implementation with the few changes?

milancurcic commented 3 years ago

I will, very soon. I still need to add tests and work out a kink in the SPLIT (old 3rd form) implementation. Once that's done, we'll submit a PR to stdlib.