Scala IO fix-up/overhaul #19

Closed dickwall closed 7 years ago

dickwall commented 9 years ago

scala.io.Source is small, useful, troubled and usually recommended against although still used by many.

A recent SLIP submission, https://github.com/scala/slip/pull/2, suggested a Target (cf. Source) providing similar functionality for output. The feeling in the SLIP committee is that a Target that aimed to be for output what Source currently is for input would not be accepted into the core libraries; however, everyone seemed in favor of an overhaul of the scala.io library.

Since this is likely to be a bigger task, we suggest an expert group form and meet to discuss and work on the problem. Interested parties identified in the meeting include Omid Bakhshandeh @omidb, Jon Pretty @propensive, Jesse Eichar @jesseeichar, Haoyi Li @lihaoyi and Pathikrit Bhowmick @pathikrit. The expert group will, of course, be open to volunteers willing to work on the implementation (if you are just interested in sharing your opinions, I suggest you attach comments to this thread rather than joining the EG).

In order to get things moving, and since the original PR came from @omidb, I suggest he take the lead in forming the group and setting up the first meeting. If someone else wants to volunteer for the organizational role at that point, that would be the time to discuss it.

Please also note that any IO SLIP targeting Scala 2.12+ will have Java's NIO guaranteed to be available, making NIO an option for the basis of an implementation.

First steps:

Please organize the first expert group meeting and provide details of the decisions made and action items. Would suggest following the Either expert group's lead and holding the discussion in the open on Google hangouts-on-air or similar so that the recording is publicly available to all interested. If you are involved with the EG, please post any progress in comments on this issue.

lihaoyi commented 9 years ago

For what it's worth, the conversation on this issue is exactly what I had hoped would happen.

Does that mean that @dickwall's original post was a cunning trick sufficiently wrong to get people to respond? =D

Ichoran commented 9 years ago

I think that basic file IO is such a core functionality that leaving it neglected or missing in favor of community libraries is a disservice. Imagine if we had Option supplied by a community library. Yes, you'd have the "benefit" of being able to pick one that required pattern matching in cases where danger was possible (e.g. no get), a bare-bones set of methods or a really rich one, and so on.

Except there would be one enormous cost which would dwarf all of these: you would continually spend attention negotiating between all the different Option types. You'd have to use them differently, convert them, etc.

Reading stuff from files is a really basic functionality, and I don't buy for a second that Java is where it needs to be from a usability perspective. Let's look at how to read a file into separate lines in Python:

xs = open('my-file.name').readlines()

That's it. Except Python drops a nasty surprise on you: it doesn't necessarily close the file handle right away. And if something goes wrong (file isn't there, for example), boom! But if you compare that to the best Java nio version I know of:

val xs = java.nio.file.Files.readAllLines(java.nio.file.Paths.get("my-file.name"))

Um, ow? And it avoids the open file handle problem, but it still throws an exception if something goes wrong.

There is really something wrong with this picture. Maybe you want to stream your file, but maybe you just want to grab it. Why is this so hard? It hardly matters which easy and correctness-enhancing approach we adopt as long as we settle on one!

Here's Ruby:

xs = IO.readlines("my-file.name")

If you're going to fall on your face when reading doesn't work, why require any more than that?

If you don't want to fall on your face, then shouldn't it look something like

val xs = "my-file.name".file.slurp

where the return type is an Either or equivalent? That's what my personal IO library does, and others' are approximately the same (e.g. Ammonite, rapture.io, etc.).
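To make the shape of that call concrete, here is a minimal sketch of an Either-returning slurp. The names `SlurpSketch`, `FileRef`, and `FileSyntax` are hypothetical (they are not taken from any of the libraries mentioned), and the code assumes Scala 2.13+ for `scala.jdk.CollectionConverters`.

```scala
import java.io.IOException
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import scala.jdk.CollectionConverters._

object SlurpSketch {
  // Hypothetical wrapper around a file name; not an existing library type.
  final class FileRef(val name: String) {
    // Reads the whole file eagerly; I/O failures are returned as a Left
    // instead of being thrown. Note that Paths.get can still throw an
    // InvalidPathException for a malformed name (it is not an IOException).
    def slurp: Either[IOException, Vector[String]] =
      try Right(Files.readAllLines(Paths.get(name), StandardCharsets.UTF_8).asScala.toVector)
      catch { case e: IOException => Left(e) }
  }

  // Extension syntax so that "my-file.name".file.slurp type-checks.
  implicit class FileSyntax(private val name: String) extends AnyVal {
    def file: FileRef = new FileRef(name)
  }
}
```

With `import SlurpSketch._` in scope, `"my-file.name".file.slurp` yields either the lines or the exception, and the caller decides what to do with the failure.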

I absolutely don't buy that we can't provide something way better than what we've got if we care to give it enough attention. And I also don't buy that it has to be perfect, or that we can't tell what is good enough without years of watching which libraries are used more.

Files are just really basic things that computers have to deal with. So let's get it rightish and save a whole bunch of people a whole bunch of pointless busywork and/or mistakes.

I am not sure that a lightly decorated Java nio is the way to go. It might be, but we should have a hard thought about error handling before we say that's what we want. Making it easy to deal with errors is really important too. (Arguably even more important than making it easy to get yourself in all kinds of troubles by not handling errors.)

lihaoyi commented 9 years ago

Except there would be one enormous cost which would dwarf all of these: you would continually spend attention negotiating between all the different Option types. You'd have to use them differently, convert them, etc. Reading stuff from files is a really basic functionality, and I don't buy for a second that Java is where it needs to be from a usability perspective.

I agree 100%

Um, ow?

Well then just add import java.nio.file._ into the predef and you're down to 50% more verbose than Python. Add a single implicit from String => java.nio.file.Path and you're equally concise. If verbosity is the only issue, we should just do that and declare victory immediately.
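A rough sketch of what the "single implicit" would look like. `NioPreludeSketch`, `stringToPath`, and `readLines` are hypothetical names standing in for whatever would actually go into Predef; the example assumes Scala 2.13+ and Java 8+ (for the one-argument `Files.readAllLines`).

```scala
import java.nio.file.{Files, Path, Paths}
import scala.jdk.CollectionConverters._
import scala.language.implicitConversions

object NioPreludeSketch {
  // The implicit view suggested above: any String can be used where a Path is expected.
  implicit def stringToPath(s: String): Path = Paths.get(s)

  // Thin helper over NIO; still throws on failure, exactly like the Java call.
  def readLines(p: Path): Vector[String] = Files.readAllLines(p).asScala.toVector
}
```

After `import NioPreludeSketch._`, the call site becomes `val xs = readLines("my-file.name")`. The trade-off is the usual one for implicit views from String: brevity at the call site against conversions that apply silently wherever a Path is expected.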

Maybe you want to stream your file, but maybe you just want to grab it. Why is this so hard?

I don't know, but clearly scalax.io died a slow death. Why did it die? It's worth doing a rough post-mortem before we make all the same mistakes all over again 4 years later.

I've tried to do this correctly in Ammonite, and it's really difficult. For example, there's no way to do streaming file IO with proper close-after-use guarantees in the current Scala collections framework. Iterator (what Ammonite uses) doesn't actually work because if you don't use up the iterator, you leak file handles, which is subtle and hard to debug. Stream doesn't work for the same reason. Traversable doesn't work because by default those are strict on all operations. All other collections (Seq, Vector, etc.) are of course eager.

It turns out that the only way to do this is to implement a new kind of lazy-traversable-view thing that has basically never been seen before in the Scala collections library. It kinda-sorta exists in collections.views, but those are on their way out, and Java 8 streams have something similar. At least that's as much as I got from the scala-internals gitter room, but presumably those people know what they're talking about.

Either way, that probably qualifies as "hard"
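For comparison, the common workaround is the loan pattern, which is a different technique from the lazy-traversable view described above: the caller only ever sees the iterator inside a callback, and the handle is closed when the callback returns. This is a sketch with a hypothetical name (`withLines`), not what Ammonite does.

```scala
import scala.io.Source

object LoanSketch {
  // Lends a line iterator to `f` and closes the underlying file afterwards,
  // whether or not `f` consumed the whole iterator or threw.
  // Caveat: `f` must not return the iterator (or anything lazy built on it),
  // because the file is already closed by the time the caller sees the result.
  def withLines[A](path: String)(f: Iterator[String] => A): A = {
    val src = Source.fromFile(path, "UTF-8")
    try f(src.getLines())
    finally src.close()
  }
}
```

For example, `LoanSketch.withLines("my-file.name")(_.take(1).toList)` reads only the first line and still releases the handle. The caveat in the comment is exactly the gap the type system does not close, which is the difficulty being described.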

If you don't want to fall on your face, then shouldn't it look something like

I don't see how making things extension methods rather than normal methods affects error handling semantics at all. It's literally just shuffling tokens around.

That's what my personal IO library does, and others' are approximately the same (e.g. Ammonite, rapture.io, etc.).

Ammonite throws java.nio.FileNotFoundExceptions. It works great. If you want a monad you wrap it in a Try.
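A small sketch of the "wrap it in a Try" approach, using plain java.nio rather than Ammonite. The object and method names are hypothetical, and the code assumes Scala 2.13+.

```scala
import java.nio.file.{Files, Paths}
import scala.jdk.CollectionConverters._
import scala.util.{Failure, Success, Try}

object TryWrapSketch {
  // Exception-throwing primitive: fails with an IOException subclass
  // if the file cannot be read.
  def readLinesOrThrow(name: String): Vector[String] =
    Files.readAllLines(Paths.get(name)).asScala.toVector

  // Opt-in wrapper for callers that prefer a value over an exception.
  def readLinesTry(name: String): Try[Vector[String]] = Try(readLinesOrThrow(name))

  // Example of consuming the wrapped result.
  def describe(name: String): String = readLinesTry(name) match {
    case Success(lines) => s"read ${lines.size} lines"
    case Failure(e)     => s"failed: ${e.getClass.getSimpleName}"
  }
}
```

The design point is that the throwing version stays the primitive and the monadic version is a one-line wrapper, so callers choose per call site.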

I don't know what rapture does. On the other hand, I would not count zero-user libraries like Ammonite and Rapture as "model answers" w.r.t. what we should do. We probably should be looking at really-used-for-real-things libraries to see what people really use: sbt.io, scala.tools.nsc.io.File, java.nio, apache-commons, and friends. Almost all of them throw exceptions (a few methods here and there return Options).

One question I would ask is: what are we solving? If we're not solving validation, we shouldn't come up with our own fancy validation monad. If we just want to let people read and write their damn files, throwing exceptions fits into how every single other language in the world does it.

I absolutely don't buy that we can't provide something way better than what we've got if we care to give it enough attention. And I also don't buy that it has to be perfect, or that we can't tell what is good enough without years of watching which libraries are used more.

I'd believe you if scalax.io didn't fail miserably. I don't know the answer but clearly the "We should just put effort into it and make it work" didn't help scalax.io even after 400 commits of effort.

It might be, but we should have a hard thought about error handling before we say that's what we want. Making it easy to deal with errors is really important too. (Arguably even more important than making it easy to get yourself in all kinds of troubles by not handling errors.)

This, alone, is its own huge project: a convenient, flexible validation/error-handling data-type in the standard library. This is completely orthogonal to file IO: sure it's great if we have both, but either on their own is extremely valuable, and splitting them off into their own projects de-risks them considerably.

lihaoyi commented 9 years ago

Here's a few questions that really should be asked before we run off to organize an "expert group" to "overhaul" the io library, and certainly should be answered before we start discussing nitty-gritty API details like whether to use functions or extension methods, or whether to work with exceptions or Eithers:

How long should this take? 1 month? 6 months? 12 months? 36 months? If we're throwing something in now we should probably stop talking, write something passable and land it. But if we have a few months that's enough time to work with some existing friendly project to try and port them onto our API as a POC, or put it on maven central for a while for people to try out before fossilizing it in the std lib.
Why did all the other projects in the past fail? Why does almost nobody use rapture.io? Why does nobody speak about scalax.io except in confusion whether it's alive or dead? Why did scala-arm die off? I don't have the answers to these, but presumably if we don't want this project to die it's worth finding out. Post-mortems take a bit of time but not as much time as 3 years and 400 more commits.
Assuming we botch the whole thing, what's our strategy to realize that as early as possible (i.e. not after 3 years and 400 commits), and with as little damage as possible (i.e. not leaving things like sys.process lying around the std lib)? This probably rules out "working on our own awesome code in our own awesome github repo forever" or "YOLO landing stuff in master".
What is this library meant to do anyway? If it's IO, does that include sockets and HTTP like Rapture does? If it's File IO, does it include non-read/write filesystem management like better-files or ammonite-ops does? Does it include "in-memory IO" like working with InputStreams and OutputStreams? Does it work with text only, or binary data, or both? Streaming API or batch API or both?
Are we going for convenience (e.g. open("file.txt").read()) or shared-interfaces (Source, Target, ...) in the API? These are both valuable, but totally orthogonal. Having both is great but either alone is already useful. From the posts so far, some people want one and some people want the other.
Are we sure it's worth putting in all this effort to avoid java.nio, when we could just add java.nio.file.Files and 2-3 implicits to Predef.scala, and be able to leverage the non-trivial amount of documentation and familiarity out in the community w.r.t. how to use NIO? The alternative is having to re-document and re-educate everyone ourselves if we make our own API, in addition to making sure our API is sufficiently cohesive and consistent and correct. Maybe we decide enough people are running Scala.js/Node.js to make our own API worthwhile, or the Oracle Legal Risk is too great. Or maybe we decide using Java APIs is just fine.

odersky commented 9 years ago

I completely agree with @ichoran and others that we should make an effort to get a decent io library. The fact that other languages have them shows that the problems are not insurmountable. Why did scalax.io "fail"? Precisely because it was NOT picked up by an expert group like the one which is forming here. It (and rapture, and scala-arm) were by and large single-author projects. Sure, there was some help, but there was no community-wide backing. So authors at some point had different projects on their plate and walked away.

I also like the spirit not to be too ambitious. Let's wrap nio, and let's try to get to the level of conciseness we are used to in Scala; that could very well be all that needs to be done.

Since other languages have them,


Ichoran commented 9 years ago

@lihaoyi - Verbosity is not the only issue but it is an issue.

And having something not in the standard library raises the activation energy manyfold over having it there. That's the main reason why scalax.io failed me. I didn't even know about it for most of its life, and when I am thinking, "Gosh, { val x = io.Source.fromFile("myFile.txt"); try { x.getLines.toVector } finally { x.close } } is kinda annoying to type, but oh well," the solution that jumps to mind is not "Let's add another SBT dependency (maybe up from zero, and I am not even using SBT for this quick thing in the REPL), and try to remember what a LongTraversable is and when and whether the file gets closed". It's always been easier to use Java io / nio which is effectively in the standard library than some other thing that isn't.

The problem with adding external dependencies is that they break, they lose binary compatibility, they get abandoned. They lack all the nice stability and compatibility that the standard has (at considerable effort). This can be a big deal.

I think @lihaoyi's questions are good ones to answer, but I think they all have pretty straightforward answers.

It would be nice to have in 2.12 and definitely should be in 2.13, so 6-18 months.
External projects failed in the past largely because they were external.
There is no scenario where botching this would not be painful. The time to realize we're botching it is before we ever release anything, and we can do it by having the entire expert group use whatever is created for a bunch of their own I/O tasks. If they can't stomach it, it's botched.
The goal should be to do as little as possible to save people as much time as possible. Since diversity makes things harder to code and use, that probably means it's standard files only (i.e. you know how big they are, can random access, etc.).
Shared interfaces are a form of convenience because you have less to remember. It's almost all about convenience. So: convenience.
When were we sure we needed to avoid nio? The mistake, I think, is to say "We don't need to do anything because import java.nio._." @pathikrit has been saying that there's a lot to do just by enriching nio. That's fine with me (if it meets our goals)! I just don't think un-enriched nio is where to leave things, especially if we feel that we need to deprecate io.Source.

Another really important question to ask/answer is

Why do people use io.Source?

Why do I use io.Source? Because the convenience of it being there and pretty much working outweighs the annoyance of having to work around its issues (leaking resources, terrible performance if you use its primary char-iterator interface, no error handling).

I grant that this isn't easy, or we wouldn't be saving people time/effort/bugs by doing it for them. But I also don't think it's that hard, as long as we resist the temptation to create a glorious overarching framework that solves all IO problems forever.

Finally, the core library in Scala should have some sort of validation, I think. IO can assume that will exist if that is a make-or-break feature.

lihaoyi commented 9 years ago

but I think they all have pretty straightforward answers.

Yeah, I didn't say they were hard, but I thought it's worth bringing up since nobody was talking about them =D

It would be nice to have in 2.12 and definitely should be in 2.13, so 6-18 months.

Agreed. I'd argue we should target 2.12, which would make the target something like 6 months. I doubt we're gonna have validation in place by that point. It also bounds it tightly so we don't go off on some vision quest.

External projects failed in the past largely because they were external.

I'd argue that at least part of the failure was scope-creep. e.g. scalax.io never had a point where it could be considered "done", and so it just kept going, and going, until people lost interest.

He-Pin commented 9 years ago

Would you mind taking a look at the IO part of Elixir (File, IO, Path)?

SethTisue commented 9 years ago

The range of possible outcomes here isn't limited to "add to standard library in the usual way" and "failure".

We might also end up with something we don't yet have fixed terminology and guidelines for: something like a "standard module", a module that is "blessed/promoted/recommended by typesafe/scala/@odersky" (to quote pathikrit) but separately packaged and versioned, not just piled into scala-library.jar. (We started discussing this at the last SIP/SLIP meeting and will continue doing so. SMIPs, anyone?)

Existing "standard modules" like scala-xml and scala-parser-combinators have checkered histories and are moving away from core rather than towards it, but that's an accident of history that needn't curse new modules. (And perhaps the REPL, sbt, etc. could use some changes to make such modules easier to find and use? Ammonite's load.ivy is a good experiment in this space.)

Or, we might end up with something that remains a completely third-party project for now, but consolidates existing efforts and becomes a popular de facto standard; we shouldn't pre-judge that as being "failure".

SethTisue commented 9 years ago

Opinions, mostly echoing things that have already been said:

Like Rex and Lukas and others, I think the standard library should include decent minimal support for basic stuff newcomers want to do. Having that doesn't prevent a healthy ecosystem of competing libraries from developing. Users wanting something different or better are free to ignore stdlib stuff.
Denys's point about Scala.JS is important. It's just one reason we want basic functionality to be built-in, but it's often forgotten.
The discussion so far shows we're all well aware of the dangers of attempting something too ambitious. The watchwords here are definitely "minimal" and "decoupled". Lukas' history lesson reminds us how circumstances have changed; we should learn from past mistakes, but shouldn't be paralyzed by them. In the intervening years, Typesafe/EPFL/contributors have done a great deal to modularize and/or fix and/or deprecate old stdlib stuff; let's keep doing that. (Many such improvements are small and don't require SLIPs.)
Waiting for third party libraries to develop and then just picking the best one for stdlib won't always work. Third party libraries that become popular are usually too big to be in stdlib; whereas small ones, the kind we want for stdlib, tend not to get noticed and adopted, because paradoxically they don't offer enough, so people don't seek them out, especially beginners who need them most but wouldn't even know where to look. So we can't just look at popularity. Also, third party library authors are often motivated in part by a desire to experiment and break new ground; but for stdlib a more cautious and conservative spirit is needed.

ghost commented 9 years ago

My 2 cents is my expected "Please make sure Scala.js is included in the requirements and solution".

And the fact that it has a good nio implementation already might add a bit of weight to the "Let's wrap nio" argument.

ghost commented 9 years ago

Another point from the JS POV is a gentle reminder that JS does not actually support Files and the like. The previous reference to node.js is to a Node-specific extension. In a similar way, HTML 5 has a File API.

(Disclaimer... I'm more than a little bit interested in the last one, and it does have some areas that are well out of scope here.)

What this means (I think :wink:) is that the previous comments about the scope of this SLIP as being IO in general, or just File specific, are not purely academic. As an example, see Scala.js's PrintWriter, where some File constructors are provided that will not link, but are provided "just in case a third-party library on the classpath implements those".

But, rather than confuse matters, the scala.js differences may pave the way to a simple process:

We have some links to other libraries where IO/file-io is done reasonably - the next question (as per a type class implementation) could be "What's the smallest set of base functions needed to implement all of these APIs?" Then everything else is just syntax that calls these base functions.

It's then easy for a third-party library (which, in the case of the node.js and HTML 5 additions, has to be third party) to add these core functions, and the rest just works.

And of course other libraries that provide other syntax, be it Bash-like or SBT-like, really are just sugar to the std library. So given this, I guess the final debate could "simply" focus on what the default syntax in std-lib should be.
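One way to picture the "small core plus syntax" split is the sketch below. Everything in it is hypothetical: `FileBackend` stands in for the minimal set of base functions, `PathOps` for the syntax layer, and `InMemoryBackend` for a platform-specific implementation (a real one might delegate to java.nio on the JVM or to Node's fs module under Scala.js).

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.NoSuchFileException
import scala.collection.mutable

object CoreOpsSketch {
  // Hypothetical minimal core: the only functions a platform has to supply.
  trait FileBackend {
    def readBytes(path: String): Array[Byte]
    def writeBytes(path: String, bytes: Array[Byte]): Unit
  }

  // Syntax layer: written once against the core and shared by every backend.
  implicit class PathOps(private val path: String) extends AnyVal {
    def slurp(implicit fs: FileBackend): String = new String(fs.readBytes(path), UTF_8)
    def readLines(implicit fs: FileBackend): Vector[String] = slurp.linesIterator.toVector
    def spit(text: String)(implicit fs: FileBackend): Unit =
      fs.writeBytes(path, text.getBytes(UTF_8))
  }

  // Toy backend that keeps "files" in a map, used here only to show that the
  // syntax layer does not care where the bytes come from.
  final class InMemoryBackend extends FileBackend {
    private val files = mutable.Map.empty[String, Array[Byte]]
    def readBytes(path: String): Array[Byte] =
      files.getOrElse(path, throw new NoSuchFileException(path)).clone()
    def writeBytes(path: String, bytes: Array[Byte]): Unit =
      files.update(path, bytes.clone())
  }
}
```

With an implicit `FileBackend` in scope, `"notes.txt".spit("one\ntwo")` followed by `"notes.txt".readLines` works the same way regardless of which backend was chosen, which is the property the proposal is after.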

lihaoyi commented 9 years ago

Another point from the JS POV is a gentle reminder that JS does not actually support Files and the like.

If we want an API that is used to conveniently read/write files, I think we should ignore Scala.js for now.

Sure, it would be cool theoretically to have a cross-platform API, and I've written more cross-platform APIs than anybody, but, who is it that is using Scala.js on Node that would actually benefit from this?

I don't mean "someone might", but actual people. Because I only know of one person who spent 8 hours getting Scala.js working on Node.js for a lark and that's it.

Unless, of course, we decide we want to provide abstract interfaces instead of concrete utilities. In that case having the interfaces generic enough to plug in more things later would be nice, but I feel that would be a bit premature right now.

So given this, I guess the final debate could "simply" focus on what the default syntax in std-lib should be

Assuming we already know what semantics we want to support (read, write, blah), your previous line suggests the final debate would focus on what data structures the standard library should provide. Are we gonna be passing around:

java.io.Files?
java.lang.Strings?
java.nio.file.Paths?
Ammonite's explicitly-relative/absolute always-canonicalized ammonite.ops.{Path, RelPath}s?
better.files.File?
Something else?

Presumably having consistent data-structure would be much more important for interop than default syntax, since this will be what's appearing in everyone's function/variable signatures.

ghost commented 9 years ago

Node.js is the recommended VM for scala.js tests so almost everyone that tests scala.js code uses Node.js.

See:

https://github.com/banana-rdf/banana-rdf/issues/175 is an issue relating to reading filesystem files from scala.js tests. Not mentioned there is the fact that we would also want to write to persist graphs locally.
repeated in rapture gitter https://gitter.im/propensive/rapture?at=55b0bcc9145c42fe657e5036

ghost commented 9 years ago

Something else?

If we "Let's wrap nio" it could be called scala.io.File

ghost commented 9 years ago

Just to clarify my scala.js point, I'm not suggesting a fully blown cross API - rather just enough in scala.js (ie stubs as mentioned before) that can easily be implemented by a third party.

pathikrit commented 9 years ago

Something else?

What about a scala.io.Path or a scala.io.File which simply wraps java.nio.file.Path (that's what better.files.File does)? Paths are the more correct term here than files IMO, but developers usually think about files and not paths, so it's a matter of nomenclature.

I really like ammonite's distinction between relative vs absolute paths (makes certain operations safer) but it may violate "let's not introduce any type-hierarchy"?

Similarly for files, I grappled with type-safety, e.g. should you be able to call .list on a regular file or call .readBytes on a directory? Should we have types to help our code be safer? e.g.

File("/tmp/foo") match {
  case d: Directory => d.list()
  case f: RegularFile => f.readBytes
  case SymbolicLink(d: Directory) => d.list()
  case _ =>  // something else e.g. UNIX pipes/processes/devices etc
}

If our goal is to simply wrap NIO, I would say no and let that additional type-safety be provided by external libraries like ammonite (type-safe paths) and better-files (type-safe files).
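For readers wondering what such types could look like on top of NIO, here is a simplified sketch. It is not the better-files API: `Entry`, `classify`, and the case classes are hypothetical, `SymbolicLink` holds only the link's own path rather than a resolved target as in the pattern match above, and Scala 2.13+ is assumed.

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

object FileKindSketch {
  sealed trait Entry { def path: Path }

  // Only directories can be listed.
  final case class Directory(path: Path) extends Entry {
    def list(): Vector[Path] = {
      val stream = Files.newDirectoryStream(path)
      try stream.iterator().asScala.toVector
      finally stream.close()
    }
  }

  // Only regular files can be read as bytes.
  final case class RegularFile(path: Path) extends Entry {
    def readBytes: Array[Byte] = Files.readAllBytes(path)
  }

  final case class SymbolicLink(path: Path) extends Entry
  final case class Other(path: Path) extends Entry // pipes, devices, missing paths, ...

  // Classification is a snapshot: the file system can change afterwards,
  // so the operations above can still throw.
  def classify(path: Path): Entry =
    if (Files.isSymbolicLink(path)) SymbolicLink(path)
    else if (Files.isDirectory(path)) Directory(path)
    else if (Files.isRegularFile(path)) RegularFile(path)
    else Other(path)
}
```

The comment on `classify` is the main argument against putting this in a thin wrapper: the types describe what the path was when it was inspected, not what it is when it is used.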

Also, if we go down the path of "let's wrap NIO", how exception-happy should we be? Java NIO's Files.walk(), for example, throws errors if one of the files in the directory is unreadable. Should we tolerate that in Scala?
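One possible answer, sketched here under assumptions, is to keep walking and report the failures alongside the results. The sketch uses NIO's `Files.walkFileTree` with a `SimpleFileVisitor` whose `visitFileFailed` records the error instead of rethrowing it; `TolerantWalkSketch` and its return shape are hypothetical.

```scala
import java.io.IOException
import java.nio.file.{FileVisitResult, Files, Path, SimpleFileVisitor}
import java.nio.file.attribute.BasicFileAttributes
import scala.collection.mutable.ArrayBuffer

object TolerantWalkSketch {
  // Returns the files that could be visited, plus the entries that could not
  // be visited together with the exception NIO reported for each.
  def walk(root: Path): (Vector[Path], Vector[(Path, IOException)]) = {
    val found  = ArrayBuffer.empty[Path]
    val failed = ArrayBuffer.empty[(Path, IOException)]
    Files.walkFileTree(root, new SimpleFileVisitor[Path] {
      override def visitFile(file: Path, attrs: BasicFileAttributes): FileVisitResult = {
        found += file
        FileVisitResult.CONTINUE
      }
      // The default implementation rethrows; continuing here keeps the walk alive.
      override def visitFileFailed(file: Path, exc: IOException): FileVisitResult = {
        failed += (file -> exc)
        FileVisitResult.CONTINUE
      }
    })
    (found.toVector, failed.toVector)
  }
}
```

Whether a standard wrapper should default to this behavior or to failing fast is the open question; the sketch only shows that both are expressible on top of NIO.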

lihaoyi commented 9 years ago

how exception-happy should we be?

I think we should throw exceptions willy-nilly. Exceptions are great, well understood, familiar, and can trivially be wrapped in more principled abstractions via try-catch. The Scala standard library only has Try, which I think isn't that appropriate, and further research into fancier-while-still-usable abstractions would still just yield abstractions.

The problem with files is that they're halfway between statically-known and unknown. e.g. if I'm dealing with files I know on disk, and can see them in front of me and know what they are, having everything return Options would just make me call .get everywhere.

bs76 commented 9 years ago

Here are my 2cents:

IO is not just files; resource state management is completely missing from io.* and that is an obstacle; using Try, where do you close resources ? Combining reads/writes on multiple file resources your code becomes a complete mess. I wrote withResources so many times, it's not even funny;
IO is about resources, there needs to be a clear way to manage them, and handle errors; 'files I clearly see' do not exist; networks fail etc.
io.Source class is harmless and marginally usable; to read in a file in 'one line' is good enough
a DSL on top of files will always fail and never be done right. It's a point-of-view matter. Whereas sometimes an OO approach fits, some might prefer a pipes-and-combinators approach
files/paths are complex: there are paths (with/without files), files may have paths (virtual,logical,physical), there are links (physical,logical) and all of that on top of an OS;

Here's what I would suggest:

leave io.Source as is for now, do not deprecate
take java.nio/java.io and pimp it to make it better usable e.g. InputStream / Reader to read a String, convert to (Seq ?)
make opening files simpler; with pimped java.io classes, manipulation will be simpler
add resource management into io. and provide style guidelines on how to manage resources safely, to be on par with Java's try(Closeable ...)
pimp java.nio.Path to be more usable
let 3rd party libraries extend and build on top of the API, adopt usable abstractions
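As an illustration of the resource-management item, the sketch below uses `scala.util.Using`, which exists in Scala 2.13 and later (it was not available when this thread was written). The object and method names are hypothetical.

```scala
import java.io.PrintWriter
import scala.io.Source
import scala.util.{Try, Using}

object UsingSketch {
  // Single resource: the Source is closed whether or not reading succeeds,
  // and any exception is captured in the returned Try.
  def readLines(path: String): Try[Vector[String]] =
    Using(Source.fromFile(path, "UTF-8"))(_.getLines().toVector)

  // Several resources: Using.Manager closes everything registered via `use`
  // in reverse order of registration. Returns the number of lines copied.
  def copyLines(from: String, to: String): Try[Int] =
    Using.Manager { use =>
      val in  = use(Source.fromFile(from, "UTF-8"))
      val out = use(new PrintWriter(to, "UTF-8"))
      var count = 0
      in.getLines().foreach { line =>
        out.println(line)
        count += 1
      }
      count
    }
}
```

This covers the "combining reads/writes on multiple file resources" case without hand-written nested try/finally blocks, which is roughly what Java's try-with-resources provides.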

som-snytt commented 9 years ago

At least we now know when a project has run out of steam: "Aligned Scala logo." https://github.com/scala-incubator/scala-io/commit/8b5467d66760536d34b6bcb36f69a1b7f67f68b5

By coincidence, I'm aligning the logos on my desk this very minute.

ghost commented 9 years ago

There was a discussion (and an earlier one), along with my suggestion, that we could use this issue as a test-bed for separating the std-lib interface and implementation, to see how naming conventions work etc. But this need not be part of the final solution.

By naming, for example, should an implementation have the std namespace and/or its own:

import scala.io          // std lib import, as defined in a library dependency in SBT
import scala.std.io    // the scalac implementation
import nodejs.std.io  // My own node version

dickwall commented 9 years ago

Oops - didn't mean to edit Haoyi's post but reply to him - reply is below (with context)

Just catching up with this very long thread now - I was on vacation so sue me :-)

For what it's worth, the conversation on this issue is exactly what I had hoped would happen.

Does that mean that @dickwall's original post was a cunning trick sufficiently wrong to get people to respond? =D

Not a cunning trick, but it certainly has led to a healthy discussion. The original post offers some options but certainly makes no demands or assumptions on what the EG should decide. My only request is that such discussions are held in the open (which this one seems to be)

Just for the record, I am hands off any decision making on the technical side because I don't believe I can attempt to get a working process bootstrapped and also influence the decisions made within that process without a huge conflict of interest. As @non points out, formation of an expert group says nothing about whether an IO library should be forthcoming, only that the discussion should occur. The original post closes with:

Please organize the first expert group meeting and provide details of the decisions made and action items. Would suggest following the Either expert group's lead and holding the discussion in the open on Google hangouts-on-air or similar so that the recording is publicly available to all interested. If you are involved with the EG, please post any progress in comments on this issue.

If the EG decides no action is the correct action, aside from being very Zen, then that is what the EG decides. The discussion here is obviously healthy, but the point I am trying to get across is that right now we are trying to get the process bootstrapped (certainly that's my aim) not to influence anything about the outcome.

That said, I am looking forward to the time when the process is trusted better and I can actually get involved in working on the opinion side of things as well.

For now I am being as hands off and objective as I know how.

Getting the word out about EGs and the messaging around the process is still something that I am very much interested in. How can we improve the messaging so that people are less surprised when issues like this come up? I don't always have time to email everyone individually, so we need to find a common place where the message gets out without being too surprising to people.

dickwall commented 9 years ago

Tomorrow (Monday 12th) being the next SLIP committee meeting, any updates or summaries to add for this issue? Thanks

dickwall commented 9 years ago

Re-reading this thread prior to the meeting tomorrow, the most insightful posting is probably this one:

Here's a few questions that really should be asked before we run off to organize an "expert group" to "overhaul" the io library, and certainly should be answered before we start discussing nitty-gritty API details like whether to use functions or extension methods, or whether to work with exceptions or Eithers:

How long should this take? 1 month? 6 months? 12 months? 36 months? If we're throwing something in now we should probably stop talking, write something passable and land it. But if we have a few months that's enough time to work with some existing friendly project to try and port them onto our API as a POC, or put it on maven central for a while for people to try out before fossilizing it in the std lib.

Why did all the other projects in the past fail? Why does almost nobody use rapture.io? Why does nobody speak about scalax.io except in confusion whether it's alive or dead? Why did scala-arm die off? I don't have the answers to these, but presumably if we don't want this project to die it's worth finding out. Post-mortems take a bit of time but not as much time as 3 years and 400 more commits

Assuming we botch the whole thing, what's our strategy to realize that as early as possible (i.e. not after 3 years and 400 commits), and with as little damage as possible (i.e. not leaving things like sys.process lying around the std lib)? This probably rules out "working on our own awesome code in our own awesome github repo forever" or "YOLO landing stuff in master".

What is this library meant to do anyway? If it's IO, does that include sockets and HTTP like Rapture does? If it's File IO, does it include non-read/write filesystem management like better-files or ammonite-ops does? Does it include "in-memory IO" like working with InputStreams and OutputStreams? Does it work with text only, or binary data, or both? Streaming API or batch API or both?

Are we going for convenience (e.g. open("file.txt").read()) or shared-interfaces (Source, Target, ...) in the API? These are both valuable, but totally orthogonal. Having both is great but either alone is already useful. From the posts so far, some people want one and some people want the other.

Are we sure it's worth putting in all this effort to avoid java.nio, when we could just add java.nio.file.Files and 2-3 implicits to Predef.scala, and be able to leverage the non-trivial amount of documentation and familiarity out in the community w.r.t. how to use NIO? v.s. having to re-document and re-educate everyone ourselves if we make our own API, in addition to making sure our API is sufficiently cohesive and consistent and correct. Maybe we decide enough people are running Scala.js/Node.js to make our own API worthwhile, or the Oracle Legal Risk is too great. Or maybe we decide using Java APIs is just fine.

I agree with this set of questions/priorities 100%, the only difference I have is that why can't the expert group itself answer these? They are, after all, going to be affected by the answers. I think there is some misunderstanding of what an expert group is (or can be). Answering these questions would appear to be an ideal starting point for the group, and that group has full power and responsibility to choose as they see fit. I certainly can't think of any better choice of people to ponder these than the people that have an interest in the IO library.

Also please note that being suggested for involvement in an EG does not mean you have to volunteer, nor does it limit the potential membership. It is instead merely a way to notify potentially interested parties that such a thing is being considered.

I will be writing up a blog post for the Scala blog about some of these concepts in the near future.

pathikrit commented 9 years ago

Thanks for the summary @dickwall. Regarding this:

I agree with this set of questions/priorities 100%, the only difference I have is that why can't the expert group itself answer these?

Can we choose based on "what is the least amount of work we can do for the maximum benefit to the programmer"? To maximize "bang for the buck", wrapping all the utils in java.nio.file.Files into a sensible Scala File class makes the most sense (proof of concept).

There are also valid concerns about the standard lib being the "graveyard of code". IMO, this mitigates some of those concerns: the less code we put in the std lib, the less we put in the graveyard :)

lihaoyi commented 9 years ago

Can we choose based on "what is the least amount of work we can do for the maximum benefit to the programmer"? To maximize "bang for the buck", wrapping all the utils in java.nio.file.Files into a sensible Scala File class makes the most sense (proof of concept).

IMHO you can get very far with a lot less bucks

implicit def stringToPath(p: String): java.nio.file.Path = java.nio.file.Paths.get(p)
implicit def fileToPath(p: java.io.File): java.nio.file.Path = p.toPath

Here we're paying two lines of code instead of 150 in your POC. I don't think we really get 75x more value out of wrapping things v.s. just using the methods directly. I mean, is it really worth spending 148 lines of code wrapping every single operation in our own definition, just so we can call f.delete() instead of Files.delete(f)? Especially given any Java programmer will already be 100% familiar with the latter.
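
As a rough illustration of the "two implicits" approach described above, the sketch below calls plain `java.nio.file.Files` methods with a `String` and a `java.io.File`. The object and method names (`PathImplicits`, `PathImplicitsDemo`, `createThenDelete`) are made up for this example; only the `Files`, `Paths` and `File.toPath` calls are real JDK API.

```scala
import java.io.File
import java.nio.file.{Files, Path, Paths}
import scala.language.implicitConversions

object PathImplicits {
  // The two conversions from the comment above, with explicit result types.
  implicit def stringToPath(p: String): Path = Paths.get(p)
  implicit def fileToPath(f: File): Path = f.toPath
}

object PathImplicitsDemo {
  import PathImplicits._

  // Creates and deletes "demo.txt" inside `dir`, returning whether the file
  // existed before and after the delete.
  def createThenDelete(dir: File): (Boolean, Boolean) = {
    val file = new File(dir, "demo.txt")
    Files.createFile(file)                         // java.io.File => Path
    val existedBefore = Files.exists(file.toString) // String => Path
    Files.delete(file)
    (existedBefore, Files.exists(file))
  }
}
```

The trade-off is the one argued in the thread: nothing is wrapped, so every sharp edge of `Files` stays exactly as Java defines it.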

pathikrit commented 9 years ago

@lihaoyi I disagree :) We absolutely need to wrap java.nio.file.Files.

just so we can call f.delete() instead of Files.delete(f) ?

java.nio.file.Files has devious traps for us if we are not careful. For example, Files.delete does not actually delete non-empty directories; you have to do that yourself (otherwise you get a nice DirectoryNotEmptyException at run time). Sure, any self-respecting Scala programmer can recurse and delete a directory in her sleep, but try that with Files.copy, which cannot copy directories recursively (it silently makes an empty folder with that name); doing that correctly is entirely non-obvious. Similarly, with Files.move you have to be careful when the target exists, and Files.size is not that useful for directories, where you may want to calculate the size of the directory rather than the size of the inode entry. Something simple like chown should have been file.setOwner(owner); instead you have to write something ridiculous like: Files.setOwner(path, path.getFileSystem.getUserPrincipalLookupService.lookupPrincipalByName(owner))
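
For reference, this is roughly what "do that yourself" looks like for the delete case: a sketch using the JDK's `Files.walkFileTree`, which by default does not follow symbolic links. The names `RecursiveDelete` and `deleteRecursively` are illustrative only and are not taken from better-files or any other library mentioned here.

```scala
import java.io.IOException
import java.nio.file.{FileVisitResult, Files, Path, SimpleFileVisitor}
import java.nio.file.attribute.BasicFileAttributes

object RecursiveDelete {
  // Deletes `root` and everything beneath it, children before parents.
  def deleteRecursively(root: Path): Unit = {
    Files.walkFileTree(root, new SimpleFileVisitor[Path] {
      override def visitFile(file: Path, attrs: BasicFileAttributes): FileVisitResult = {
        Files.delete(file)
        FileVisitResult.CONTINUE
      }
      override def postVisitDirectory(dir: Path, exc: IOException): FileVisitResult = {
        // `exc` is non-null if listing the directory failed part-way through.
        if (exc != null) throw exc
        Files.delete(dir) // the directory is empty by now
        FileVisitResult.CONTINUE
      }
    })
    ()
  }
}
```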

Do you want to count lines in a file using Java NIO? Files.lines(myFile).count() seems pretty innocuous, but it is not! Files.lines returns a java.util.stream.Stream, which needs to be closed!
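
A leak-free version of that line count might look like the sketch below. `LineCount` and `countLines` are hypothetical names; `Files.lines` with a single argument decodes the file as UTF-8.

```scala
import java.nio.file.{Files, Path}

object LineCount {
  // Counts lines while making sure the underlying file handle is released.
  def countLines(file: Path): Long = {
    val lines = Files.lines(file) // java.util.stream.Stream[String] holding an open file
    try lines.count() finally lines.close()
  }
}
```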

Why would we burden Scala programmers with all these pitfalls or make them waste their time looking up on StackOverflow how to do trivial things like get an Iterator[Char] from a file when we can sanely wrap java.nio.file.Files? Sure, many of them would be 1-liner hand-offs; but, in other cases, we can make life a lot better with few extra lines of code around whatever Java gives us to smoothen the rough edges of java.nio.file.Files.

jeantil commented 9 years ago

As a user of the better.files library I strongly support @pathikrit's position. This library is a huge relief when having to do filesystem operations. I don't really care if it's included in the std lib or not but it is definitely much better than anything that's currently available in either java or scala standard libraries.

jsuereth commented 9 years ago

@pathikrit I'm surprised you forgot to mention that on Windows sometimes you can't delete a file immediately (because something like a virus scanner holds it), so to be 'safe' you actually need to call delete multiple times with some kind of time-out/retry. We have most of this in the sbt.IO class as well, and I agree it's basically a necessity for those not writing really low-latency/low-level file code who just want it to "work".

However, I'd argue that for a general-purpose standard-library file API, I'm not 100% certain all the "correctness vs. speed" tradeoffs should be made for me. I can totally see this from a utility library.
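
The retry idea can be sketched as follows. This is not sbt's implementation; `RetryDelete`, `deleteWithRetry`, and the default attempt count and delay are assumptions made for illustration. It also shows the trade-off mentioned above: the caller pays in latency (a blocking sleep) for robustness.

```scala
import java.io.IOException
import java.nio.file.{Files, Path}
import scala.annotation.tailrec

object RetryDelete {
  // Tries to delete `path`, retrying on IOException up to `attempts` times
  // and sleeping `delayMillis` between tries. Rethrows the last failure.
  @tailrec
  def deleteWithRetry(path: Path, attempts: Int = 5, delayMillis: Long = 100L): Unit = {
    val result: Either[IOException, Boolean] =
      try Right(Files.deleteIfExists(path))
      catch { case e: IOException => Left(e) }
    result match {
      case Right(_)                 => () // deleted, or it was already gone
      case Left(e) if attempts <= 1 => throw e
      case Left(_) =>
        Thread.sleep(delayMillis)
        deleteWithRetry(path, attempts - 1, delayMillis)
    }
  }
}
```

Note that this simple version retries every `IOException`, including ones that will never succeed on retry, such as a `DirectoryNotEmptyException`.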

pathikrit commented 9 years ago

@jsuereth : Good point about drawing a line between a "util" library and a std library API. IMO, if you want low-level, run with scissors APIs, we already have the java.nio.file in the std lib. The Scala one should not even pretend to be a replacement for that and make that abundantly clear in the docs. Instead, it should strive to be the more intuitive and pragmatic "util" wrapper around the former.

mdedetrich commented 9 years ago

My standard take on this

We generally need to start looking at doing stdlib implementations in pure Scala, rather than doing light wrappers over the Java versions
File IO is basically a must-have that needs to be standardised; there should be a proper, standardised, idiomatic Scala implementation that isn't just java.nio.file
This means stuff like async file IO should ideally be returning stuff like Future[File]

I am also in favour of doing a proper, clean room implementation. The current state of file IO in Scala is a mess, and everyone is using a combination of java.io/java.nio/scala.io.Source and then stuff like https://github.com/pathikrit/better-files. Stuff like Scala.js (and future backends that may come as a result of dotty/TASTY, such as LLVM) really scream for Scala idiomatic implementations of stdlib, rather than falling back to Java all the time.

In terms of design, I am happy with stuff like better-files, with additions such as using Future[File] with proper async IO.

This puts us in a good position to create a new package (under a different name).

I also completely agree with @pathikrit, we need to properly wrap all of the java.nio since there are so many corner cases when doing file IO for the reasons he stated
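
To give the Future-returning idea above a concrete shape, here is a minimal sketch on the JVM. It returns a `Future[String]` of the file's contents rather than the `Future[File]` suggested in the comment, and it is not truly asynchronous IO: it runs a blocking read on an `ExecutionContext` and marks it with `blocking`. `AsyncRead` and `readStringAsync` are hypothetical names.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.concurrent.{ExecutionContext, Future, blocking}

object AsyncRead {
  // Reads the whole file as UTF-8 on the given execution context.
  def readStringAsync(path: Path)(implicit ec: ExecutionContext): Future[String] =
    Future {
      blocking { // hint to the pool that this thread may block on disk IO
        new String(Files.readAllBytes(path), StandardCharsets.UTF_8)
      }
    }
}
```

A backend without threads, such as Scala.js, would need a different implementation behind the same signature, which is part of the argument for a Scala-level API.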

dickwall commented 8 years ago

One week to the next SLIP meeting. Not that I want these things to just become SLIP meeting driven (in terms of dates/deadlines), but if there are any updates on this issue in the next week, we will pick them up in that meeting.

velvia commented 8 years ago

+1 to everything that @mdedetrich said. A clean room implementation (more from a clean, idiomatic Scala API perspective) would provide the greatest return in the long run, esp. w.r.t. Scala.js etc. Plus, file I/O is something people expect in a standard library....

lihaoyi commented 8 years ago

Looking back, there's a lot of interesting discussion in this thread, but the one thing that's clear to me is that the community failed to come to a consensus. People have differing use cases, requirements, styles, and problem scopes, and it seems doubtful we'll come to a consensus in the foreseeable future.

If we accept that we have not converged on any technical solution, now is the time to start thinking about the meta-solution: given we can't agree or decide, how can we get to a place where we could agree or decide at some point in the future? Even if scala-team/EPFL/soon-to-not-be-called-Typesafe don't bless/pick/write any IO library right-here-right-now, there are things they can do that would speed up the process of coming to a decision.

For example, if we decided that

"we'll wait and see who picks up adoption"

They could add links to the docs/tutorials/main-website like

"If you want to do more things with files, here's a list of 6 libraries you could try"

This would funnel new users towards the candidates, so the various libraries all get a steady stream of people vetting them and deciding they like them or not. If we decided the process was

"wait till people send PRs to port PlayFramework+SBT+whatever onto their own IO library, and do code-reviews then to decide which one we like"

Then there would be a different set of actions we could take to smoothen/speed-up that process

This is a reason why an explicit null-decision would be useful, v.s. just not deciding: deciding "we won't pick one now" would let us move on confidently to the next topic of discussion: how would we structure such a selective process and define the ending conditions? How would we make it fair, fast, and hopefully encourage the right kinds of behavior that optimize for the things we want?

This then becomes a very managerial question, and arguably throwing a bunch of "people who write libraries" together wouldn't be the most effective way to answer it =P

mdedetrich commented 8 years ago

I think the biggest thing to get out of an IO library is to end the confusion, for new users, about what IO to use. @lihaoyi, the talk you gave at Scala By The Bay perfectly demonstrates the problem: to do silly IO stuff, users end up having to search Stack Overflow. There are around 4-5 solutions, some coming from Java, some from stuff like Apache Commons, some from scala.io.Source (which some people now accept is not that good a library), and all are fairly verbose.

Looking back, there's a lot of interesting discussion in this thread, but the one thing that's clear to me is that the community failed to come to a consensus. People have differing use cases, requirements, styles, and problem scopes, and it seems doubtful we'll come to a consensus in the foreseeable future.

The whole "wait for people to use a common IO library" doesn't really hold water, it hasn't happened in some long time. I am sure, for example, that Rapture IO may be a great IO library; however, I only found out about it a few months ago. The other thing is that other frameworks/libraries do not use this library, so we risk ending up in the same perverse situation that we have with JSON.

We should have an IO library, where as a new user, I can go to the scala website, and the docs will go something like

import scala.io.File

val f: File = File.open(".someFile")
val asyncF: Future[File] = File.openAsync(".someFile")

And then a bunch of your expected operations. I don't think anyone here is asking for a hyper-specialized, high-performance IO library to be used for load balancers or something along those lines; there will always be a case for the community making its own IO libraries for specialized circumstances. I believe the idea is to create an idiomatic, non-Java Scala IO library that the majority of users are happy with.

lihaoyi commented 8 years ago

The whole "wait for people to use a common IO library" doesn't really hold water, it hasn't happened in some long time

I don't know why you quoted me because this has nothing to do with what I said =P

I never proposed inaction. Just a step back from the blind, single minded "let's just do something, community!" strategy that clearly hasn't worked.

I mean, it's great that you're so sure you know what to do to fix everything, but clearly lots of people disagree about things. What next? Arguing "This is what we should do, it's so obvious" just goes in circles.

mdedetrich commented 8 years ago

I don't know why you quoted me because this has nothing to do with what I said =P

Sorry if I wasn't clear. I was just confirming your point that "letting the community do it" didn't really work

SethTisue commented 7 years ago

This could be revived under the new Scala Platform Process (http://www.scala-lang.org/blog/2016/11/28/spp.html).