Scala IO fix-up/overhaul

dickwall commented 9 years ago

scala.io.Source is small, useful, and troubled; it is usually recommended against, although many still use it.

A recent SLIP submission, https://github.com/scala/slip/pull/2, suggested a Target, the counterpart to Source, providing similar functionality for output. The feeling in the SLIP committee is that a Target that aimed to be for output what Source, as it stands now, is for input would not be accepted into the core libraries. However, everyone seemed in favor of an overhaul of the scala.io library.

Since this is likely to be a bigger task, we suggest an expert group form and meet to discuss and work on the problem. Interested parties identified in the meeting include Omid Bakhshandeh @omidb, Jon Pretty @propensive, Jesse Eichar @jesseeichar, Haoyi Li @lihaoyi and Pathikrit Bhowmick @pathikrit. The expert group will, of course, be open to volunteers willing to work on the implementation (if you are just interested in sharing your opinions, I suggest you attach comments to this thread rather than joining the EG).

In order to get things moving, and since the original PR came from @omidb, I suggest he take the lead in forming the group and setting up the first meeting. If someone else wants to volunteer for the organizational role, that meeting would be the time to discuss it.

Please also note that any IO SLIP targeting Scala 2.12+ will have Java's NIO guaranteed to be available, making NIO an option for the basis of an implementation.

First steps:

Please organize the first expert group meeting and provide details of the decisions made and action items. I would suggest following the Either expert group's lead and holding the discussion in the open on Google Hangouts on Air or similar, so that the recording is publicly available to all interested. If you are involved with the EG, please post any progress in comments on this issue.

dickwall commented 9 years ago

@pathikrit has an NIO library that may be of interest:

https://github.com/pathikrit/better-files

lihaoyi commented 9 years ago

however, everyone seemed in favor of an overhaul of the scala.io library.

What's wrong with "deprecate and point people towards java.nio or third party libraries"? The former is built in and perfectly usable, even from Scala code (as compared to java.io). The latter would be able to evolve much more quickly than something living in the scala std lib, and end up much higher quality.

"The standard library is where code goes to die" isn't it?

Here's one possible alternative: we take some large-ish Scala projects (play? akka? sbt? scalac?) and extract out the common bits of their IO libraries (and they all have their own IO libraries!) into something used by all. We'd need buy-in from all the different owners, but that would force us to actually make something of production quality that is actually getting used. If we make something "cool and elegant" in a vacuum, my $ says it'll be just as useless as scala.io is now.

Here's another alternative workflow: we deprecate scala.io in 2.12, point people towards java.nio or third party libs (better-files, ammonite-ops, etc.) and when one of them becomes popular we then talk about which parts of it are good and are worth including in the standard library. That way we'd know from the fact that it's popular and widely-used that whatever we're including is useful and usable.

I don't think coming at it from a point of view of "let's make an awesome generic powerful IO library with a better Source and a Target and other abstractions..." will yield us any useful results.

He-Pin commented 9 years ago

A better way, I think, would be to split it out as a separate scala.io project, where it could evolve faster than it would living in the standard library.

omidb commented 9 years ago

The reason I proposed scala.io.Target in the first place was that whenever I wanted to do IO, I was using java.io, and I thought that for a language like Scala, having no IO support is kind of not right. Whenever I try to convince people to use Scala (mostly people from Python), they ask me: is it easy to read a CSV file? How about pickling it? How about writing it to disk? I think deprecating scala.io can be a good idea, but my alternative would be doing the same thing that the Scala people did for scala.xml. I don't know what they call it (Scala module? plugin?). I think having an IO lib under the scala domain would be great.

He-Pin commented 9 years ago

The hard part is deciding what should be in and what should be out of the standard library. If we provide it via a separate project, e.g. scala.nio, then we could start with a minimal toolkit and follow up quickly with releases driven by real use cases.

For file operations, one thing I am using is the Vert.x FileSystem: https://github.com/eclipse/vert.x/blob/master/src/main/java/io/vertx/core/file/FileSystem.java
I have also looked at jimfs: https://github.com/google/jimfs

Look at the way Go, Clojure and Rust do it: keeping some modules out of the standard library really does help. I think the core language should be small. Scala is a language, but it still lives on the JVM.

I have also looked at better-files and ammonite-ops. Both of them have a shell-like syntax, but I don't know how much they share on the IO side.

I think we could improve scala.io, but if we want to introduce something big, or something more than just a better version, I think that should happen in a separate project under the scala organization.

Update: as for scala.xml, it will be deprecated in the future, which I think is not like the IO case. Think about it: why doesn't Clojure put org.clojure.async in the Clojure project?

lihaoyi commented 9 years ago

but I don't know how much they share on the IO side.

Both are basically thin wrappers around java.nio. It really isn't bad and does everything you need...

pathikrit commented 9 years ago

Agree with @lihaoyi . Every fairly large project has their own "IOUtils" or "FileUtils" somewhere internally. That would be a good starting point to figure out the core "we-need this util" parts of the library. Or the I/O libraries of Python or Go or F# or node.js might be good to imitate to begin with too...

We can start with a goal of targeting the feature set covered by these 3 APIs:

Guava Files: http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/io/Files.html
Apache IOUtils: https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/FileUtils.html
Jodd FileUtil: http://jodd.org/api/jodd/io/FileUtil.html

But, even before we think about starting on idiomatic AND simple I/O in Scala, we need to answer these:

1) API style: Do we go with a more "FileUtils" style approach? This is what java.nio follows, e.g. instead of doing file.isDirectory(), you do Files.isDirectory(file). This is needlessly verbose IMO, but I don't have a strong opinion here. Or do we go with a more OO style (e.g. file1.moveTo(file2)), which is the style followed in better-files, or a DSL inspired by the command line as in ammonite-ops (e.g. mv(file1, file2)), or something else?

2) Is the library centered around Files or Paths? IMO, Paths is the more correct abstraction but also an academic distinction. Most application programmers think about files and do operations on files and for them, files happen to have paths and not the other way (paths happen to have files). I would personally have an immutable set of APIs centered around immutable Paths and a callback-based API based around files (ala node.js).

3) Referential transparency: I/O libraries have inherent side effects:

val file = File(....)
assert(file.exists)
file.delete()
assert(!file.exists)

This is surprising for people coming from a functional/immutable background. Do we go with a more correct immutable API centered around IO monads but increase the barrier to entry for non-fp folks?

4) How do you deal with the myriad of InputStreams and BufferedReaders and FileChannels and OutputStreamWriters that populate the Java enterprise world? Are we going to build sane bridges from Scala to them or do away with all that and have complete Scala equivalents? Here is my attempt at a bridge: https://github.com/pathikrit/better-files#java-interoperability

5) The Java APIs are riddled with things like NotDirectoryException, e.g. if you try to list a regular file or read bytes from a directory. This is something I wrestled with in better-files to make I/O operations more type-safe, e.g. you cannot call list() on something that is not a directory:

"src"/"test"/"foo" match {
  case SymbolicLink(to) =>          
  case Directory(children) =>       
  case RegularFile(source) =>       
  case other if other.exists() =>   // a file may not be one of the above e.g. UNIX pipes, sockets, devices etc
  case _ =>                         // a file that does not exist
}
// or as extractors on LHS:
val Directory(researchDocs) = home/"Downloads"/"research"

6) Is this library solely intended for disk-based filesystems, or can it be a pluggable interface for other filesystems (e.g. S3 or an in-memory one like Google's jimfs)?

7) Are the APIs all going to be non-reactive blocking ones like we are used to? Can we add reactive APIs like node.js:

file.delete(callback(success, error))

I would recommend, "Why not both?" - let's have both blocking dumb APIs and asynchronous reactive APIs.
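
A minimal sketch of the "both" idea in item 7, layered over java.nio.file.Files. The object name DeleteBothWays and its method names are hypothetical, chosen for illustration only; they are not part of scala.io, better-files, or any library mentioned in this thread. Note the assumption that "asynchronous" here just means running the blocking call on an ExecutionContext; it is not non-blocking I/O.

```scala
import java.nio.file.{Files, Path}
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical names for illustration; not an existing scala.io API.
object DeleteBothWays {
  // Blocking variant: returns true if the file existed and was deleted,
  // false if there was nothing to delete.
  def delete(path: Path): Boolean = Files.deleteIfExists(path)

  // Asynchronous variant: runs the same blocking call on the supplied
  // execution context and reports success or failure through a Future.
  def deleteAsync(path: Path)(implicit ec: ExecutionContext): Future[Boolean] =
    Future(delete(path))
}
```

A caller who wants the node.js callback shape can attach one with onComplete on the returned Future, matching on Success and Failure, while everyone else keeps the plain blocking call.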

Ichoran commented 9 years ago

Let's try to make a distinction between core functionality that almost everyone could use and advanced functionality that will support demanding users. That Scala doesn't have an easy way to slurp up a file is not to our credit. Nor is that we have to choose an external JSON library. These things are ubiquitous needs, and should just be there, and should just work. Easy stuff should be easy.

So, going off of @pathikrit's list:

1) It's easier to have methods on files than to have to drag along a clunky I-can-do-stuff object. file.isDirectory FTW.

2) Inasmuch as Scala favors correctness over other things, Path is going to have to play a major role.

3) Monadic interaction with the file system is an advanced functionality. That belongs in other libraries.

4) Slurping should work with whatever is slurpable. Otherwise, bridges are advanced functionality. That also belongs in other libraries.

5) Type safety that doesn't get in the way and reliably catches all exceptions is a good thing. I don't know what you have in better-files, but if case d: Directory is different than case Directory(children), that's a good start (i.e. you don't throw an uncaught exception on an access error on the pattern matcher). That said, you don't normally want to be futzing with directories too much directly. You want some higher-level thing to happen and directory-walking or searching is a means to that end. We should provide an API that lets you specify your end, not the steps to get there (to the extent possible). File system walkers are a good example of this.

6) Supporting all sorts of weird things that aren't actually mounted as a file system on the OS is beyond the scope of a simple solution. If they look that much like a filesystem, get the OS to mount them as such, and use the normal interface.

7) Reactive APIs are advanced usage. You have to think way more carefully about marshalling resources if you do that. External library.
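
To make the "specify your end, not the steps" point in item 5 concrete, here is a rough sketch of a helper that answers "give me every file under this root with this suffix" on top of java.nio.file.Files.walk. FindFiles and bySuffix are hypothetical names invented for this example, and the choice to return a strict List is an assumption, not a proposal from the thread.

```scala
import java.nio.file.{Files, Path}
import scala.collection.mutable.ListBuffer

// Hypothetical helper for illustration; not an existing scala.io API.
object FindFiles {
  // Returns every regular file under `root` (recursively) whose file name
  // ends with `suffix`.
  def bySuffix(root: Path, suffix: String): List[Path] = {
    val stream = Files.walk(root) // lazy walk that holds open directories
    try {
      val found = ListBuffer.empty[Path]
      val it = stream.iterator()
      while (it.hasNext) {
        val p = it.next()
        // Directories are skipped first, so getFileName is only read for files.
        if (Files.isRegularFile(p) && p.getFileName.toString.endsWith(suffix))
          found += p
      }
      found.toList
    } finally stream.close() // release the directory handles
  }
}
```

The caller states the goal, for example FindFiles.bySuffix(root, ".txt"), and never touches directory listings or the stream that has to be closed.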

pathikrit commented 9 years ago

@Ichoran:

I can see 3 parts to this:

1) Core purely Scala OO style APIs centered around: scala.io.Path and scala.io.mutable.File classes. These are all blocking synchronous side-effecty APIs to do "core" things e.g.

 (root / "tmp" / "diary.txt")
  .createIfNotExists()  
  .appendNewLine
  .appendLines("My name is", "Inigo Montoya")
  .moveTo(home / "Documents")
  .renameTo("princess_diary.txt")
  .changeExtensionTo(".md")
  .lines

2) Java converters brought in using scala.io.JavaConverters which can add conversions to/from Java (e.g. https://github.com/pathikrit/better-files#java-interoperability)

3) scala.io.immutable.File - brings in immutable monadic reactive file library. Can be a placeholder for the future.

7) Reactive APIs are advanced usage.

But, even javascript programmers have had them for many years now :panda_face:

non commented 9 years ago

@Ichoran The method names might be too terse, but this little library I wrote seems to hit the sweet spot for me in terms of simplicity/power for reading "regular" files: https://github.com/non/junkion#recipes.

(The library's operating principle is "allow the user to read files without importing anything from java.io or java.nio" and I think it does a reasonable job.)

dwijnand commented 9 years ago

Another one that might be of interest, particularly for the way it fixes Java's API on Windows, is sbt's IO module: https://github.com/sbt/io

tpolecat commented 9 years ago

I think the odds of getting this "right" in any satisfying sense are very close to zero. So I vote for removing scala.io and pointing users to better options like scalaz-stream, Rapture, Junkion, and so on.

Disclaimer: I want to get rid of almost everything in the Scala standard library.

pathikrit commented 9 years ago

@tpolecat: But then what happens when I want to use lib1, which uses scalaz-stream and exposes some method that takes a scalaz-file, and lib2, which uses rapture and exposes a method that takes a rapture-file? I now need to convert between scalaz-file and rapture-file!

Not sure why you are pessimistic about getting this "right". Many other languages (and libraries) have gotten this "right" enough to make it painless:

https://nodejs.org/api/fs.html

http://ruby-doc.org/stdlib/libdoc/fileutils/rdoc/FileUtils.html

http://www.boost.org/doc/libs/1_59_0/libs/filesystem/doc/reference.html

We already suffer from this fragmentation because of a lack of JSON library in the stdlib.

tpolecat commented 9 years ago

You say fragmentation, I say marketplace of ideas. :-)

lihaoyi commented 9 years ago

Not sure why you are pessimistic about getting this "right"

The main reason I'm pessimistic is that we've gotten it wrong before. Many times! That resulted in pretty awkward, senseless code making it into the standard lib and being frozen there for eternity: scala.io, scala.xml, scala.parsers, scala.collections.views, scala.collections.parallel, ...

If we encourage people to use third party libraries, we can then pick the winner to include with full confidence we're not leaving half-broken rubbish around for future generations.

I mean, I'm super happy people are trying stuff like:

scala.io.immutable.File - brings in immutable monadic reactive file library. Can be a placeholder for the future.

But I don't see why we should run experiments in the standard lib when previous such experiments (XML, parser-combinators, parallel collections, views, current scala.io, ...), run with the best of intentions, are in the process of being painfully excised from it.

For example, things like

Is the library centered around Files or Paths? IMO, Paths is the more correct abstraction but also an academic distinction. Most application programmers think about files and do operations on files and for them, files happen to have paths and not the other way (paths happen to have files). I would personally have an immutable set of APIs centered around immutable Paths and a callback-based API based around files (ala node.js).

indicate that we have no idea what we're doing as of this time. "Let's put it in the standard library!" is not the right response to this kind of situation =D

We should be pretty damn sure what we want, and why we want it, before we saddle future generations with our bright ideas! We have a perfectly functional dependency resolution system, as well as a perfectly functional IO library in java.nio. Both are possible alternatives to bundling things in the standard library.

If we can't get some large number of Scala users/projects using our third party library, who's to say our code is good enough to force it upon everybody?

lihaoyi commented 9 years ago

w.r.t. @Ichoran's description of "core" functionality, java.nio is perfectly usable to provide that. e.g. to write to a file in a single line:

Files.write(Paths.get("file.txt"), "file contents".getBytes)

To read from a file in a single line

new String(Files.readAllBytes(Paths.get("test.txt")))

This works great. In fact, it's barely any more verbose than using io.Source to read from a file!

io.Source.fromFile("test.txt").mkString

Anything we include in the standard library would need to be sufficiently better than java.nio to be worth its weight in the standard library.
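
For anyone comparing the two reads above more closely, here is a small sketch that spells out two details the one-liners gloss over: they rely on the default charset, and a Source created by fromFile keeps its file handle open until it is closed. ReadWhole and its method names are hypothetical, and fixing the charset to UTF-8 is an assumption made for the example.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}
import scala.io.{Codec, Source}

// Hypothetical helper for illustration only.
object ReadWhole {
  // java.nio: one call, explicit charset, nothing left open afterwards.
  def viaNio(path: Path): String =
    new String(Files.readAllBytes(path), UTF_8)

  // scala.io.Source: same result, but the Source must be closed explicitly.
  def viaSource(path: Path): String = {
    val src = Source.fromFile(path.toFile)(Codec.UTF8)
    try src.mkString finally src.close()
  }
}
```

Once the close is accounted for, the scala.io version is the longer of the two, which is consistent with the point that java.nio already covers the simple case.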

pathikrit commented 9 years ago

@tpolecat: I want Scala to be "batteries included". I don't want to spend my time evaluating which library to use (or copying code from StackOverflow) to do simple stuff like delete a directory on my filesystem or parse a json or download a webpage etc. I don't want to spend time making two different libraries I depend on talk to each other just because they use different JSON converters or File classes.

But, as @lihaoyi mentioned, the std lib ends up being the code graveyard frozen in time. Can we have a compromise? Maybe make scala-io an incubator/experimental project that is decoupled from the regular Scala release schedule so it can evolve much faster?

A canonical Scala I/O library on GitHub (that is officially blessed/promoted/recommended by typesafe/scala/@odersky) and manages to attract the best minds in Scala would be an excellent start!

He-Pin commented 9 years ago

@pathikrit decoupling is exactly what @lihaoyi suggested first, and I vote for it too. @ktoso is going to add some file support for Akka too, so what's your view on this, @ktoso?

retronym commented 9 years ago

Adding my 2c: I'd be interested to see how far we could get with a java.nio.file._ wrapper that only adds extension or static helper methods, and avoids the temptation to add a layer of data types.

pathikrit commented 9 years ago

@retronym: pretty far IMHO

retronym commented 9 years ago

@pathikrit I'd argue then that you should rename better.files.File to FileOps and make it an implicit value class. Otherwise people will be tempted to use it in their APIs.
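
A minimal sketch of what the "extension methods only" approach could look like, assuming a hypothetical PathSyntax object and PathOps value class (these are not the actual better-files API). The only data type that appears in any signature is java.nio.file.Path, so there is nothing new for people to put in their own APIs.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}

// Hypothetical names for illustration; not the better-files API.
object PathSyntax {
  // Implicit value class: adds methods to Path without introducing
  // a wrapper type that callers could store or pass around.
  implicit final class PathOps(val self: Path) extends AnyVal {
    // Resolve a child name against this path.
    def /(child: String): Path = self.resolve(child)
    def isDirectory: Boolean = Files.isDirectory(self)
    // Read the whole file as a UTF-8 string.
    def contentAsString: String = new String(Files.readAllBytes(self), UTF_8)
    // Replace the file's contents with the given text, encoded as UTF-8.
    def overwrite(text: String): Path = Files.write(self, text.getBytes(UTF_8))
  }
}
```

After import PathSyntax._, code such as (dir / "note.txt").overwrite("hi") compiles, yet every value involved is still a plain Path.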

pathikrit commented 9 years ago

@retronym: This may not be the right place to discuss it but I removed the implicit conversion, so you would have to explicitly do .toScala to access the Scala one.

This started out as a personal project and for me File is always better.files.File and whenever I import any Java crap, I do import java.io.{File => JFile} to warn the reader of the code. But, I guess, since I released it into the wild, I should give it a different name...

tpolecat commented 9 years ago

Thanks @lihaoyi for writing the novel above. Agree 100%.

@pathikrit it would be great to assemble a team and start looking at writing an awesome IO library, but I don't see why this should be done under a SLIP. It's also important to recognize that there are now two largely disjoint Scala canons, and I think you will find substantial and likely intractable disagreement among the "great minds" on how such a library should work.

pathikrit commented 9 years ago

@tpolecat: If it's not done under the official blessing of the SLIPs (i.e. a typesafe/scala/@odersky-like entity), it may not necessarily get the attention/mindshare/buy-in it deserves (which is fine for most libraries but may not be for critical ones like an I/O library, which every Scala library/company reinvents internally). I am not an expert in such community processes, so I will let @dickwall chime in.

Either way, would be happy to contribute once we get something going.

now two largely disjoint Scala canons

Haha, one can code under scala.io.mutable._ and the other under scala.io.immutable._ =)

retronym commented 9 years ago

In case others are interested, @pathikrit and I continued the discussion of the pros and cons of only using extension methods vs providing a parallel hierarchy of data types over here: https://github.com/pathikrit/better-files/commit/346b9825f953c13f7d277c82008e9264183fc6d7#commitcomment-13302644

retronym commented 9 years ago

Anyway, let me help out @dickwall a little here by repeating his gentle instructions, before we all get too deep into the nitty gritty of API design.

First steps: Please organize the first expert group meeting

lihaoyi commented 9 years ago

before we all get too deep into the nitty gritty of API design.

First steps: Please organize the first expert group meeting

There are two parallel conversations here; one is @retronym and @pathikrit and others talking about the intricacies of possible filesystem APIs, and the other is me and @tpolecat and @hepin1989 saying "don't do it, the approach outlined is flawed and will be of zero or negative utility".

IMHO the latter discussion is of critical importance of whether we should be starting an "expert group" to work on this at all.

We've been assigned to an expert group...

By consensus (!) of unknown powers-that-be (Who??? I wasn't consulted!)
With no authority (?) or resources (?),
No mandate or consensus (I certainly hadn't heard about this),
With no actual users of the proposed library on the list of experts, just a bunch of people with abandoned/unknown attempts at IO libraries (myself included)
To work on a pre-specified project that I believe shouldn't happen using an approach I'm 100% sure is doomed to fail...

I guess I'm just not quite ready to roll up my sleeves, start mobilizing an expert group and cranking out improvements to scala.io just yet.

This may sound negative, but is in fact very positive. I am hopeful that we can make IO in the Scala ecosystem better, and have put lots of effort towards that goal. I just don't think the approach demonstrated will accomplish that goal.

I've described some alternative approaches to this problem, that will do without the currently-selected expert-group, and I think show more promise. Hopefully someone is interested =/ It's so much easier to keep talking about Scala and programming but the problems I see here have nothing to do with either.

pathikrit commented 9 years ago

@lihaoyi : What I am proposing is this:

Step 1: Start a scala/scala-io incubator experimental repo

Step 2: Seed it with some basic common util code that wraps over NIO (just start with read/write/cp/mv/delete/list/touch) so it is usable. Less than 50 lines of code but useful enough for me to add it as a dependency.

Step 3: With the blessings of the Scala team, advertise it as the future of I/O in Scala or at least the recommended file I/O library. Promote it at confs/forums. Invite developers to contribute and send PRs. Release often and see where this goes...

Step 4: Get buy-in from a major library/team e.g. Play. Send a PR to replace its in-house IOUtil with this so we have a real testbed.

Maybe I am not as pessimistic here. I have seen the power of a really useful library grow from a small seed project with solicitations for contributions e.g. cats, shapeless, spire, scodec etc to become de-facto standard libraries in their domain.
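
For a sense of scale, here is a rough sketch of what the Step 2 seed (read/write/cp/mv/delete/list/touch over NIO) might look like. IOSeed and all of its method names and semantics are assumptions made for this example: UTF-8 only, copy and move overwrite the target, and delete is non-recursive. It is not an agreed design.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.attribute.FileTime
import java.nio.file.{Files, Path, StandardCopyOption}

// Hypothetical seed; names and semantics are assumptions for illustration.
object IOSeed {
  def read(p: Path): String = new String(Files.readAllBytes(p), UTF_8)

  def write(p: Path, s: String): Path = Files.write(p, s.getBytes(UTF_8))

  def cp(from: Path, to: Path): Path =
    Files.copy(from, to, StandardCopyOption.REPLACE_EXISTING)

  def mv(from: Path, to: Path): Path =
    Files.move(from, to, StandardCopyOption.REPLACE_EXISTING)

  // Non-recursive: returns true if something was deleted.
  def delete(p: Path): Boolean = Files.deleteIfExists(p)

  // Create the file if missing, otherwise bump its modification time.
  def touch(p: Path): Path =
    if (Files.exists(p))
      Files.setLastModifiedTime(p, FileTime.fromMillis(System.currentTimeMillis()))
    else Files.createFile(p)

  // Direct children of a directory; the underlying stream is closed here.
  def list(dir: Path): List[Path] = {
    val stream = Files.list(dir)
    try {
      val out = List.newBuilder[Path]
      val it = stream.iterator()
      while (it.hasNext) out += it.next()
      out.result()
    } finally stream.close()
  }
}
```

This does fit in well under 50 lines, and each method is a thin pass-through to java.nio.file.Files, which says something about how small the seed could be and how little it adds over NIO on its own.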

lihaoyi commented 9 years ago

@lihaoyi : What I am proposing is this:

To begin with I think this sounds reasonable; it's totally different from what @dickwall described, but it just might work =D maybe...

Step 1: Start a scala/scala-io incubator experimental repo

That's already been done https://github.com/scala-incubator/scala-io. It died.

With the blessings of the Scala team, advertise it as the future of I/O in Scala or at least the recommended file I/O library.

If it doesn't work, why advertise it? https://github.com/scala-incubator/scala-io was certainly advertised as the future of Scala's IO story, and look what happened: now we have lots of confused people who aren't sure if this project, blessed with the Scala name, is alive or dead.

Promote it at confs/forums. Invite developers to contribute and send PRs. Release often and see where this goes...

I've done that with Ammonite-Ops. Result: ~1000 downloads a month on Maven Central. Not bad, could be better, definitely not "worthy of standard library" level of ubiquity. I'm still pushing. Would you like to help? =P

Get buy-in from a major library/team e.g. Play. Send a PR to replace its in-house IOUtil with this so we have a real testbed.

This is 100% necessary, and in fact I think trumps all other concerns about APIs and abstractions and code. If we do not have a real customer, preferably three of them, this adventure is doomed before we start. It's easy to make huge progress writing beautifully elegant code if you don't need to deal with real customers and their pesky little problems =D

On the other hand, this is a lot of hard work. Possibly more than is reasonable to expect from an open-source contribution. Another alternative is to build a library open-source, and hope sufficient people pick it up on their own and become "customers" for us to trust in its quality. That will take longer.

Maybe I am not as pessimistic here. I have seen the power of a really useful library grow from a small seed project with solicitations for contributions e.g. cats, shapeless, spire, scodec etc to become de-facto standard libraries in their domain.

None of those projects have "blessing" from Typesafe/Scala. In fact, I can't think of any such projects which have grown into heavy community-driven affairs which have backing from Typesafe/Scala. The Typesafe/Scala projects tend to be worked on full time by Typesafe/Scala people. Perhaps Scala.js if you count the entire ecosystem and not just the compiler.

Don't forget survivorship bias; of course you only hear about the ones which have grown successfully. There are many more which faded into obscurity, including our simulacrum https://github.com/scala-incubator/scala-io which walked the exact same path you're proposing, down to the letter, now dead. No matter how optimistic you are, the fact "someone tried the exact same thing before, it failed" is a reasonable reason to be cautious...

dwijnand commented 9 years ago

One benefit of having it in the standard library is to avoid binary incompatibility / jar hell, by virtue of (1) the community policy of embedding the Scala version in the jar name, (2) sbt's support for this and (3) the Scala team ensuring that binary incompatible changes don't ship (MiMa).

Sometimes you have very deep dependency trees, with libraries depending on libraries depending on libraries, and all works well. Then a few months down the line a few parts up and down the tree have updated and suddenly you have binary incompatibility for different versions of Akka or Scalaz or even a Java Async Http Client. And then the only way to deal with it is to pin to a less up-to-date version of some library, and then do a stop-the-world update across the corpus. At least with things in the standard library you avoid this problem.

hamishdickson commented 9 years ago

Get buy-in from a major library/team e.g. Play. Send a PR to replace its in-house IOUtil with this so we have a real testbed.

This is 100% necessary, and in fact I think trumps all other concerns about APIs and abstractions and code. If we do not have a real customer, preferably three of them, this adventure is doomed before we start. It's easy to make huge progress writing beautifully elegant code if you don't need to deal with real customers and their pesky little problems =D

@lihaoyi you're 100% right here and honestly I think this is key to this whole conversation. I also think this is a lot more work than we probably realise.

Think about it from akka's/play's/et al's point of view: why should they adopt another new IO library when they have something that works at the moment and does exactly what they need it to? The only reasons I can think of for them to adopt it are 1) it's EASY to adopt (i.e. someone raises a PR with all the changes in there; in reality, one of us) and 2) it's the sensible thing to do in that it has wide community support.

I think it's great that we're talking about this as a community, and that's very healthy for Scala, but given that IO is something almost every large project has to deal with, it would be nice to know what someone from Typesafe/EPFL thinks about this.

gpampara commented 9 years ago

I have to agree 100% with @tpolecat and @lihaoyi. The standard library is mostly terrible almost all the time, and having something in common, just for the sake of having something in common, really doesn't seem like it's worth the effort.

omidb commented 9 years ago

@lihaoyi, I just want to emphasize that people who start working with a language use the features that are already in it, so if we have a "good" scala-io "plugin", a huge number of users will use it. (I don't know whether scalac has a way to load such artifacts; using SBT is not the best idea for this.) The first time I wanted to write to disk in Scala, I found this: http://stackoverflow.com/questions/6879427/scala-write-string-to-file-in-one-statement Basically there is no way to write on disk with current "scala.io" without using Java.io or third party libs.

I also agree with you that some of the things in Scala are basically useless because of bad design/implementation; one example that I found was: https://github.com/scala/pickling which I found by googling scala pickle and it crashes for some big files and ... (maybe it's working now, I dunno) But it took me a good amount of time to find uPickle/booPickle, which are what I'm using now.

ktoso commented 9 years ago

Hi all, since I've been called out, a quick response from our (akka) end:

@ktoso is going to add some files support for akka too, then what's your idea about this @ktoso?

For us it's basically only support for AsynchronousFileChannel as backing implementation of a File Source[ByteString, _] for Akka streams, nothing that relates to standard library I think. I don't think we depend much on the scala IO things actually, so not much we can help out here from Akka's perspective I think.

Very happy that you seem to be getting together to improve the stdlib though!

pathikrit commented 9 years ago

If it doesn't work, why advertise it? https://github.com/scala-incubator/scala-io was certainly advertised as the future of Scala's IO story, and look what happened: now we have lots of confused people who aren't sure if this project, blessed with the Scala name, is alive or dead.

As far as I understand, the original scala-io died during 2012 when Java 7 was released with NIO. As @lihaoyi said, Java 7 NIO is actually pretty good. You can write to a file in 1 line:

import java.nio.file.{Files, Paths}
Files.write(Paths.get("file.txt"), "file contents".getBytes)

Although the above line doesn't look like Scala code, it isn't that bad either, and it works like a charm! But after you write the above line enough times in your code, you end up writing a little util like this:

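import java.nio.file.{Files, Path}
import scala.io.Codec

implicit class PathOps(path: Path) {
  def write(bytes: Array[Byte]): Path = Files.write(path, bytes)
  // Codec is not a Charset, so pass its underlying charSet to getBytes
  def write(text: String)(implicit codec: Codec): Path = write(text.getBytes(codec.charSet))
}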

Now, you can write code that looks like normal Scala code:

file.write("hello world")

And, you have to do this over and over for every file operation you can think of (read, copy, move, touch, list, recurse etc). And, that's how I ended up with better-files - a thin wrapper over Java NIO Paths/Files APIs.
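To make the "over and over for every file operation" point concrete, here is a minimal sketch of a few more such hand-offs. It is not the actual better-files source: the object name `PathSyntax`, the method names, and the fixed UTF-8 charset are assumptions made purely for illustration.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardCopyOption}

// Hypothetical helper object; names and the UTF-8 default are illustrative only.
object PathSyntax {
  implicit class RichPath(val path: Path) extends AnyVal {
    // Every method is a single hand-off to a static method on java.nio.file.Files.
    def write(text: String): Path =
      Files.write(path, text.getBytes(StandardCharsets.UTF_8))
    def contents: String =
      new String(Files.readAllBytes(path), StandardCharsets.UTF_8)
    def copyTo(target: Path): Path =
      Files.copy(path, target, StandardCopyOption.REPLACE_EXISTING)
    def moveTo(target: Path): Path =
      Files.move(path, target, StandardCopyOption.REPLACE_EXISTING)
    def exists: Boolean = Files.exists(path)
    // Returns true only if a file was actually removed.
    def delete(): Boolean = Files.deleteIfExists(path)
  }
}
```

With `import PathSyntax._` in scope, a call such as `somePath.write("hello world")` compiles because `java.nio.file.Path` has no `write` method of its own, so the compiler falls back to the implicit class.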

As much as there is concern that this project is doomed to fail from the beginning, a tiny util library that simply wraps Java NIO, as @retronym suggested and as demonstrated in better-files, has the potential to "not fail", simply because it doesn't do much. It doesn't strive to introduce its own hierarchy or its own Source/Sink abstractions etc. It simply is a more idiomatic and pragmatic way to do NIO from Scala. The entire source of better-files is ~100 lines without empty lines, and all it does is 1-liner hand-offs to java.nio.file.Files.

I agree with @lihaoyi that if we try our own massive beautiful ivory tower I/O project, we would fail like the old scala-io project.

But, I have much higher hopes for an extremely tiny wrapper library being successful.

lihaoyi commented 9 years ago

One benefit of having it in the standard library is to avoid binary incompatibility / jar hell

That is true. It's also not binary: how much jar hell people get depends on how often you update the library. You could easily have a library which is as binary compatible as the standard library while still living outside it: just don't release that often! You can even have a library more compatible than the standard library by writing it in Java, or almost-but-slightly-less-compatible by releasing a bit more often. There's a whole spectrum of compatibility levels and the standard library is just one point on it.

Basically there is no way to write on disk with current "scala.io" without using Java.io

Yeah, but what's wrong with using java.nio? I find it works great. As it stands it's not even much more verbose than using scala.io. One option is we could just tell people to use that. It's pretty good, honestly, everyone already knows it, and we get for free the great wealth of documentation and knowledge available on the internet w.r.t. how to use it.
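As a rough illustration of the verbosity comparison, here is a sketch that reads all lines of a file both ways. The object and method names (`ReadLines`, `viaSource`, `viaNio`) are made up for this example, and UTF-8 is assumed as the encoding.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
// On Scala 2.13+ the non-deprecated import is scala.jdk.CollectionConverters._
import scala.collection.JavaConverters._
import scala.io.Source

// Hypothetical comparison helpers; names are illustrative only.
object ReadLines {
  // scala.io.Source: the caller has to close the underlying stream.
  def viaSource(path: Path): List[String] = {
    val source = Source.fromFile(path.toFile, "UTF-8")
    try source.getLines().toList finally source.close()
  }

  // java.nio: readAllLines opens, reads, and closes the file itself.
  def viaNio(path: Path): List[String] =
    Files.readAllLines(path, StandardCharsets.UTF_8).asScala.toList
}
```

The NIO version needs a Java-to-Scala collection conversion, while the `Source` version needs explicit resource handling, so neither is clearly shorter in practice.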

one example that I found was: https://github.com/scala/pickling which I found by googling scala pickle and it crashes for some big files and ... (maybe it's working now, I dunno)

One of my colleagues thought all of Scala was terrible because of https://github.com/scala/pickling/issues/342, and I had to come bail him out and compile-bisect his code before he dismissed the entire language as taking 10s to compile hello world. Having bad code under the Typesafe/Scala name definitely had negative value =D

But, I have much higher hopes for an extremely tiny wrapper library being successful.

Plausible! But not guaranteed.

scala.sys.process is basically an extremely tiny wrapper library around java.lang.Process and yet is a nightmare to use.

it would be nice to know what someone from Typesafe/EPFL thinks about this

I'd love to know too =P "The committee has decreed that someone should do something, community organize thyself" isn't the level of engagement I'd have hoped. If nobody at Typesafe/Scala cares enough to engage in actual discussion, this project is already doomed.

eed3si9n commented 9 years ago

everyone seemed in favor of an overhaul of the scala.io library.

It might be a good idea to state the goal of this project. Is it because the standard library should be batteries-included? Is it so we don't have to keep reinventing IO libraries?

From sbt's point of view, long-term binary compatibility is one of the must-haves, so my hope is that the expert group tries some iterations of APIs, releases 1.0, and immediately loses interest in making feature enhancements, but fixes issues as they come up. The standard library might be a good place to keep such code. I am also OK with some third-party library with a long-term commitment. (See also json4s-ast)

It might also be a good idea to make the DSL portion (like enriching File to add /) optional. See https://github.com/sbt/io/pull/3.
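One way an optional DSL could be arranged is to keep the operator sugar in its own object, so it only applies where a caller imports it. The following is a sketch under that assumption, using `java.nio.file.Path` instead of `File`; `PathDsl`, `SlashOps`, and `PathDslDemo` are hypothetical names, and this is not the design from the sbt/io pull request.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical layout: this object holds only the operator sugar.
object PathDsl {
  implicit class SlashOps(val base: Path) extends AnyVal {
    // One-line hand-off to Path.resolve.
    def /(child: String): Path = base.resolve(child)
  }
}

object PathDslDemo {
  // Callers who do not want the DSL skip this import and call resolve directly.
  import PathDsl._
  val sources: Path = Paths.get("src") / "main" / "scala"
}
```

Code that never imports `PathDsl._` sees only the plain NIO API, which keeps the sugar out of the way of anyone who considers it noise.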

SethTisue commented 9 years ago

if nobody at Typesafe/Scala cares enough to engage in actual discussion, this project is already doomed

Patience — this ticket was opened one whole day ago!

Jason Zaugg is already weighing in, and I expect others will as well, including myself. I'm gathering my thoughts.

Also, this ticket exists in part as an outcome of discussions among the SIP/SLIP committee (the meetings are available as YouTube videos, and @dickwall has been summarizing them in blog posts). The committee includes Martin and other Typesafe/Scala folks including myself, and this I/O stuff will very much continue to be a topic of discussion at future meetings. This isn't a "turn the community loose and walk away" type thing.

lihaoyi commented 9 years ago

this I/O stuff will very much continue to be a topic of discussion at future meetings. This isn't a "turn the community loose and walk away" type thing.

Good to know =) I'll be patient, sorry!

pathikrit commented 9 years ago

My vote: Make a tiny, thin scala.nio.Path wrapper around java.nio.file.Path. Simple 1-liner hand-offs to java.nio.file.Files utils. No type hierarchy, no lofty aspirations, a few hundred lines of code. This can serve as a viable prototype.

But, I am very glad that the Scala team is looking at this.

julien-truffaut commented 9 years ago

:+1: to remove io from standard library but as @tpolecat mentioned, I also want a minimal standard library

lrytz commented 9 years ago

I believe the standard library has to ship with some core batteries. Arguments for that have been made, I don't need to repeat them.

We should not forget that the demands on quality and stability of the standard library have grown with Scala itself.

An obvious example: in Scala 2.10 we still have the (deprecated) package scala.util.automata [1]. It was added in 2004 (!!) and basically only modified once in 2009.

We all agree that these were different times for Scala. The only people using Scala were those writing its compiler and standard library, plus their students. Teaching was about automata, so they were added to the library. Easy.

scala.sys.process was started in 2011. Automata would certainly not have been added to the standard library at that point: it was Scala 2.8, we had a significant user base. Unlike automata, a library for handling system processes is a widely useful tool. But there was also not a great deal of discussion about it. Paul ~~designed and committed the library~~ moved the library, originally written by Mark for sbt, into the standard library and added various improvements. A few others commented or contributed. It was not such a big deal to get it in.

Different times as well. A story like this could be told about other pieces of the standard library: Either, BigInt / BigDecimal, Regex.

A different example is scala.concurrent (Futures). It comes with a SIP, it was worked out in a committee. There were existing libraries out there that served as basis / reference. The library came with Scala 2.10, in 2013.

Those times were much more like these. What happened in the meantime? Scala became more widespread, and it started ensuring binary compatibility and deprecation cycles. We started to appreciate the cost of having something in the library. Also, with the growing community, EPFL and Typesafe, there was no longer a single office floor where everything happens.

In the past years almost nothing has happened in the standard library. Getting a SIP through is a huge amount of work and requires someone to continue to push it. However, I think we might be entering a new era now: we haven't seen such an active community before. For someone who wants to push a SIP forward, there has never been so much feedback, and also so many other projects in the same space to look at.

As I said in the beginning, I think we should continue to evolve the standard library.

[1] Enjoy this Scaladoc of BaseBerrySethi: "turns a regular expression over A into a NondetWordAutom over A using the celebrated position automata construction (also called Berry-Sethi or Glushkov)".

densh commented 9 years ago

Another perspective on the issue: ever since the release of Scala.js, the language has outgrown the JVM being the only platform to run Scala code on. In the future the number of platforms may only increase, making a strong standard library a necessity for portable cross-target development.

In the current state of things every single implementation of Scala needs to reimplement a substantial part of the Java standard library to start running trivial applications. Needless to say, such a strategy has some serious legal risks, as Oracle does sue unlicensed implementations of their APIs (see the Oracle vs Google case for example).

paulp commented 9 years ago

Paul designed and committed the library

Whoa whoa whoa, @lrytz, @paulp certainly did not design the library. It came from sbt - to my knowledge it was written by mark harrah. Like it says in https://github.com/scala/scala/commit/5bada810b4

Imported sbt.Process into trunk, in the guise of package scala.sys.process. It is largely indistinguishable from the version in sbt, at least from the outside.

dwijnand commented 9 years ago

In 2009: https://github.com/sbt/sbt-zero-seven/commit/7d629d150183b0b4589784b92ba79322f6944e30#diff-94c89fc53e3e488ab80985cde8752941


wedens commented 9 years ago

I think the std lib should provide the low-level primitives required to build something higher level to one's liking. Scala inherits these primitives from the Java library. So, I agree that scala.io should be removed.

lrytz commented 9 years ago

@paulp certainly did not design the library. It came from sbt. Like it says in scala/scala@5bada81

Sorry about that, I'm lacking in archeology skills :)

non commented 9 years ago

@lihaoyi I think @SethTisue covered this, but I proposed that we reach out to you about this because I knew you had worked on this problem in the context of Ammonite. I had agreed to participate on a SIP/SLIP committee to evaluate these kinds of proposals, and I wanted to be sure that people who had thought about this a lot in the community had the opportunity to weigh in here. For what it's worth, the conversation on this issue is exactly what I had hoped would happen.

paulp commented 9 years ago

As long as you guys have dragged me into this thread, let me point to some code I did write: psp-std/pio, which includes instance methods for Path translated into static methods of java.nio.file.Files.

I also analyzed the java.util.{ function, stream } types for the correct variance so that one could have scala aliases for those which carried the right variance if SI-8079 were ever to be fixed.

Not IO, but see also.

adriaanm commented 9 years ago

Regarding fitting Java's use-site variance approach into ours: http://yanniss.github.io/varj-ecoop12.pdf has inspired me to think about inferring definition-site variance for Java interfaces (which is possible, conservatively, looking only at signatures, if I understood the paper correctly). This is motivated by working with Java 8's Stream API, which has wildcards everywhere to encode function subtyping.

scala / slip

Scala IO fix-up/overhaul #19