xtendo-org / rawfilepath

Haskell library for encoding-free interaction with Unix systems. Use RawFilePath (ByteString) instead of FilePath (String).

rawfilepath

Version: 1.1.1

TL;DR

Overview

The unix package provides RawFilePath, which is a type synonym for ByteString. Unlike FilePath (which is String), it has no performance issues, because a ByteString is a compact byte array rather than a linked list of characters. It has no encoding issues either, because it is a sequence of bytes, not a sequence of characters.
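To make the bytes-versus-characters point concrete, here is a minimal sketch that uses only the unix and bytestring packages, not rawfilepath itself. The names latin1Cafe and utf8Cafe are hypothetical. They show that the visible name "café" can be spelled as two different byte sequences, and RawFilePath keeps the two distinct.

```haskell
import qualified Data.ByteString as B
import System.Posix.ByteString.FilePath (RawFilePath)

-- RawFilePath is a plain ByteString: an arbitrary sequence of bytes.
-- "café" spelled in Latin-1: 'é' is the single byte 0xE9 (invalid as UTF-8).
latin1Cafe :: RawFilePath
latin1Cafe = B.pack [0x63, 0x61, 0x66, 0xE9]

-- "café" spelled in UTF-8: 'é' is the two bytes 0xC3 0xA9.
utf8Cafe :: RawFilePath
utf8Cafe = B.pack [0x63, 0x61, 0x66, 0xC3, 0xA9]

main :: IO ()
main = do
  print (B.length latin1Cafe, B.length utf8Cafe) -- (4,5)
  print (latin1Cafe == utf8Cafe)                 -- False: two different names on disk
```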

That's all good. With RawFilePath, we can properly separate the "sequence of bytes" from the "sequence of Unicode characters." The control is yours: encode or decode them with UTF-8, UTF-16, or any other codec of your choice.
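For example, a program can pick its own decoding policy. The sketch below uses the text package's Data.Text.Encoding and is not part of rawfilepath. It tries strict UTF-8 first and falls back to Latin-1. Both decodeName and this particular policy are hypothetical choices made for illustration.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeLatin1, decodeUtf8')
import System.Posix.ByteString.FilePath (RawFilePath)

-- Hypothetical decoding policy: try strict UTF-8 first, then fall back to
-- Latin-1, which maps every byte to a character and therefore never fails.
decodeName :: RawFilePath -> T.Text
decodeName raw = either (const (decodeLatin1 raw)) id (decodeUtf8' raw)

main :: IO ()
main = do
  let utf8Cafe   = B.pack [0x63, 0x61, 0x66, 0xC3, 0xA9] -- valid UTF-8
      latin1Cafe = B.pack [0x63, 0x61, 0x66, 0xE9]       -- not valid UTF-8
  -- Both byte sequences decode to the same four-character Text "café".
  print (decodeName utf8Cafe == decodeName latin1Cafe)   -- True
  print (T.length (decodeName utf8Cafe))                 -- 4
```

Decoding stays an explicit, local decision. The bytes themselves are never altered, so the original RawFilePath can still be passed back to the system unchanged.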

However,

This library provides a higher-level interface built on RawFilePath.
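For comparison, here is roughly what spawning a process looks like with the unix package's own ByteString API (System.Posix.Process.ByteString), without rawfilepath. This is only a sketch. runShell is a hypothetical helper, and it leaves out pipes, environment setup and error handling entirely. That is the kind of plumbing a higher-level interface is meant to hide.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString (ByteString)
import System.Exit (ExitCode (..))
import System.Posix.Process.ByteString
  (ProcessStatus (..), executeFile, forkProcess, getProcessStatus)

-- Run "/bin/sh -c <cmd>" in a child process and block until it finishes.
-- forkProcess duplicates the current process; the child then replaces itself
-- with /bin/sh via executeFile (False = do not search PATH, Nothing = inherit env).
runShell :: ByteString -> IO (Maybe ProcessStatus)
runShell cmd = do
  pid <- forkProcess (executeFile "/bin/sh" False ["-c", cmd] Nothing)
  -- True = block until the child changes state; False = don't report stopped children.
  getProcessStatus True False pid

main :: IO ()
main = do
  status <- runShell "exit 3"
  print (status == Just (Exited (ExitFailure 3)))  -- True
```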

Advantages

rawfilepath is easy to use.

{-# language OverloadedStrings #-}

import RawFilePath
import System.IO
import qualified Data.ByteString as B

main :: IO ()
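main = do
  -- Spawn GNU sed, appending "!" at each word end (\> matches end of word),
  -- with pipes connected to its stdin and stdout.
  p <- startProcess $ proc "sed" ["-e", "s/\\>/!/g"]
    `setStdin` CreatePipe
    `setStdout` CreatePipe
  -- Feed the input as raw bytes, then close stdin so sed sees EOF.
  B.hPut (processStdin p) "Lorem ipsum dolor sit amet"
  hClose (processStdin p)
  -- Read sed's output as raw bytes, up to EOF.
  result <- B.hGetContents (processStdout p)
  print result
  -- "Lorem! ipsum! dolor! sit! amet!"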

Rationale

Performance

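Traditional String is notorious: it is a lazy linked list of Char, so every character is a separate heap-allocated cons cell (at least three machine words, or 24 bytes on a 64-bit system), and reading it means chasing one pointer per character.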

This alone is reason enough to avoid String. FilePath is a type synonym for String, so it inherits all of these costs. Use RawFilePath instead: it is faster and occupies less memory.
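As a small illustration of staying in ByteString from end to end, the sketch below lists a directory through the unix package's System.Posix.Directory.ByteString module. The entries come back as RawFilePath, so no per-character String is ever built. listRaw is a hypothetical helper, not part of rawfilepath's API.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Exception (bracket)
import qualified Data.ByteString as B
import System.Posix.ByteString.FilePath (RawFilePath)
import System.Posix.Directory.ByteString
  (closeDirStream, openDirStream, readDirStream)

-- List a directory's entries as raw bytes, skipping "." and "..".
-- Each entry stays a ByteString; nothing is decoded into a String.
listRaw :: RawFilePath -> IO [RawFilePath]
listRaw dir = bracket (openDirStream dir) closeDirStream (go [])
  where
    go acc ds = do
      entry <- readDirStream ds
      if B.null entry
        then return (reverse acc)  -- an empty name marks the end of the stream
        else go (if entry `elem` [".", ".."] then acc else entry : acc) ds

main :: IO ()
main = listRaw "/" >>= mapM_ (B.putStr . (<> "\n"))
```

bracket makes sure the directory stream is closed even if reading throws an exception.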

Encoding

FilePath is a type synonym for String. This is a bigger problem than String's performance. It is a correctness issue, because a String carries no encoding information.

A system call gives you (or expects from you) a sequence of bytes, but a String is a sequence of characters. So how do you know the system's encoding? NTFS uses UTF-16, and FAT32 uses the OEM character set. On Linux, there is no filesystem-level encoding at all: a file name is any byte sequence that contains neither '/' nor NUL. Would Haskell somehow magically figure out the system's encoding and encode or decode accordingly? Well, there is no magic. FilePath offers no guarantee of correct behavior, especially when names contain non-ASCII characters.
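The Linux case can be checked directly. The sketch below uses only the unix, bytestring and text packages and assumes a Linux filesystem (such as ext4 or tmpfs) under /tmp. It creates a directory whose name ends in the single byte 0xE9 (Latin-1 "é", which is not valid UTF-8), confirms that the same bytes find it again, and shows that the name cannot be decoded as UTF-8. roundTrip is a hypothetical helper, and for brevity it does not clean up if an intermediate step fails.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as B
import Data.Text.Encoding (decodeUtf8')
import System.Posix.Directory.ByteString (createDirectory, removeDirectory)
import System.Posix.Files.ByteString (fileExist)
import System.Posix.Temp.ByteString (mkdtemp)

-- Returns (does the byte-exact name exist?, is the name valid UTF-8?).
-- Assumes Linux, where a file name is any byte sequence without '/' or NUL.
roundTrip :: IO (Bool, Bool)
roundTrip = do
  base <- mkdtemp "/tmp/rawfilepath-demo-"  -- mkdtemp appends a random suffix
  let name = base <> "/" <> B.pack [0x63, 0x61, 0x66, 0xE9]  -- "caf" + byte 0xE9
  createDirectory name 0o755
  exists <- fileExist name
  removeDirectory name
  removeDirectory base
  let validUtf8 = either (const False) (const True) (decodeUtf8' name)
  return (exists, validUtf8)

main :: IO ()
main = roundTrip >>= print  -- (True,False) on Linux
```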

AFPP

In June 2015, three bright Haskell programmers came up with an elegant solution called the Abstract FilePath Proposal, and it met with immediate, thunderous applause. Inspired by this enthusiasm, they went on to pursue careers in professional Haskell programming and focused on more interesting things. (sigh)

This library provides a stable and high-performance API that is available now.

Documentation

API documentation of rawfilepath on Stackage.

To do

rawfilepath is stable. We don't expect any backward-incompatible changes. However, we do want to port more of the system functions found in the process and directory packages. We'll need to be careful with their API to keep it stable, though.

Patches are highly appreciated.