ghost opened this issue 8 years ago
For more possibilities for passing handles to and from other processes, including processes that have privileged access to archives that the current process lacks, see: http://blog.varunajayasiri.com/passing-file-descriptors-between-processes-using-sendmsg-and-recvmsg
TODO: add blindCopyEntry so that an open Handle to another archive can be solicited for information.
And where are the archive contents till you write them into a file? In memory?
I'm not sure what is going on in your example, but the approach seems hackish.
No hack at all. The contents are in the filesystem: as long as the handle is held by at least one thread, the data remains there. No memory is involved. It is no different from any other file opened anywhere else, except that, due to the unlink (remove), no directory reference to the file exists.
As soon as the Handle is closed, or the thread/process exits, the file contents are freed by the filesystem. No cleanup necessary.
This leaves one free to create an archive on the fly in a blind/anonymous file. The file can be read or written to by any process/thread that has access to the Handle, which includes passing the Handle to other processes on the OS via sockets.
There is nothing new or 'hackish' about this idiom. It has been around for decades.
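A minimal sketch of the idiom in Haskell (POSIX-only, since Windows will not unlink an open file; the name withAnonymousFile is illustrative, not from any package):

```haskell
import Control.Exception (bracket)
import System.Directory (removeFile)
import System.IO

-- Open a file, immediately unlink its directory entry, and hand the
-- still-live Handle to an action. The data exists only while the Handle
-- is open; once hClose runs, the filesystem reclaims the blocks.
withAnonymousFile :: FilePath -> (Handle -> IO a) -> IO a
withAnonymousFile path = bracket acquire hClose
  where
    acquire = do
      h <- openFile path ReadWriteMode
      removeFile path  -- no directory entry remains; the inode lives on
      pure h

main :: IO ()
main = do
  s <- withAnonymousFile "/tmp/anon-demo.tmp" $ \h -> do
    hPutStrLn h "hello"        -- write while the file is already unlinked
    hSeek h AbsoluteSeek 0
    hGetLine h
  putStrLn s
```

Using bracket guarantees hClose runs even on exceptions, which matches the "no cleanup necessary" property described above.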
There are other applications for Handle passing via OS sockets that need not involve unlinking the file from the directory structure. For example, a server can pass restricted archives to an unprivileged process by making the Handle available via an OS socket, with no copy of the data required.
Thank you, I'll look into that.
This is a related and useful technique: http://stackoverflow.com/questions/14514997/reopen-a-file-descriptor-with-another-access/14515466#14515466
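A sketch of that technique in Haskell (Linux-specific, since it relies on /proc/self/fd; reopenReadOnly is an illustrative name):

```haskell
import System.Directory (removeFile)
import System.IO
import System.Posix.IO (closeFd, handleToFd)

-- Reopen an existing descriptor under a different access mode by opening
-- /proc/self/fd/N. This yields a fresh open file description (offset 0)
-- and works even for files that have already been unlinked.
reopenReadOnly :: Handle -> IO Handle
reopenReadOnly h = do
  fd <- handleToFd h  -- flushes and invalidates h; the descriptor stays open
  h' <- openFile ("/proc/self/fd/" ++ show (fromIntegral fd :: Int)) ReadMode
  closeFd fd          -- the new Handle owns its own descriptor now
  pure h'

main :: IO ()
main = do
  h <- openFile "/tmp/reopen-demo.tmp" WriteMode  -- write-only handle
  hPutStrLn h "payload"
  removeFile "/tmp/reopen-demo.tmp"  -- unlink first, to show it still works
  r <- reopenReadOnly h              -- read access from a write-only original
  putStrLn =<< hGetLine r
  hClose r
```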
Haskell has had support for Handle/fd passing via sockets for many years.
https://hackage.haskell.org/package/network-2.6.3.1/docs/Network-Socket.html#g:10
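For completeness, a sketch of fd passing using the network package's sendFd/recvFd (POSIX-only; both ends run in one process here for brevity, whereas the receiving end would normally be another process holding one side of the socket pair):

```haskell
import Network.Socket
import System.Directory (removeFile)
import System.IO
import System.Posix.IO (closeFd, fdToHandle, handleToFd)

main :: IO ()
main = do
  -- An anonymous file, as in the idiom above: opened, then unlinked.
  h <- openFile "/tmp/fdpass-demo.tmp" ReadWriteMode
  removeFile "/tmp/fdpass-demo.tmp"
  hPutStrLn h "shared"
  fd <- handleToFd h                  -- flushes and yields the raw descriptor
  (a, b) <- socketPair AF_UNIX Stream defaultProtocol
  sendFd a (fromIntegral fd)          -- SCM_RIGHTS via sendmsg under the hood
  fd' <- recvFd b                     -- the kernel hands over a duplicated fd
  closeFd fd                          -- sender's copy is no longer needed
  h' <- fdToHandle (fromIntegral fd')
  hSeek h' AbsoluteSeek 0             -- the open file description is shared,
  putStrLn =<< hGetLine h'            -- so rewind before reading
  mapM_ close [a, b]
  hClose h'
```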
What follows is a working piece of code that uses createBlindArchive to create an archive from database documents, and then uploads the archive via Yesod. Once the hClose runs, the archive file vanishes from the filesystem. Had exceptions prevented hClose from being reached, the archive file and its contents would vanish as soon as the thread died.
```haskell
data Document = Document
  { documentName :: FilePath
  , cronos       :: UTCTime
  }

download :: FilePath -> [(Document, ByteString)] -> Handler TypedContent
download archivePath documents = do
  h <- liftIO $ do
    h <- openFile archivePath ReadWriteMode
    removeFile archivePath
    hSetBinaryMode h True
    createBlindArchive h $ do
      setArchiveComment "This archive was created by Me!"
      forM_ documents
        (\(doc, payload) -> do
          es <- mkEntrySelector =<< parseRelFile (documentName doc)
          setModTime (cronos doc) es
          addEntry Store payload es)
    hSeek h AbsoluteSeek 0
    pure h
  respondSource "application/zip" $ handleToBuild h

handleToBuild :: Handle -> Source (HandlerT site IO) (Flush BB.Builder)
handleToBuild h = sourceHandle h =$= lumps
  where
    lumps = maybeM (liftIO $ hClose h)
                   (\b -> yield (Chunk $ BB.insertByteString b) *> lumps)
            =<< await

maybeM :: Applicative m => m b -> (a -> m b) -> Maybe a -> m b
maybeM _             action (Just a) = action a
maybeM defaultAction _      Nothing  = defaultAction
```
OK, you can go ahead with PR, but please preserve backward-compatibility in API.
Absolutely! I already have the code and it passes all of the prior tests.
Would you like me to delay the PR until I add a set of tests to the test suite or just get the working code to you first?
@robertLeeGDM, Let's first see what you've got.
I thought this approach was about equal to the direct conduit approach of zip-stream, but I am realizing that this blind handle might solve the problem of simply computing the content length for populating an HTTP header before streaming the zip (sz <- liftIO $ IO.hSeek h IO.SeekFromEnd 0 >> IO.hTell h, before seeking back to 0).
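That parenthetical can be wrapped into a small helper; a sketch (contentLength is a hypothetical name, not part of the zip package's API):

```haskell
import System.IO

-- Size of the finished archive behind the Handle: seek to the end,
-- read the offset, then rewind so streaming starts from byte 0.
contentLength :: Handle -> IO Integer
contentLength h = do
  hSeek h SeekFromEnd 0
  sz <- hTell h
  hSeek h AbsoluteSeek 0
  pure sz

main :: IO ()
main = do
  h <- openFile "/tmp/len-demo.tmp" WriteMode
  hPutStr h "12345"        -- hSeek flushes the write buffer first
  print =<< contentLength h
  hClose h
```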
I have the code for blind handles, and I have used it in commercial production for some while without a problem. I submitted it as a pull request, but the code was not formatted in accordance with the standards used in that package. I did say I was going to fix it, but I'm a bit clueless with GitHub, so if I reformatted it I would probably bungle the pull request.
See the 'blind' branch: https://github.com/robertLeeGDM/zip
Memory usage is great in tests. I would emphasize to future users that the filesystem where the handle is created must have enough space for the largest zip file they expect to produce.
In the long run, an approach that doesn't use a filesystem, even a blind file, is probably more compatible with serving streaming zips from a web application. The drawback of the current approach is that users have to wait a long time before the download actually starts for larger zip files.
UPDATE:
Update 2: After a few months in production, one of our users' Chrome browsers gives up when the initial response takes too long. I have started implementing an async + browser-poll approach. My ideal would be to speed up zip generation and keep everything synchronous, but I am not sure whether I am constrained by the speed of writing buffers to disk. I haven't explored chunked transfer encoding yet.
We are stuck with the fact that zip was not designed with streaming in mind. Zip is its own worst enemy in that regard.
I wanted to create a Handle independent of the zip module. I believe what I have is working currently. If you want I can create a pull request. If you want to see the code it is in my repo.
I can safely write data to the archive without actually exposing it to the filesystem unless I want to. The hPutStr could just as well go to a socket, or through a conduit to an httpd service, etc.