Kubuxu opened 8 years ago
What version of ipfs?
On Wed, Jan 6, 2016, 11:15 Jakub Sztandera notifications@github.com wrote:
and uses a lot of CPU resources. This 600KiB/s was lucky, most of the time it was around 200 to 300KiB/s.
In comparison ipfs cat reached 250MiB/s.
— Reply to this email directly or view it on GitHub https://github.com/ipfs/go-ipfs/issues/2166.
0.3.11-dev; I will check whether anything changed in that matter in 0.4.
It is faster, but still 60 times slower than ipfs cat.
In the case of a file containing zeros:
# cat /ipfs/QmdiETTY5fiwTkJeERbWAbPKtzcyjzMEJTJJosrqo2qKNm | pv -a | wc -c
[4.29MiB/s]
^C
# ~/go/bin/ipfs cat /ipfs/QmdiETTY5fiwTkJeERbWAbPKtzcyjzMEJTJJosrqo2qKNm | pv -a | wc -c
[ 247MiB/s]
1073741824
I encountered the issue today. It seems the file contents are recomputed on every read call; cat is faster because there is no need to recompute them. Adding a read cache increases the read speed: I reached 50 MB/s by adding a 40 MB cache.
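The idea in the previous comment can be reduced to a toy sketch: one slow backend fetch serves many subsequent small reads. The class, the `fetch` callback, and the sizes below are illustrative, not from go-ipfs.

```python
# Toy read-ahead cache: on a miss, pull in more bytes than the caller
# asked for, so nearby reads become cache hits.
class ReadAheadCache:
    def __init__(self, fetch, readahead=64 * 1024):
        self.fetch = fetch          # fetch(offset, length) -> bytes (the slow path)
        self.readahead = readahead  # extra bytes pulled in on every miss
        self.start = 0
        self.buf = b""
        self.misses = 0

    def read(self, offset, length):
        end = offset + length
        # serve from the buffered window when it covers the request
        if not (self.start <= offset and end <= self.start + len(self.buf)):
            self.misses += 1
            self.buf = self.fetch(offset, length + self.readahead)
            self.start = offset
        lo = offset - self.start
        return self.buf[lo:lo + length]


# Example: 1 MiB of zeros behind the fetch callback; sequential 4 KiB
# reads hit the backend only once per 64 KiB window, not once per read.
data = b"\x00" * (1024 * 1024)
cache = ReadAheadCache(lambda off, n: data[off:off + n])
for off in range(0, 128 * 1024, 4096):
    assert cache.read(off, 4096) == data[off:off + 4096]
```

The trade-off is the same one discussed in this thread: a larger read-ahead window means fewer slow lookups but more memory held per open file.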
What do you mean by read cache?
Rewrite the read function in FUSE. When the read function is called, check the file cache first: if the target content is in the cache, return it directly. If it is not, start a new thread that reads content from IPFS (this is the key point: read more data than the read call asked for, because reading data is fast but locating the data to read is slow) and writes it to the file cache. The main thread polls the cache until the target data appears. Here is part of my code, written in Python; it uses the IPFS web API to read data from IPFS.
import threading
import time

import requests


class CachedReads:  # the surrounding FUSE operations class; name is illustrative
    def __init__(self):
        self.fileCache = {}
        self.fileCacheLock = threading.Lock()

    # the fuse read function
    def read(self, path, length, offset, fh):
        fileHash = path.lstrip("/")  # derived from the path (elided in the original snippet)
        end = offset + length
        data = self.get_cache(fileHash, offset, end)
        return data

    def get_cache(self, hash, start, end):
        if hash in self.fileCache:
            self.fileCacheLock.acquire()
            cache = self.fileCache[hash]
            self.fileCacheLock.release()
            cache["lock"].acquire()
            if start >= cache["start"] and end <= cache["end"]:
                data = cache["data"][(start - cache["start"]):(end - cache["start"])]
                cache["lock"].release()
                return data
            else:
                cache["lock"].release()
                if cache["download"] is not None:
                    # stop the running download thread and wait for it to exit
                    cache["download"].stop()
                    cache["download"].join()
                # start a new download thread at the requested offset
                downloadThread(cache, start).start()
        else:
            cache = {
                "start": start,
                "end": start,
                "data": b"",
                "lock": threading.Lock(),
                "download": None,
                "hash": hash,
            }
            self.fileCacheLock.acquire()
            self.fileCache[hash] = cache
            self.fileCacheLock.release()
            downloadThread(cache, start).start()
        # wait for data
        while True:
            time.sleep(0.001)
            cache["lock"].acquire()
            if start >= cache["start"] and end <= cache["end"]:
                data = cache["data"][(start - cache["start"]):(end - cache["start"])]
                cache["lock"].release()
                return data
            cache["lock"].release()


class downloadThread(threading.Thread):
    def __init__(self, cache, start):
        super().__init__()
        self._stop = threading.Event()
        self.cache = cache
        self.startIndex = start

    def stop(self):
        self._stop.set()

    def stopped(self):
        return self._stop.is_set()

    def run(self):
        # register this thread in the cache entry
        self.cache["lock"].acquire()
        print("download thread start " + self.cache["hash"] + " " + str(self.startIndex))
        if self.cache["download"] is not None:
            print("Error: a download thread is already registered")
        self.cache["download"] = self
        self.cache["lock"].release()
        chunkIndex = self.startIndex
        # read ahead from the local gateway, starting at the requested offset
        r = requests.get("http://127.0.0.1:8080/ipfs/" + self.cache["hash"],
                         headers={"range": "bytes=" + str(self.startIndex) + "-"},
                         stream=True, timeout=200)
        for chunk in r.iter_content(chunk_size=1024 * 1024 * 2):  # this value affects performance greatly
            if chunk:  # filter out keep-alive new chunks
                self.cache["lock"].acquire()
                if chunkIndex == self.startIndex:
                    self.cache["data"] = chunk
                    self.cache["start"] = self.startIndex
                else:
                    self.cache["data"] += chunk
                chunkIndex += len(chunk)
                self.cache["end"] = chunkIndex
                self.cache["lock"].release()
            if self.stopped():
                break
            if self.cache["end"] - self.cache["start"] > 40 * 1024 * 1024:
                # max cache size 40M
                print("max cache size")
                break
        r.close()
        self.cache["lock"].acquire()
        if self.cache["download"] is self:
            # download completed; remove the thread record
            self.cache["download"] = None
            print("download thread end " + self.cache["hash"])
        self.cache["lock"].release()
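A side note on the sleep-poll loop in get_cache: a threading.Condition lets the reader block until the downloader has appended enough bytes, instead of waking every millisecond. A minimal sketch with illustrative names, not taken from the code above:

```python
import threading


class Window:
    # Byte window filled by a downloader thread; readers block on a
    # Condition rather than polling with time.sleep().
    def __init__(self, start=0):
        self.cond = threading.Condition()
        self.start = start
        self.data = b""

    def append(self, chunk):
        # called by the download thread for each received chunk
        with self.cond:
            self.data += chunk
            self.cond.notify_all()

    def read(self, offset, length):
        # called by the FUSE read path; blocks until the range is buffered
        with self.cond:
            self.cond.wait_for(
                lambda: self.start <= offset
                and offset + length <= self.start + len(self.data))
            lo = offset - self.start
            return self.data[lo:lo + length]


# A writer thread feeds the window while a reader waits for its range.
w = Window()
t = threading.Thread(target=lambda: w.append(b"hello world"))
t.start()
assert w.read(0, 5) == b"hello"
t.join()
```

`notify_all` wakes every waiting reader exactly when new data arrives, which removes both the 1 ms latency floor and the idle CPU churn of the polling loop.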
Hi,
I wrote a simple mounting utility that works in the way @randoms described (or at least a similar one). It's written in Python and uses fusepy for mounting.
Repo is at https://github.com/SupraSummus/ipfs-api-mount
There are many things to improve in this utility. I plan to work on it. (I need "fast" IPFS mountpoints for my other project.)
Nice! Actually, I wonder if it's useful to consider moving away from a built-in fuse interface? We'd probably want a faster API (unix domain sockets and a real RPC protocol) first but, from a security standpoint, it would be really nice (much easier to sandbox IPFS). Also, reducing the number of features built into IPFS directly would be kind of nice...
Thoughts @whyrusleeping?
I published another utility, ipfs-mount, in an attempt to bring these features together in nodejs land. It supports /ipfs and /mfs (todo: /ipns) and has respectable performance with no added caching layer. The http gateway is still the fastest option...