ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

FUSE mount is very slow #2166

Open Kubuxu opened 8 years ago

Kubuxu commented 8 years ago

and uses a lot of CPU resources. This 600KiB/s was lucky, most of the time it was around 200 to 300KiB/s.

In comparison ipfs cat reached 250MiB/s.

whyrusleeping commented 8 years ago

What version of ipfs?

On Wed, Jan 6, 2016, 11:15 Jakub Sztandera notifications@github.com wrote:

and uses a lot of CPU resources. This 600KiB/s was lucky, most of the time it was around 200 to 300KiB/s.

In comparison ipfs cat reached 250MiB/s.

Kubuxu commented 8 years ago

0.3.11-dev. I will check whether anything has changed in 0.4.

Kubuxu commented 8 years ago

It is faster, but still about 60 times slower than ipfs cat for a 1 GiB file containing only zeros:

# cat /ipfs/QmdiETTY5fiwTkJeERbWAbPKtzcyjzMEJTJJosrqo2qKNm | pv -a | wc -c
[4.29MiB/s]
^C
# ~/go/bin/ipfs cat /ipfs/QmdiETTY5fiwTkJeERbWAbPKtzcyjzMEJTJJosrqo2qKNm | pv -a | wc -c
[ 247MiB/s]
1073741824
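The same comparison can be scripted instead of piping through pv. The following is a minimal sketch, not part of the original measurement: `measure_throughput` is a hypothetical helper and the 128 KiB block size is an assumption.

```python
import io
import time


def measure_throughput(stream, block_size=128 * 1024):
    """Read `stream` to EOF and return (total_bytes, MiB_per_second)."""
    total = 0
    started = time.perf_counter()
    while True:
        block = stream.read(block_size)
        if not block:
            break
        total += len(block)
    # guard against a zero elapsed time on very small inputs
    elapsed = max(time.perf_counter() - started, 1e-9)
    return total, total / elapsed / (1024 * 1024)


if __name__ == "__main__":
    # stand-in for a real file object, so the sketch runs without a mount
    total, rate = measure_throughput(io.BytesIO(b"\0" * (8 * 1024 * 1024)))
    print("%d bytes at %.1f MiB/s" % (total, rate))
```

To measure the mount, pass a file opened in binary mode from the /ipfs path. To measure ipfs cat, pass the stdout pipe of a subprocess running that command.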

randoms commented 8 years ago

I ran into this issue today. It seems the file contents are recomputed on every read call. ipfs cat is faster because it does not need to recompute the file content. Adding a read cache increases the read speed: I reached 50MB/s by adding a 40MB cache.

Kubuxu commented 8 years ago

What do you mean by read cache?

randoms commented 8 years ago

I rewrote the read function in the FUSE layer. When read is called, it checks a file cache first. If the requested range is in the cache, it is returned straight from there. If not, a new thread is started to read the content from IPFS. This is the key point: the thread reads much more data than the read call asked for, because reading data is fast but finding the data to read is slow. The thread writes what it reads into the file cache, and the main thread keeps polling the cache until the requested range shows up. Here is part of my code, written in Python. It reads the data from IPFS over HTTP.

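import threading
import time

import requests

# in the constructor of the FUSE operations class
self.fileCache = {}                    # file hash -> cache entry
self.fileCacheLock = threading.Lock()  # guards fileCache itself

# the fuse read function
def read(self, path, length, offset, fh):
    # fileHash is the IPFS hash that `path` refers to (lookup not shown in this excerpt)
    end = offset + length
    data = self.get_cache(fileHash, offset, end)
    return data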

def get_cache(self, hash, start, end):
    # look up or create the cache entry for this file under the global lock
    with self.fileCacheLock:
        cache = self.fileCache.get(hash)
        is_new = cache is None
        if is_new:
            cache = {
                "start": start,     # offset of the first cached byte
                "end": start,       # offset one past the last cached byte
                "data": b"",
                "lock": threading.Lock(),
                "download": None,   # running downloadThread, if any
                "hash": hash,
            }
            self.fileCache[hash] = cache
    if is_new:
        downloadThread(cache, start).start()
    else:
        with cache["lock"]:
            if start >= cache["start"] and end <= cache["end"]:
                # cache hit: slice the requested range out of the buffer
                return cache["data"][(start - cache["start"]):(end - cache["start"])]
            download = cache["download"]
        # cache miss: stop the running download thread, if any
        if download is not None:
            download.stop()
            download.join()
        # start new download thread
        downloadThread(cache, start).start()
    # wait for data; note that this never returns if the range never arrives
    while True:
        time.sleep(0.001)
        with cache["lock"]:
            if start >= cache["start"] and end <= cache["end"]:
                return cache["data"][(start - cache["start"]):(end - cache["start"])]

class downloadThread(threading.Thread):

    def __init__(self, cache, start):
        super(downloadThread, self).__init__()
        # not named _stop: Python 3's Thread already has a _stop() method
        self._stop_event = threading.Event()
        self.cache = cache
        self.startIndex = start

    def stop(self):
        self._stop_event.set()

    def stopped(self):
        return self._stop_event.is_set()

    def run(self):
        # add thread record
        with self.cache["lock"]:
            print("download thread start " + self.cache["hash"] + " " + str(self.startIndex))
            if self.cache["download"] is not None:
                print("Error: another download thread is still registered")
            self.cache["download"] = self

        chunkIndex = self.startIndex
        # open-ended range request: everything from startIndex to the end of the file
        r = requests.get("http://127.0.0.1:8080/ipfs/" + self.cache["hash"],
                         headers={"Range": "bytes=" + str(self.startIndex) + "-"},
                         stream=True, timeout=200)
        for chunk in r.iter_content(chunk_size=1024*1024*2): # this value affects performance greatly
            if chunk: # filter out keep-alive chunks
                with self.cache["lock"]:
                    if chunkIndex == self.startIndex:
                        # first chunk: replace the old buffer
                        self.cache["data"] = chunk
                        self.cache["start"] = self.startIndex
                    else:
                        self.cache["data"] += chunk
                    chunkIndex += len(chunk)
                    self.cache["end"] = chunkIndex
            if self.stopped():
                break
            if chunkIndex - self.startIndex > 40*1024*1024:
                # max cache size 40M
                print("max cache size")
                break
        r.close()
        with self.cache["lock"]:
            if self.cache["download"] is self:
                # download finished, remove thread record
                self.cache["download"] = None
            print("download thread end " + self.cache["hash"])
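The core idea of the code above (fetch far more than a single read asks for, then serve later reads from memory) can also be shown without threads. This is a simplified sketch, not the code from the comment: `ReadAheadCache`, the `fetch(offset, size)` callable, and the window size are all hypothetical, and the background download is left out.

```python
class ReadAheadCache:
    """Serve small reads from one large prefetched window (single-threaded)."""

    def __init__(self, fetch, window=4 * 1024 * 1024):
        self.fetch = fetch    # hypothetical callable: fetch(offset, size) -> bytes
        self.window = window  # how many bytes to prefetch on a miss
        self.start = 0        # file offset of the first cached byte
        self.data = b""       # cached bytes [start, start + len(data))
        self.fetches = 0      # number of backend calls, for illustration

    def read(self, offset, length):
        end = offset + length
        cached_end = self.start + len(self.data)
        if not (self.start <= offset and end <= cached_end):
            # miss: ask the backend for a whole window, not just `length` bytes
            self.data = self.fetch(offset, max(length, self.window))
            self.start = offset
            self.fetches += 1
        lo = offset - self.start
        return self.data[lo:lo + length]


if __name__ == "__main__":
    backing = bytes(range(256)) * 4096  # 1 MiB of sample data
    cache = ReadAheadCache(lambda off, size: backing[off:off + size],
                           window=64 * 1024)
    for off in range(0, len(backing), 4096):
        cache.read(off, 4096)
    print("reads: 256, backend fetches:", cache.fetches)
```

With 4 KiB sequential reads and a 64 KiB window, only one read in sixteen reaches the backend, which is where the speedup described above would come from if locating data is the expensive part.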

SupraSummus commented 7 years ago

Hi,

I wrote a simple mounting utility that works the way @randoms described (or at least similarly). It's written in Python and uses fusepy for mounting.

Repo is at https://github.com/SupraSummus/ipfs-api-mount

There are many things to improve in this utility. I plan to work on it. (I need "fast" IPFS mountpoints for my other project.)

Stebalien commented 7 years ago

Nice! Actually, I wonder if it's useful to consider moving away from a built-in fuse interface? We'd probably want a faster API (unix domain sockets and a real RPC protocol) first but, from a security standpoint, it would be really nice (much easier to sandbox IPFS). Also, reducing the number of features built into IPFS directly would be kind of nice...

Thoughts @whyrusleeping?

piedar commented 6 years ago

I published another utility, ipfs-mount, in an attempt to bring these features together in Node.js land. It supports /ipfs and /mfs (todo: /ipns) and has respectable performance with no added caching layer. The HTTP gateway is still the fastest option...
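For reference, reading a byte range through the HTTP gateway needs nothing beyond the standard library. This is a rough sketch under assumptions: the gateway address is the local default used earlier in this thread, and `gateway_range_request` and `read_range` are hypothetical helper names.

```python
import urllib.request


def gateway_range_request(cid, offset, length, gateway="http://127.0.0.1:8080"):
    """Build a GET request for bytes [offset, offset + length) of /ipfs/<cid>."""
    last = offset + length - 1  # HTTP byte ranges are inclusive on both ends
    req = urllib.request.Request(gateway + "/ipfs/" + cid)
    req.add_header("Range", "bytes={}-{}".format(offset, last))
    return req


def read_range(cid, offset, length):
    """Fetch the range; this needs a running local gateway."""
    req = gateway_range_request(cid, offset, length)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()
```

A FUSE read(path, length, offset, fh) handler could map directly onto read_range, at the cost of one HTTP round trip per call unless a cache sits in front of it.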