mwri / elixir-userfs

Elixir FUSE (Filesystem in Userspace) interface

Reading a big file causes fuse to disconnect #2

Open leonardzhou opened 2 years ago

leonardzhou commented 2 years ago

hi,

I'm not familiar with C, so I haven't been able to work out how to fix the fuse disconnect that happens when I pass a video (100+ MB) through fuse.

>>>  fuse: bad mount point `/tmp/fs': Transport endpoint is not connected

I found that maxdatalen isn't used in the read_from_erlang procedure. Does this mean it may cause an error when Erlang passes more data than BUFFER_SIZE?

thanks anyway, your work makes it a lot of fun to play with Elixir interacting with the OS.

best regards, leonard zhou

```c
int read_from_erlang(unsigned char * buf, int maxdatalen) {
    uint32_t datalen;
    int readlen;

    if ((readlen = read(3, (unsigned char *) &datalen, 4)) != 4) {
        syslog(LOG_WARNING, "efuse[%d]: read: port error reading data header (read %d)",
                getpid(), readlen);
        return -1;
    }
    datalen = ntohl(datalen);

    uint32_t magiccookie1;
    if ((readlen = read(3, (unsigned char *) &magiccookie1, 4)) != 4) {
        syslog(LOG_CRIT, "efuse[%d]: read: port error reading data header (read %d)",
            getpid(), readlen);
        exit(1);
    }
    magiccookie1 = ntohl(magiccookie1);
    if (magiccookie1 != EFUSE_MAGICCOOKIE) {
        syslog(LOG_CRIT, "efuse[%d]: read: port read invalid magic cookie %u",
                getpid(), magiccookie1);
        exit(1);
    }
    datalen -= sizeof(uint32_t);

    uint32_t readtotal = 0;
    while (readtotal < datalen) {
        readlen = read(3, buf+readtotal, datalen-readtotal);
        if (readlen < 0) {
            syslog(LOG_ERR, "efuse[%d]: read: port read %d (expected %d)",
                    getpid(), readtotal, datalen);
            return -1;
        }
        readtotal += readlen;
        if (readtotal < datalen)
            syslog(LOG_WARNING, "efuse[%d]: read: port short read (%d of %d)",
                    getpid(), readtotal, datalen);
    }

    return datalen;
}
```
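
For readers following the function above, the frame layout it expects can be read off the code: a 4-byte big-endian length, a 4-byte big-endian magic cookie that the length includes, then the payload. The sketch below builds such a frame. The name `frame_encode` and the value `EXAMPLE_MAGICCOOKIE` are made up for illustration; the real `EFUSE_MAGICCOOKIE` is defined in the project and is not shown here.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder only: the real EFUSE_MAGICCOOKIE value lives in the project. */
#define EXAMPLE_MAGICCOOKIE 0x12345678u

/* Build one frame in the layout read_from_erlang parses:
 *   4 bytes  big-endian length of what follows (cookie + payload)
 *   4 bytes  big-endian magic cookie
 *   n bytes  payload
 * Returns the total frame size, or 0 if the frame cannot be built. */
static size_t frame_encode(unsigned char *out, size_t outcap,
                           const unsigned char *payload, uint32_t n)
{
    if (n > UINT32_MAX - sizeof(uint32_t))
        return 0;                       /* length field would overflow */

    size_t total = 2 * sizeof(uint32_t) + (size_t)n;
    if (outcap < total)
        return 0;                       /* caller's buffer is too small */

    uint32_t len_be = htonl(n + (uint32_t)sizeof(uint32_t));
    uint32_t cookie_be = htonl(EXAMPLE_MAGICCOOKIE);

    memcpy(out, &len_be, sizeof len_be);
    memcpy(out + 4, &cookie_be, sizeof cookie_be);
    memcpy(out + 8, payload, n);
    return total;
}
```

A 3-byte payload therefore produces an 11-byte frame whose length field holds 7, which is why the reader subtracts `sizeof(uint32_t)` before reading the payload.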

mwri commented 2 years ago

Hmm, it's been a long time since I've looked at this code. I've had a quick look and reminded myself how Fuse works, and I agree there may be a problem as you say: the read_from_erlang function should respect maxdatalen, and the static buffer size is small. This would be quite easy to fix, but I think there is another, bigger problem when dealing with files of this size.
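
As a rough sketch of what "respecting maxdatalen" could look like, the payload loop might refuse any message larger than the buffer before reading it. The helper names `read_exact` and `read_payload_bounded` are hypothetical, and the file descriptor is passed as a parameter here (the original hard-codes 3) only so the code can be tried against a pipe.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Read exactly len bytes from fd into buf.
 * Returns 0 on success, -1 on error or premature end of file. */
static int read_exact(int fd, unsigned char *buf, size_t len)
{
    size_t total = 0;
    while (total < len) {
        ssize_t n = read(fd, buf + total, len - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            return -1;          /* port closed before the payload arrived */
        total += (size_t)n;
    }
    return 0;
}

/* Hypothetical replacement for the payload loop in read_from_erlang.
 * datalen is the payload length from the header (cookie already
 * subtracted); maxdatalen is the capacity of buf.
 * Returns the payload length, or -1 if it does not fit or the read fails. */
static int read_payload_bounded(int fd, unsigned char *buf,
                                uint32_t datalen, uint32_t maxdatalen)
{
    if (datalen > maxdatalen)
        return -1;              /* refuse rather than overflow buf */
    if (read_exact(fd, buf, datalen) != 0)
        return -1;
    return (int)datalen;
}
```

Note that rejecting an oversized message leaves its bytes unread on the port, so the caller would have to treat that as fatal or drain the payload to stay in sync with the framing. Treating a zero-length `read` as an error also avoids looping forever if the port closes mid-message.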

This library is quite simple in design; I'd say it's proof-of-concept standard, really. When reading a file, the C interface (callback fusecb_read) synchronously A) sends the message to Erlang, B) reads the response, and C) returns to Fuse and kernel land.

This makes the C interface, the Erlang port, and the Erlang or Elixir filesystem implementation all very simple. But dealing with a 100MB file, I'm thinking that Fuse is going to allocate memory for that, the C interface must too, and in Erlang land it must be allocated at least once... so the memory footprint of the transaction is pretty significant, even for a single request! Maybe you have plenty of memory for this, but from a design perspective, it's really not very good.

The Fuse interface has a perfectly adequate design to cope with this by requesting data in chunks (if you look at fusecb_read there is an offset param). The C interface accepts the offset value, as it should, but does not pass it any further: the Erlang port does not support it, so the Erlang or Elixir filesystem implementation never has to deal with it.

The C interface and Erlang port could be upgraded to support this, and I think the size of the data being requested by Fuse should be decoupled from what the Erlang port can handle... which would mean the C interface would have to be capable of sending several requests to the Erlang port and receiving multiple responses from it instead of just one (this would allow it to accommodate Fuse asking for more data than the Erlang port allows).
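
To make the decoupling idea concrete, here is a minimal sketch of a loop that satisfies one Fuse-sized read with several port-sized requests. Everything in it is an assumption for illustration: `fetch_fn` stands in for one request/response round trip with the Erlang port, `chunked_read` is a made-up name, and `mem_fetch` is an in-memory stand-in used only to exercise the loop. None of this exists in the library.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for one round trip with the Erlang port: copy up
 * to len bytes starting at offset into dst. Returns the number of bytes
 * produced, 0 at end of file, or -1 on error. */
typedef ssize_t (*fetch_fn)(void *ctx, char *dst, size_t len, off_t offset);

/* Satisfy one read of `size` bytes using requests of at most max_chunk. */
static ssize_t chunked_read(fetch_fn fetch, void *ctx, char *out,
                            size_t size, off_t offset, size_t max_chunk)
{
    size_t done = 0;

    if (max_chunk == 0)
        return -1;
    while (done < size) {
        size_t want = size - done;
        if (want > max_chunk)
            want = max_chunk;   /* never ask the port for more than it allows */

        ssize_t got = fetch(ctx, out + done, want, offset + (off_t)done);
        if (got < 0)
            return -1;
        if (got == 0)
            break;              /* end of file: return what we have */
        done += (size_t)got;
    }
    return (ssize_t)done;
}

/* In-memory "file" used only to demonstrate the loop. */
struct mem_file { const char *data; size_t len; int calls; };

static ssize_t mem_fetch(void *ctx, char *dst, size_t len, off_t offset)
{
    struct mem_file *f = ctx;
    f->calls++;
    if (offset < 0)
        return -1;
    if ((size_t)offset >= f->len)
        return 0;
    size_t avail = f->len - (size_t)offset;
    if (len > avail)
        len = avail;
    memcpy(dst, f->data + offset, len);
    return (ssize_t)len;
}
```

With a 4-byte chunk limit, a 10-byte read turns into three round trips (4, 4, and 2 bytes), and no single request ever needs a buffer larger than `max_chunk` on the port side.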

Ideally it should at least be possible for a filesystem to support the offset as well, though in principle the filesystem implementation could remain the same (pass back huge lumps of data which the Erlang port and C interface would split or not as required). It's simply not ideal, though, for anything to have to deal with such large memory allocations.

So in summary, I think it really just needs a bit more work to make it adequate for larger files.