web3-storage / web3.storage

DEPRECATED ⁂ The simple file storage service for IPFS & Filecoin
https://web3.storage

Example code fails to put a 9GB file #318

Open insanity54 opened 3 years ago

insanity54 commented 3 years ago

I have a video file that I would like to upload to web3.storage using the web3.storage nodejs library. I am using example code found at https://docs.web3.storage/how-tos/store/#preparing-files-for-upload

I'm running the script on a VPS with 2048 MB of RAM. I am unable to successfully run the upload script because the process is getting killed before it completes.

The only output I see in the console is Killed

My best guess is that the script is consuming too much memory.

vasco-santos commented 3 years ago

Hello @insanity54 Are you using the code example for Node.js https://docs.web3.storage/how-tos/store/#node.js ? It seems you may be using the browser approach, where everything needs to be put in memory. The Node.js approach reads files with a ReadableStream and should not load everything into memory. Can you show me the code you are using?
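
For reference, the Node.js pattern from the docs looks roughly like this (a minimal sketch; the token and path below are placeholders):

import { Web3Storage, getFilesFromPath } from 'web3.storage'

// placeholder token and path, for illustration only
const client = new Web3Storage({ token: process.env.WEB3_TOKEN })

// getFilesFromPath reads from disk as a stream instead of buffering the whole file
const files = await getFilesFromPath('./my-large-video.mp4')

// put uploads the files and resolves with the root CID
const cid = await client.put(files)
console.log('stored with root CID:', cid)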

insanity54 commented 3 years ago

Thanks @vasco-santos. Here is the code I'm using

/**
 * web3-upload.mjs
 *
 * Example Usage: node --experimental-modules ./web3-upload.mjs ~/Videos/my-large-video.mp4
 */
import dotenv from 'dotenv'
import minimist from 'minimist'
import { Web3Storage, getFilesFromPath } from 'web3.storage'

dotenv.config();

async function getFiles(path) {
  const files = await getFilesFromPath(path)
  console.log(`read ${files.length} file(s) from ${path}`)
  return files
}

async function upload (opts) {
  const { token, file } = opts;
  if (typeof token === 'undefined') {
    throw new Error('A web3.storage token "token" must be passed in options object, but token was undefined.')
  }
  if (typeof file === 'undefined') {
    throw new Error('file was undefined.')
  }

  const filesObject = await getFiles(file)
  console.log(filesObject)

  await storeWithProgress(filesObject);
}

function getAccessToken() {
  const token = process.env.WEB3_TOKEN;
  if (typeof token === 'undefined') {
    return console.error('A token is needed. (WEB3_TOKEN in env must be defined). You can create one on https://web3.storage. ')
  }
  return token
}

function makeStorageClient() {
  return new Web3Storage({ token: getAccessToken() })
}

async function storeWithProgress(files) {  

  console.log(`uploading files ${files}`)

  // show the root cid as soon as it's ready
  const onRootCidReady = cid => {
    console.log('uploading files with cid:', cid)
  }

  // when each chunk is stored, update the percentage complete and display
  const totalSize = files.map(f => f.size).reduce((a, b) => a + b, 0)
  let uploaded = 0

  const onChunkStored = size => {
    uploaded += size
    const pct = 100 * (uploaded / totalSize)
    console.log(`Uploading... ${pct.toFixed(2)}% complete`)
  }

  // makeStorageClient returns an authorized Web3.Storage client instance
  const client = makeStorageClient()

  // client.put will invoke our callbacks during the upload
  // and return the root cid when the upload completes
  return client.put(files, { onRootCidReady, onChunkStored })
}

function getCliArgs () {
  const args = minimist(process.argv.slice(2))

  if (args._.length < 1) {
    return console.error('Please supply the path to a file or directory')
  }
  return args._;
}

async function main () {

  await upload({
    file: getCliArgs(),
    token: getAccessToken()
  })
}

main()

vasco-santos commented 3 years ago

Thanks for the snippet @insanity54

As far as I can see, you are using the Node.js util getFilesFromPath, which means the file is packed into a CAR file and chunked with streaming, so memory consumption should be fine. We made a change yesterday that can help with this https://github.com/web3-storage/ipfs-car/pull/74, where we added backpressure to guarantee we do not end up with a large memory footprint on slow readers.

Meanwhile, I am going to test a ~9GB file today and observe the memory consumption. I would also suggest you retry this with the dependencies updated (npm ls ipfs-car should show ipfs-car@0.5.5), and if possible gather information about the memory consumption on your VPS while it runs.
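
For example, one rough way to watch the script's own memory from inside Node while the upload runs (a minimal sketch, independent of the web3.storage API):

// log resident set size and heap usage every 5 seconds
const memLog = setInterval(() => {
  const { rss, heapUsed } = process.memoryUsage()
  console.log(`memory: rss=${Math.round(rss / 1e6)} MB, heapUsed=${Math.round(heapUsed / 1e6)} MB`)
}, 5000)

// stop logging once the upload has finished
// clearInterval(memLog)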

As far as I understand, onChunkStored is not even called? That means the problem may occur while transforming the video file into a CAR file and chunking it into smaller pieces for sending.

insanity54 commented 3 years ago

Thanks for the update @vasco-santos

I upgraded ipfs-car

└─┬ web3.storage@3.1.0
  └── ipfs-car@0.5.5

The upload got further along; this time I saw the console.log which shows the CID.

I tracked memory usage every 5 seconds using pidstat --human -r -T ALL -p ${pid}. The process did not exit gracefully; it was killed once again, with the last report from pidstat as follows.

03:33:44 PM   UID       PID  minflt/s  majflt/s     VSZ     RSS   %MEM  Command
03:33:44 PM     0   2792496      1.46      0.16   11.4G  780.4M  39.3%  node

vasco-santos commented 3 years ago

All right, so memory-wise this looks good. The CAR was correctly generated, but then something apparently happens before the first chunks are sent, since your onChunkStored is not called.

I wonder if the CAR file generated from your large video has some particularity that triggers a bug; I could not replicate it with several different large files. It could be an issue with https://github.com/nftstorage/carbites where we chunk the CAR file.

Did you try that file on a local machine with success? Perhaps we can get a stack trace.

insanity54 commented 3 years ago

I tried on a local machine with success, although onChunkStored is still not called.

I would be happy to provide a stacktrace, but I'm not sure how to do that for a node program that isn't crashing.

Would a stacktrace using console.trace() be helpful? If so, I'd need to know where in my code snippet that line would be useful.

Or is this something that can be done with node's debug tool (inspect)?

I did get a trace using strace while the script was running. I don't know if that's what you meant or whether it's of any use, but I'll attach it.

strace-log-2020-08-11.txt

vasco-santos commented 3 years ago

I tried on a local machine with success, although onChunkStored is still not called.

The entire 9GB file was packed and onChunkStored was still not called locally?

My best guess so far is that either this is the same issue as https://github.com/web3-storage/ipfs-car/issues/69 (which I can replicate and am trying to fix), or your disk is filling up. Is that possible?

In theory, the kernel only kills a process in exceptional circumstances, like resource starvation, including memory exhaustion. I am thinking we can get more information about why the process is being killed with something like https://askubuntu.com/a/709366

vasco-santos commented 3 years ago

By the way, the parameter that client.put should receive is onStoredChunk and not onChunkStored, per https://github.com/web3-storage/web3.storage/blob/main/packages/client/src/lib/interface.ts#L98
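
So, a minimal sketch of the corrected call (using the onRootCidReady and onStoredChunk option names; files and client are from the snippet above):

const totalSize = files.map(f => f.size).reduce((a, b) => a + b, 0)
let uploaded = 0

const rootCid = await client.put(files, {
  // called once the root CID has been computed locally
  onRootCidReady: cid => console.log('uploading files with cid:', cid),
  // called after each chunk is stored; note the option is onStoredChunk, not onChunkStored
  onStoredChunk: size => {
    uploaded += size
    console.log(`Uploading... ${(100 * uploaded / totalSize).toFixed(2)}% complete`)
  }
})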

insanity54 commented 3 years ago

@vasco-santos good catch on the onStoredChunk function name! I opened a PR in the docs repo to fix that. https://github.com/web3-storage/docs/pull/159

Yes, I had several failed upload attempts where the disk became full of files in /tmp. I assumed that was normal behavior, perhaps requiring free disk space equal to the size of the file being uploaded, so I made sure to have >9GB of free disk space when I made subsequent upload attempts.

Here's a dmesg log of the VPS from when the node process gets Killed: dmesg.log

insanity54 commented 3 years ago

I noticed a lot of noise in that dmesg (from Docker, especially), so I spun up a new VPS just for this task. Here's the dmesg of the new VPS, which shows the node process getting killed:

dmesg-2.log

vasco-santos commented 3 years ago

Thanks for the logs @insanity54 🙏🏼

 Out of memory: Killed process 10250 (node) total-vm:12982680kB, anon-rss:1818844kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:4888kB oom_score_adj:0

So, this seems related to https://github.com/web3-storage/ipfs-car/issues/69, which I am also debugging to find the root cause. It seems like https://github.com/ipfs/js-ipfs-unixfs/tree/master/packages/ipfs-unixfs-importer is leaking memory. I will keep you posted.

Amanse commented 2 years ago

Surprising you could get to 9GB; my 4GB RAM VM on GCP dies at 1GB or sometimes less.

vasco-santos commented 2 years ago

@Amanse can you let me know what version of web3.storage client you are using, and more precisely what ipfs-car version it is using. From ipfs-car@0.5.9 we had a change that can help with the memory consumption

insanity54 commented 2 years ago

I just updated my node packages and tried again. Still seeing the same Killed behavior with ipfs-car@0.5.10 and web3.storage@3.5.1.

vasco-santos commented 2 years ago

@insanity54 I just noticed that you are using getFilesFromPath; can you try using https://github.com/web3-storage/files-from-path#usage instead?

It is also exported from web3.storage as import { Web3Storage, filesFromPath } from 'web3.storage'.

Can you also check that you are using it-glob at version 0.0.14 at minimum?

insanity54 commented 2 years ago

I switched over to filesFromPath

import { Web3Storage, filesFromPath } from 'web3.storage'

// ...

async function getFiles(path) {
  let files = [];
  for await (const f of filesFromPath(path)) {
    console.log(`file:${f.name}`);
    files.push(f);
  }
  return files
}

// ...

const filesObject = await getFiles(file)

I ran yarn yarn-upgrade-all and checked the version of it-glob:

# yarn list --pattern it-glob
yarn list v1.22.17
├─ ipfs-utils@8.1.6
│  └─ it-glob@0.0.14
└─ it-glob@0.0.13
Done in 1.09s.

I ran my upload script again and it was Killed

Amanse commented 2 years ago

@Amanse can you let me know what version of web3.storage client you are using, and more precisely what ipfs-car version it is using. From ipfs-car@0.5.9 we had a change that can help with the memory consumption

Sorry for the late reply; I am using web3.storage@3.3.3 and ipfs-car@0.5.10.

Amanse commented 2 years ago

I changed the RAM on my VM from 4GB to 8GB and now it gets to 2.4GB instead of 1.4GB before it's killed.

Amanse commented 2 years ago

I have enough storage on the VM; is there a way I can use that directly instead of swap?

insanity54 commented 2 years ago

I have enough storage on the VM; is there a way I can use that directly instead of swap?

I think you're asking if there's a way to increase memory by using disk so the web3 process isn't killed? I think I tried that at one point, but wasn't able to see any difference. I used the following guide. https://www.digitalocean.com/community/tutorials/how-to-configure-virtual-memory-swap-file-on-a-vps

Amanse commented 2 years ago

I have enough storage on the VM; is there a way I can use that directly instead of swap?

I think you're asking if there's a way to increase memory by using disk so the web3 process isn't killed? I think I tried that at one point, but wasn't able to see any difference. I used the following guide. https://www.digitalocean.com/community/tutorials/how-to-configure-virtual-memory-swap-file-on-a-vps

Yeah, it didn't have much effect, but that's because the OS doesn't prioritize swap over RAM; if we increase swappiness, the whole OS's performance will start lagging. I was asking whether web3 can use storage without swap, as in, web3 uses disk storage from the JS side instead of pushing everything to RAM.

I use swap on my personal machine as well; it's not a solution for this problem, sadly.

gregorym commented 2 years ago

@insanity54 Did you find a work-around? I'm facing the same problem on DigitalOcean.

insanity54 commented 2 years ago

@gregorym My workaround right now is that I spin up a 16GB VPS specifically for web3 upload.

gulprun commented 2 years ago

We can't keep enlarging our host to get around this issue. Any concrete solutions?

dchoi27 commented 2 years ago

Heads up folks - we're in the middle of revamping our upload flow in a super exciting, very IPFS-y way that will be way more usable (CAR generation will be streaming, so memory constraints will be far less of an issue, among many other benefits). Please stay tuned!

gulprun commented 2 years ago

Would love to know if this one has been fixed, or if there's any prospect of a fix?

Amanse commented 2 years ago

any updates?