Closed · akazwz closed this issue 2 years ago
It's not possible to configure the CAR chunk size with the js library today.
A PR to make this configurable would be rad!
How about adding `maxChunkSize` to `PutCarOptions` and `PutOptions`, like:
```diff
export type PutCarOptions = {
  /**
   * Human readable name for this upload, for use in file listings.
   */
  name?: string
  /**
   * Callback called after each chunk of data has been uploaded. By default,
   * data is split into chunks of around 10MB. It is passed the actual chunk
   * size in bytes.
   */
  onStoredChunk?: (size: number) => void
  /**
   * Maximum times to retry a failed upload. Default: 5
   */
  maxRetries?: number
+ /**
+  * Maximum chunk size to upload. Default: 1024 * 1024 * 10
+  */
+ maxChunkSize?: number
  /**
   * Additional IPLD block decoders. Used to interpret the data in the CAR file
   * and split it into multiple chunks. Note these are only required if the CAR
   * file was not encoded using the default encoders: `dag-pb`, `dag-cbor` and
   * `raw`.
   */
  decoders?: BlockDecoder<any, any>[]
}
```
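A call using the proposed option might look like this. This is only a sketch assuming the proposal lands as written: `maxChunkSize` does not exist in the library today, and the option values here are illustrative.

```javascript
// Hypothetical usage of the proposed option (names assume this PR lands).
const MiB = 1024 * 1024

const options = {
  name: 'my-upload.car',
  // Proposed: target chunk size in bytes (the current default is 10 MiB).
  maxChunkSize: 1 * MiB,
  // Fires after each chunk; smaller chunks mean more frequent calls.
  onStoredChunk: (size) => console.log(`stored ${size} bytes`),
  maxRetries: 5
}

// Would then be passed as: await client.putCar(car, options)
console.log(options.maxChunkSize) // 1048576
```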
Then, in `static async putCar`, make changes like:
```diff
static async putCar ({ endpoint, token }, car, {
  name,
  onStoredChunk,
  maxRetries = MAX_PUT_RETRIES,
+ maxChunkSize,
  decoders,
} = {}) {
- const targetSize = MAX_CHUNK_SIZE
+ const targetSize = maxChunkSize ?? MAX_CHUNK_SIZE
  const url = new URL('car', endpoint)
  let headers = Web3Storage.headers(token)
  if (name) {
    headers = { ...headers, 'X-Name': encodeURIComponent(name) }
  }
  const roots = await car.getRoots()
  if (roots[0] == null) {
    throw new Error('missing root CID')
  }
  if (roots.length > 1) {
    throw new Error('too many roots')
  }
  const carRoot = roots[0].toString()
  const splitter = new TreewalkCarSplitter(car, targetSize, { decoders })
  /**
   * @param {AsyncIterable<Uint8Array>} car
   * @returns {Promise<CIDString>}
   */
  const onCarChunk = async car => {
    const carParts = []
    for await (const part of car) {
      carParts.push(part)
    }
    const carFile = new Blob(carParts, { type: 'application/car' })
    const res = await pRetry(
      async () => {
        const request = await fetch(url.toString(), {
          method: 'POST',
          headers,
          body: carFile
        })
        const res = await request.json()
        if (!request.ok) {
          throw new Error(res.message)
        }
        if (res.cid !== carRoot) {
          throw new Error(`root CID mismatch, expected: ${carRoot}, received: ${res.cid}`)
        }
        return res.cid
      },
      { retries: maxRetries }
    )
    onStoredChunk && onStoredChunk(carFile.size)
    return res
  }
  const upload = transform(MAX_CONCURRENT_UPLOADS, onCarChunk)
  for await (const _ of upload(splitter.cars())) {} // eslint-disable-line
  return carRoot
}
```
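One small note on the `maxChunkSize ?? MAX_CHUNK_SIZE` line: `??` only falls back when the value is `null` or `undefined`, so an explicit `0` would be passed through to the splitter rather than replaced by the default. A minimal standalone sketch of the fallback behaviour (`MAX_CHUNK_SIZE` mirrors the library's existing 10 MiB constant):

```javascript
// Sketch of the proposed default fallback, in isolation.
const MAX_CHUNK_SIZE = 1024 * 1024 * 10

const targetSizeFor = (maxChunkSize) => maxChunkSize ?? MAX_CHUNK_SIZE

console.log(targetSizeFor(undefined))   // 10485760 – default applies
console.log(targetSizeFor(1024 * 1024)) // 1048576 – caller's value wins
```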
Then make similar changes in the other functions, like `static async put`. Will this work?
@akazwz yes that's a good start! It's a pretty safe change, there shouldn't be any surprises. Do you fancy opening a PR?
It's worth mentioning in the comment for `maxChunkSize` that it's in bytes, and that it's used as the `targetSize` passed to the carbites `TreewalkCarSplitter`: https://github.com/nftstorage/carbites
> @akazwz yes that's a good start! It's a pretty safe change, there shouldn't be any surprises. Do you fancy opening a PR?
Sure, I've made the changes and opened a PR. Can you have a look when you are free?
The default chunk size is about 10MB, so the progress doesn't update for a long time. Is there any way I can change the chunk size so the progress updates more frequently? Thanks
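For a sense of scale, `onStoredChunk` fires once per uploaded chunk, so the number of progress updates is roughly the upload size divided by the chunk size (roughly, because the splitter cuts at block boundaries). Illustrative arithmetic only, not library code:

```javascript
// One onStoredChunk call per chunk, so the chunk count approximates
// how often a progress bar can update.
const chunkCount = (totalBytes, chunkSize) => Math.ceil(totalBytes / chunkSize)

const total = 100 * 1024 * 1024 // e.g. a 100 MiB CAR file

console.log(chunkCount(total, 10 * 1024 * 1024)) // 10 updates with the default
console.log(chunkCount(total, 1 * 1024 * 1024))  // 100 updates with a 1 MiB target
```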