Hi, thanks for the great work, I learned a lot!

I see that the shared datasets total ~350 GB, which is much less than the ~1562 GB mentioned in the paper. Are you planning to share the full datasets, or do you use only the ~350 GB for pre-training and the rest for downstream fine-tuning?

Also, is there a CLI for downloading the datasets directly from "https://rec.ustc.edu.cn/"? I'm using AWS cloud services, and it would be cumbersome to download 350 GB locally and then upload it to AWS again.

Thanks!
The 1562 GB figure covers all of the raw 3D data. After segmentation and cleaning, about 350 GB remained, and that is the portion used for pre-training.

We currently cannot upload to AWS cloud services directly from our country, so for now the data is hosted on the USTC cloud service while we look for a more convenient distribution method.
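In the meantime, one possible workaround is to stream the archive into S3 from an EC2 instance so the full ~350 GB never has to sit on a local machine first. This is only a sketch, assuming a direct download URL (e.g. a share link) can be copied from the rec.ustc.edu.cn web page; the URL, bucket name, and object key below are placeholders, not values from the repository.

```python
# Hedged sketch: stream a large archive from an HTTP URL straight into S3.
# Assumes boto3 credentials for the target bucket are already configured.
import boto3
import requests

DIRECT_URL = "https://rec.ustc.edu.cn/<copied-share-link>"  # placeholder URL
BUCKET = "my-bucket"                                         # placeholder bucket
KEY = "pretrain-data/data.tar.gz"                            # placeholder key

s3 = boto3.client("s3")

# Stream the response body directly into a multipart S3 upload so the
# archive is never fully written to local disk.
with requests.get(DIRECT_URL, stream=True) as resp:
    resp.raise_for_status()
    resp.raw.decode_content = True
    s3.upload_fileobj(resp.raw, BUCKET, KEY)
```

Running this on an EC2 instance in the same region as the bucket keeps transfer costs and time down; whether the USTC share link can be fetched non-interactively like this is an assumption that would need to be verified.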