Closed: ChiShiang closed this issue 1 year ago
Hey @ChiShiang , thanks for reporting this issue! I wasn't able to reproduce it using the latest lakeFS release, can you try that version and see if it was resolved?
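More questions:
- You tried both with lakectl (of a higher version?) and the UI, right?
- We've seen similar bugs recently with Azure blob store where funky empty dirs have the same name as some files. It should still work but might guide us to the right direction: Can you share more information on the imported location structure? e.g. s3://bucket/prefix/images/ where all images are flat under that location, or anything else.
Hi @itaiad200 thanks for replying :)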
I wasn't able to reproduce it using the latest lakeFS release, can you try that version and see if it was resolved?
I'll try it and report the results once the update has finished. :)
More questions:
- You tried both with lakectl (of a higher version?) and the UI, right?
- We've seen similar bugs recently with Azure blob store where funky empty dirs have the same name as some files. It should still work but might guide us to the right direction: Can you share more information on the imported location structure? e.g. s3://bucket/prefix/images/ where all images are flat under that location, or anything else.
An example is shown below:
lakefs-imported [bucket]
|-----project-A [prefix]
      |-----images
            |-----000000.jpg
            |-----000001.jpg
            |-----000002.jpg
            |-----000003.jpg
            |-----000004.jpg
            ...
            |-----239331.jpg
@ChiShiang can you please tell us what is the backing object store you are using?
Hey @ChiShiang, we need some more input in order to identify the root cause.
We suspect the compatible object storage in question has some caveats, like the order of the returned objects while listing.
I suggest we try running the following command using aws cli that points to that backing storage. Please run it once with and once without the sort and compare the outputs.
aws s3 ls s3://lakefs-imported/project-A/images/ | awk '{print $4}' > raw.txt
aws s3 ls s3://lakefs-imported/project-A/images/ | awk '{print $4}' | sort > sorted.txt
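If the two files are large, comparing them by eye is tedious. The following is a minimal sketch, not part of lakeFS or the AWS CLI, that reads a listing such as raw.txt and reports the first key that appears after a key it should precede. The function names `first_out_of_order` and `read_keys` are made up for illustration. It assumes one key per line and compares keys by Unicode code point, which matches UTF-8 byte order; note that plain `sort` follows the current locale, so `LC_ALL=C sort` may be closer to what an S3 listing is expected to return.

```python
import sys
from pathlib import Path


def first_out_of_order(keys):
    """Return (index, previous_key, key) for the first key that sorts
    before its predecessor, or None if the list is in ascending order."""
    for i in range(1, len(keys)):
        if keys[i] < keys[i - 1]:
            return i, keys[i - 1], keys[i]
    return None


def read_keys(path):
    """Read one object key per line, skipping blank lines."""
    return [line for line in Path(path).read_text().splitlines() if line]


if __name__ == "__main__":
    # Each command-line argument is treated as a listing file, e.g. raw.txt.
    for path in sys.argv[1:]:
        keys = read_keys(path)
        problem = first_out_of_order(keys)
        if problem is None:
            print(f"{path}: {len(keys)} keys, all in lexicographic order")
        else:
            index, previous, current = problem
            print(f"{path}: line {index + 1}: {current!r} is listed after {previous!r}")
```

Saved under a hypothetical name such as check_order.py, it could be run as `python check_order.py raw.txt`. A reported line would suggest that the backing store does not return keys in lexicographic order.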
Hi @N-o-Z, @itaiad200
can you try that version and see if it was resolved?
Unfortunately, the same issue occurred in the latest lakeFS version :\
can you please tell us what is the backing object store you are using?
I am currently using the object storage provided by TWCC, which is the Taiwan Computation Cloud service offered by the National Center for High Performance Computing in Taiwan. However, I suspect there may be compatibility issues with the object storage I am currently using. As a result, I am trying to run the same scenario using a MinIO backend.
I suggest we try running the following command using aws cli that points to that backing storage. Please run it once with and once without the sort and compare the outputs.
Okay, I'll share the results after trying these commands.
@ChiShiang thank you for sharing the information with us. Please update us if you encounter the same issue with MinIO (it should not occur). If you must use the TWCC storage service, we can provide you with a workaround for the import to work.
Hi @N-o-Z @itaiad200
Sorry for the late reply. Fortunately, the issue was resolved after trying lakeFS with a MinIO backend.
Thank you for your kind help with this issue.
Closing the issue. Please note that this will also be resolved in https://github.com/treeverse/lakeFS/pull/5840 for S3 implementations that do not list objects lexicographically.
What happened?
Current Behavior: Hi there,
I am trying to import data from an object storage to a repository in lakeFS. The data in the object storage consists of more than 10,000 images. However, I am encountering a 500 internal server error when trying to import the data.
The error message is as follows:
I have tried renaming all of the files using a numeric ordering or a random string (e.g. SSID), but the error still occurred.
On the other hand, I have noticed that importing fewer than 999 images works fine. However, when trying to import the entire dataset of over 999 images, the error occurs.
Steps to Reproduce:
lakectl or web page
Expected Behavior
All data stored in the S3 objects (images) should be imported into a specific repository in lakeFS.
lakeFS Version
0.96.1
Deployment
Other Cloud Storage (Support S3 Protocol)
Affected Clients
lakectl 0.99.0
Relevant logs output
Contact Details
hyaline0317@gmail.com