Open rrazavipour opened 3 months ago
@rrazavipour, is there something specific about the structure of this data (e.g. deeply nested, or a very large number of directories)? How many files are there overall? Is it happening only on this FSx Lustre filesystem? What instance size are you using on AWS?
I don't have the numbers, but there is a large number of directories, about 420 GB altogether. It has worked on Mac, Windows, and our own GPU machine. This is the first time we are working with DVC on AWS FSx Lustre, and the first time we are seeing these problems. The EC2 instance is a 2xlarge with 32 GB of RAM.
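For anyone hitting the same problem, the numbers asked about above (file count, directory count, nesting depth) could be collected with something like the following. This is only a minimal sketch and not part of the original exchange; `summarize_tree` is a hypothetical helper name, and the path you pass to it is whatever directory holds the DVC-tracked data on your machine.

```python
import os


def summarize_tree(root):
    """Count directories, files, and the maximum nesting depth under root."""
    root = os.path.abspath(root)
    # Number of path separators in the root itself, used as the depth baseline.
    base_depth = root.rstrip(os.sep).count(os.sep)
    n_dirs = 0
    n_files = 0
    max_depth = 0
    for dirpath, dirnames, filenames in os.walk(root):
        n_dirs += len(dirnames)
        n_files += len(filenames)
        depth = dirpath.rstrip(os.sep).count(os.sep) - base_depth
        max_depth = max(max_depth, depth)
    return {"dirs": n_dirs, "files": n_files, "max_depth": max_depth}


if __name__ == "__main__":
    import sys

    # Usage: python summarize_tree.py /path/to/data (path is an assumption)
    print(summarize_tree(sys.argv[1] if len(sys.argv) > 1 else "."))
```

On a very large tree on a network filesystem this walk can itself take a long time, so it may be worth running it on one representative subdirectory first.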
Bug Report
dvc pull
Description
dvc pull crashes with sqlite3.OperationalError: disk I/O error
Reproduce
This happens when trying to pull about 420 GB of data onto an Amazon FSx Lustre filesystem. After the git clone completes, the only thing I run is dvc pull. After many hours of operation, I get the error mentioned above.
Expected
dvc pull to complete
Environment information
[ec2-user@ip-10-0-1-122 ~]$ dvc doctor
DVC version: 3.53.0 (pip)
Platform: Python 3.9.16 on Linux-6.1.97-104.177.amzn2023.x86_64-x86_64-with-glibc2.34
Subprojects:
    dvc_data = 3.15.1
    dvc_objects = 5.1.0
    dvc_render = 1.0.2
    dvc_task = 0.4.0
    scmrepo = 3.3.6
Supports:
    http (aiohttp = 3.10.0, aiohttp-retry = 2.8.3),
    https (aiohttp = 3.10.0, aiohttp-retry = 2.8.3),
    s3 (s3fs = 2024.6.1, boto3 = 1.34.131)
Config:
    Global: /home/ec2-user/.config/dvc
    System: /etc/xdg/dvc
Output of dvc doctor:

Additional Information (if any):

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/dvc/cli/__init__.py", line 211, in main
    ret = cmd.do_run()
  File "/usr/local/lib/python3.9/site-packages/dvc/cli/command.py", line 27, in do_run
    return self.run()
  File "/usr/local/lib/python3.9/site-packages/dvc/commands/data_sync.py", line 35, in run
    stats = self.repo.pull(
  File "/usr/local/lib/python3.9/site-packages/dvc/repo/__init__.py", line 58, in wrapper
    return f(repo, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/dvc/repo/pull.py", line 42, in pull
    stats = self.checkout(
  File "/usr/local/lib/python3.9/site-packages/dvc/repo/__init__.py", line 58, in wrapper
    return f(repo, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/dvc/repo/checkout.py", line 142, in checkout
    diff = compare(old, new, relink=relink, delete=True, callback=pb.as_callback())
  File "/usr/local/lib/python3.9/site-packages/dvc_data/index/checkout.py", line 315, in compare
    ret = _compare(
  File "/usr/local/lib/python3.9/site-packages/dvc_data/index/checkout.py", line 243, in _compare
    for change in idiff(
  File "/usr/local/lib/python3.9/site-packages/dvc_data/index/diff.py", line 320, in diff
    yield from changes
  File "/usr/local/lib/python3.9/site-packages/dvc_data/index/diff.py", line 230, in _diff
    new_dir_items, new_unknown = _get_items(new, key, new_entry, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/dvc_data/index/diff.py", line 152, in _get_items
    items = dict(index.ls(key, detail=True))
  File "/usr/local/lib/python3.9/site-packages/dvc_data/index/view.py", line 128, in ls
    self._index._ensure_loaded(root_key)
  File "/usr/local/lib/python3.9/site-packages/dvc_data/index/index.py", line 759, in _ensure_loaded
    entry = self.get(prefix)
  File "/usr/lib64/python3.9/_collections_abc.py", line 763, in get
    return self[key]
  File "/usr/local/lib/python3.9/site-packages/dvc_data/index/index.py", line 671, in __getitem__
    item = self._trie.get(key)
  File "/usr/lib64/python3.9/_collections_abc.py", line 763, in get
    return self[key]
  File "/usr/local/lib/python3.9/site-packages/sqltrie/serialized.py", line 58, in __getitem__
    raw = self._trie[key]
  File "/usr/local/lib/python3.9/site-packages/sqltrie/sqlite/sqlite.py", line 266, in __getitem__
    row = self._get_node(key)
  File "/usr/local/lib/python3.9/site-packages/sqltrie/sqlite/sqlite.py", line 202, in _get_node
    rows = list(self._traverse(key))
  File "/usr/local/lib/python3.9/site-packages/sqltrie/sqlite/sqlite.py", line 191, in _traverse
    self._conn.executescript(STEPS_SQL.format(path=path, root=self._root_id))
MemoryError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/dvc", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/site-packages/dvc/cli/__init__.py", line 236, in main
    ret = _log_exceptions(exc) or 255
  File "/usr/local/lib/python3.9/site-packages/dvc/cli/__init__.py", line 147, in _log_exceptions
    _log_unknown_exceptions()
  File "/usr/local/lib/python3.9/site-packages/dvc/cli/__init__.py", line 49, in _log_unknown_exceptions
    logger.debug("Version info for developers:\n%s", get_dvc_info())
  File "/usr/local/lib/python3.9/site-packages/dvc/info.py", line 38, in get_dvc_info
    with Repo() as repo:
  File "/usr/local/lib/python3.9/site-packages/dvc/repo/__init__.py", line 209, in __init__
    self.state = State(self.root_dir, self.site_cache_dir, self.dvcignore)
  File "/usr/local/lib/python3.9/site-packages/dvc_data/hashfile/state.py", line 92, in __init__
    self.links = Cache(links_dir)
  File "/usr/local/lib/python3.9/site-packages/dvc_data/hashfile/cache.py", line 59, in __init__
    super().__init__(directory=directory, timeout=timeout, disk=disk, **settings)
  File "/usr/local/lib/python3.9/site-packages/diskcache/core.py", line 478, in __init__
    self.reset(key, value, update=False)
  File "/usr/local/lib/python3.9/site-packages/diskcache/core.py", line 2431, in reset
    ((old_value,),) = sql(
sqlite3.OperationalError: disk I/O error
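
Note that both failures in the tracebacks above come out of SQLite calls: the first is a MemoryError raised from executescript in sqltrie, and the disk I/O error is raised afterwards, while diskcache opens its own SQLite database during error reporting. One way to narrow this down might be to check whether plain SQLite works at all on the affected filesystem, independent of DVC. The snippet below is a minimal sketch under that assumption; `probe_sqlite` is a hypothetical helper, not a DVC or SQLite API, and the directory to test (a scratch directory on the FSx Lustre mount, for example) is left for you to choose.

```python
import os
import sqlite3


def probe_sqlite(directory):
    """Create, write, and read back a tiny SQLite database in `directory`.

    Returns None on success, or a short error description on failure.
    """
    path = os.path.join(directory, "sqlite_probe.db")
    try:
        conn = sqlite3.connect(path, timeout=5)
        try:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS probe (k TEXT PRIMARY KEY, v TEXT)"
            )
            conn.execute(
                "INSERT OR REPLACE INTO probe VALUES (?, ?)", ("key", "value")
            )
            conn.commit()
            row = conn.execute(
                "SELECT v FROM probe WHERE k = ?", ("key",)
            ).fetchone()
            if row != ("value",):
                return f"unexpected row: {row!r}"
        finally:
            conn.close()
    except sqlite3.Error as exc:
        return f"{type(exc).__name__}: {exc}"
    finally:
        # Clean up the probe file if it was created.
        if os.path.exists(path):
            os.remove(path)
    return None


if __name__ == "__main__":
    import sys

    # Usage: python probe_sqlite.py /path/on/the/lustre/mount (path is an assumption)
    error = probe_sqlite(sys.argv[1] if len(sys.argv) > 1 else ".")
    print("sqlite OK" if error is None else f"sqlite failed: {error}")
```

If this probe fails on the Lustre mount but succeeds on local disk, the problem is likely in how SQLite interacts with that filesystem rather than in DVC itself. If it succeeds in both places, the MemoryError in the first traceback is probably the more useful lead.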