learningequality / kolibri

Kolibri Learning Platform: the offline app for universal education
https://learningequality.org/kolibri/
MIT License
784 stars 650 forks source link

Attempting to read data that has not been stored yet #11876

Open marcellamaki opened 8 months ago

marcellamaki commented 8 months ago

Spotted by @bjester in 0.16 syncing session 14-Feb-2024

ERROR 2024-02-14 08:08:49,349 kolibri.core.content.utils.content_request Attempting to read data that has not been stored yet
Traceback (most recent call last):
  File "/home/bjester/.pex/install/kolibri-0.16.0rc0-py2.py3-none-any.whl.57ab72bf625fff956f2ae4c3319e5b7fc8945090/kolibri-0.16.0rc0-py2.py3-none-any.whl/kolibri/core/content/utils/content_request.py", line 832, in _process_download
    _, count = import_manager.run()
  File "/home/bjester/.pex/install/kolibri-0.16.0rc0-py2.py3-none-any.whl.57ab72bf625fff956f2ae4c3319e5b7fc8945090/kolibri-0.16.0rc0-py2.py3-none-any.whl/kolibri/core/content/utils/resource_import.py", line 629, in run
    return super(ContentDownloadRequestResourceImportManager, self).run()
  File "/home/bjester/.pex/install/kolibri-0.16.0rc0-py2.py3-none-any.whl.57ab72bf625fff956f2ae4c3319e5b7fc8945090/kolibri-0.16.0rc0-py2.py3-none-any.whl/kolibri/core/content/utils/resource_import.py", line 212, in run
    results = self.run_import()
  File "/home/bjester/.pex/install/kolibri-0.16.0rc0-py2.py3-none-any.whl.57ab72bf625fff956f2ae4c3319e5b7fc8945090/kolibri-0.16.0rc0-py2.py3-none-any.whl/kolibri/core/content/utils/resource_import.py", line 334, in run_import
    raise self.exception
  File "/home/bjester/.pex/install/kolibri-0.16.0rc0-py2.py3-none-any.whl.57ab72bf625fff956f2ae4c3319e5b7fc8945090/kolibri-0.16.0rc0-py2.py3-none-any.whl/kolibri/core/content/utils/resource_import.py", line 159, in _handle_future
    future.result()
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 428, in result
    return self.__get_result()
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/bjester/.pex/install/kolibri-0.16.0rc0-py2.py3-none-any.whl.57ab72bf625fff956f2ae4c3319e5b7fc8945090/kolibri-0.16.0rc0-py2.py3-none-any.whl/kolibri/core/content/utils/resource_import.py", line 130, in _start_file_transfer
    filetransfer.run()
  File "/home/bjester/.pex/install/kolibri-0.16.0rc0-py2.py3-none-any.whl.57ab72bf625fff956f2ae4c3319e5b7fc8945090/kolibri-0.16.0rc0-py2.py3-none-any.whl/kolibri/utils/file_transfer.py", line 692, in inner
    func(self, *args, **kwargs)
  File "/home/bjester/.pex/install/kolibri-0.16.0rc0-py2.py3-none-any.whl.57ab72bf625fff956f2ae4c3319e5b7fc8945090/kolibri-0.16.0rc0-py2.py3-none-any.whl/kolibri/utils/file_transfer.py", line 731, in run
    self.complete_close_and_finalize()
  File "/home/bjester/.pex/install/kolibri-0.16.0rc0-py2.py3-none-any.whl.57ab72bf625fff956f2ae4c3319e5b7fc8945090/kolibri-0.16.0rc0-py2.py3-none-any.whl/kolibri/utils/file_transfer.py", line 504, in complete_close_and_finalize
    self.finalize()
  File "/home/bjester/.pex/install/kolibri-0.16.0rc0-py2.py3-none-any.whl.57ab72bf625fff956f2ae4c3319e5b7fc8945090/kolibri-0.16.0rc0-py2.py3-none-any.whl/kolibri/utils/file_transfer.py", line 671, in finalize
    return super(FileDownload, self).finalize()
  File "/home/bjester/.pex/install/kolibri-0.16.0rc0-py2.py3-none-any.whl.57ab72bf625fff956f2ae4c3319e5b7fc8945090/kolibri-0.16.0rc0-py2.py3-none-any.whl/kolibri/utils/file_transfer.py", line 573, in finalize
    self._verify_checksum()
  File "/home/bjester/.pex/install/kolibri-0.16.0rc0-py2.py3-none-any.whl.57ab72bf625fff956f2ae4c3319e5b7fc8945090/kolibri-0.16.0rc0-py2.py3-none-any.whl/kolibri/utils/file_transfer.py", line 550, in _verify_checksum
    if self.checksum and not self._checksum_correct():
  File "/home/bjester/.pex/install/kolibri-0.16.0rc0-py2.py3-none-any.whl.57ab72bf625fff956f2ae4c3319e5b7fc8945090/kolibri-0.16.0rc0-py2.py3-none-any.whl/kolibri/utils/file_transfer.py", line 685, in _checksum_correct
    return self.dest_file_obj.md5_checksum() == self.checksum
  File "/home/bjester/.pex/install/kolibri-0.16.0rc0-py2.py3-none-any.whl.57ab72bf625fff956f2ae4c3319e5b7fc8945090/kolibri-0.16.0rc0-py2.py3-none-any.whl/kolibri/utils/file_transfer.py", line 425, in md5_checksum
    position, chunk = self._read(position, self.chunk_size)
  File "/home/bjester/.pex/install/kolibri-0.16.0rc0-py2.py3-none-any.whl.57ab72bf625fff956f2ae4c3319e5b7fc8945090/kolibri-0.16.0rc0-py2.py3-none-any.whl/kolibri/utils/file_transfer.py", line 261, in _read
    raise ValueError("Attempting to read data that has not been stored yet")
ValueError: Attempting to read data that has not been stored yet

may have been the cause of subsequent logging:

WARNING 2024-02-14 08:08:49,603 kolibri.core.content.utils.content_request ContentNode files may not have imported successfully: bc502f76e9ff5c459989a02de27093f7

@rtibbles thinks that it may be related to cleanup happening during content download. Maybe file transfer would handle this, rather than just raising the error? Seems like something has gone wrong with locks that are intended to prevent this!

ishanjha03 commented 7 months ago

The ValueError suggests the code is trying to read data from a file that hasn't been completely downloaded yet. This likely throws off the checksum verification step (_verify_checksum) later in the process.

@rtibbles is spot on – the error seems related to how cleanup interacts with content downloads. Ideally, the FileDownload class (kolibri.utils.file_transfer.py) should handle this scenario gracefully, perhaps by waiting for the download to finish before attempting to read or verify the file.

oge1ata commented 6 months ago

Could I have a bit more context on this issue?

rtibbles commented 6 months ago

The error here is happening in our code that handles chunked file downloading - it also allows streaming of these files while the download is in progress. What has happened here is that a chunk is trying to be read, but doesn't appear to exist, which is meant to be safeguarded against by the locks that are used.

oge1ata commented 6 months ago

Oh alright, thank you

I'll keep on looking at it.