kiwix / libkiwix

Common code base for all Kiwix ports
https://download.kiwix.org/release/libkiwix/
GNU General Public License v3.0

Be able to load split ZIM files (chunks) by file descriptor #1014

Closed kelson42 closed 10 months ago

kelson42 commented 11 months ago

This ticket is a follow-up of https://github.com/kiwix/kiwix-android/issues/3511.

Today we can load split ZIM files by opening the first chunk; the other ZIM chunks are then located and loaded according to a rule based on the filenames.

Now we need such split ZIM files on Android (for custom apps) because the Google Play Store no longer allows assets over 512MB. In addition, we now load the ZIM files via an fd directly from the package blob (with all other assets) to avoid copying the file to the fs (a slow operation which also wastes mass storage, see https://github.com/kiwix/kiwix-android/pull/3516).

Unfortunately for now, loading split ZIM files via fd seems impossible.

mgautierfr commented 11 months ago

We have a few constraints here:

I see two solutions:

MohitMaliFtechiz commented 11 months ago

Opening an archive by passing a list of (fd, offset, size), each tuple pointing to a part of the archive. The Android application is responsible for opening all parts first.

@mgautierfr In https://github.com/kiwix/kiwix-android/pull/3526 we are able to properly load the zim parts.

But I'm not sure at all that we can guarantee where the resources are stored in the Android bundle.

We are using Play Asset Delivery mode and it puts the zim file in the asset folder.

We can open an archive with (fd, offset of the first part, size of the whole archive)

How can we do it?

mgautierfr commented 11 months ago

In https://github.com/kiwix/kiwix-android/pull/3526 we are able to properly load the zim parts.

Because you are reconstructing the full archive file from the different chunks. The purpose of this issue is to make libzim able to read the different chunks directly, without extracting them to the file system.
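To make "reading the chunks directly" a bit more concrete, here is a minimal sketch of the bookkeeping it would need: given the size of each part, translate an offset in the whole archive into a part index plus an offset inside that part. `PartLocator` and `locate` are hypothetical names used only for illustration; this is not libzim's API or implementation, and the part sizes are made up.

```java
/** Hypothetical helper (not part of libzim): views several parts as one logical archive. */
final class PartLocator {
    private final long[] partSizes;
    private final long[] logicalStarts; // logical offset at which each part begins

    PartLocator(long[] partSizes) {
        this.partSizes = partSizes.clone();
        this.logicalStarts = new long[partSizes.length];
        long total = 0;
        for (int i = 0; i < partSizes.length; i++) {
            logicalStarts[i] = total;
            total += partSizes[i];
        }
    }

    /** Returns {partIndex, offsetInsidePart} for an offset in the whole archive. */
    long[] locate(long logicalOffset) {
        for (int i = 0; i < partSizes.length; i++) {
            long start = logicalStarts[i];
            if (logicalOffset >= start && logicalOffset < start + partSizes[i]) {
                return new long[] {i, logicalOffset - start};
            }
        }
        throw new IndexOutOfBoundsException("offset outside archive: " + logicalOffset);
    }

    public static void main(String[] args) {
        // Made-up part sizes: 100, 100 and 50 bytes.
        PartLocator locator = new PartLocator(new long[] {100L, 100L, 50L});
        long[] hit = locator.locate(150L);
        System.out.println("part " + hit[0] + ", offset " + hit[1]);
    }
}
```

A real reader would also add each part's own start offset inside the bundle, and would have to handle reads that straddle two parts.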

We are using Play Asset Delivery mode and it puts the zim file in the asset folder.

The bundle/apk is probably a zip file. Each file in a zip archive is stored together with a small header, so each part is separated from its neighbor by this header. I therefore doubt that we can guarantee the chunks are stored contiguously in the bundle.

How can we do it?

This is easy: if you have all parts stored contiguously in the right order, this is the same thing as the whole content.

MohitMaliFtechiz commented 11 months ago

This is easy: if you have all parts stored contiguously in the right order, this is the same thing as the whole content.

I mean: how can we load the data in libkiwix? We get the data from the asset folder via the AssetManager class, which returns either an InputStream or an AssetFileDescriptor. In https://github.com/kiwix/kiwix-android/pull/3526 I have a list of AssetFileDescriptor, so how should we load it in libkiwix?

2023-11-07 18:02:52.910 31979-31979 FILES_NAME              org.kiwix.kiwixcustomcustomexample   E  getAssetFileDescriptorFromPlayAssetDelivery: chunk0.zim
2023-11-07 18:02:52.910 31979-31979 FILES_NAME              org.kiwix.kiwixcustomcustomexample   E  getAssetFileDescriptorFromPlayAssetDelivery: [{AssetFileDescriptor: {ParcelFileDescriptor: java.io.FileDescriptor@e3ff6fe} start=48 len=104860033}]
2023-11-07 18:02:52.911 31979-31979 FILES_NAME              org.kiwix.kiwixcustomcustomexample   E  getAssetFileDescriptorFromPlayAssetDelivery: chunk1.zim
2023-11-07 18:02:52.911 31979-31979 FILES_NAME              org.kiwix.kiwixcustomcustomexample   E  getAssetFileDescriptorFromPlayAssetDelivery: [{AssetFileDescriptor: {ParcelFileDescriptor: java.io.FileDescriptor@e3ff6fe} start=48 len=104860033}, {AssetFileDescriptor: {ParcelFileDescriptor: java.io.FileDescriptor@f6930ac} start=104860128 len=104859904}]
2023-11-07 18:02:52.911 31979-31979 FILES_NAME              org.kiwix.kiwixcustomcustomexample   E  getAssetFileDescriptorFromPlayAssetDelivery: chunk2.zim
2023-11-07 18:02:52.911 31979-31979 FILES_NAME              org.kiwix.kiwixcustomcustomexample   E  getAssetFileDescriptorFromPlayAssetDelivery: [{AssetFileDescriptor: {ParcelFileDescriptor: java.io.FileDescriptor@e3ff6fe} start=48 len=104860033}, {AssetFileDescriptor: {ParcelFileDescriptor: java.io.FileDescriptor@f6930ac} start=104860128 len=104859904}, {AssetFileDescriptor: {ParcelFileDescriptor: java.io.FileDescriptor@220850a} start=209720080 len=104860096}]
2023-11-07 18:02:52.911 31979-31979 FILES_NAME              org.kiwix.kiwixcustomcustomexample   E  getAssetFileDescriptorFromPlayAssetDelivery: chunk3.zim
2023-11-07 18:02:52.912 31979-31979 FILES_NAME              org.kiwix.kiwixcustomcustomexample   E  getAssetFileDescriptorFromPlayAssetDelivery: [{AssetFileDescriptor: {ParcelFileDescriptor: java.io.FileDescriptor@e3ff6fe} start=48 len=104860033}, {AssetFileDescriptor: {ParcelFileDescriptor: java.io.FileDescriptor@f6930ac} start=104860128 len=104859904}, {AssetFileDescriptor: {ParcelFileDescriptor: java.io.FileDescriptor@220850a} start=209720080 len=104860096}, {AssetFileDescriptor: {ParcelFileDescriptor: java.io.FileDescriptor@78eab98} start=314580224 len=104859904}]
2023-11-07 18:02:52.912 31979-31979 FILES_NAME              org.kiwix.kiwixcustomcustomexample   E  getAssetFileDescriptorFromPlayAssetDelivery: chunk4.zim
2023-11-07 18:02:52.912 31979-31979 FILES_NAME              org.kiwix.kiwixcustomcustomexample   E  getAssetFileDescriptorFromPlayAssetDelivery: [{AssetFileDescriptor: {ParcelFileDescriptor: java.io.FileDescriptor@e3ff6fe} start=48 len=104860033}, {AssetFileDescriptor: {ParcelFileDescriptor: java.io.FileDescriptor@f6930ac} start=104860128 len=104859904}, {AssetFileDescriptor: {ParcelFileDescriptor: java.io.FileDescriptor@220850a} start=209720080 len=104860096}, {AssetFileDescriptor: {ParcelFileDescriptor: java.io.FileDescriptor@78eab98} start=314580224 len=104859904}, {AssetFileDescriptor: {ParcelFileDescriptor: java.io.FileDescriptor@6c9e7d6} start=419440176 len=35940012}]
MohitMaliFtechiz commented 11 months ago

Because you are reconstructing the full archive file from the different chunks. The purpose of this issue is to make libzim able to read the different chunks directly, without extracting them to the file system.

Yes, I did that for testing (to check that the file is in the right order and not corrupted) because we do not have a method in Archive to load from an fd list.

mgautierfr commented 11 months ago

I assume it is not possible to store chunks contiguously in the bundle (and it is confirmed by the offset/size of your list of fd), so passing a (fd, offset of the first part, size of the whole archive) is not a solution.
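The gaps can be checked directly from the `start`/`len` values in the log above. The sketch below does only that arithmetic; `ChunkGaps` and `gaps` are hypothetical names, not part of any Kiwix library.

```java
import java.util.Arrays;

/** Checks whether the logged asset chunks sit back to back inside the bundle. */
final class ChunkGaps {
    /**
     * For each pair of neighbouring chunks, returns the number of bytes
     * between the end of one chunk and the start of the next.
     */
    static long[] gaps(long[] starts, long[] lengths) {
        long[] result = new long[Math.max(0, starts.length - 1)];
        for (int i = 0; i + 1 < starts.length; i++) {
            long end = starts[i] + lengths[i]; // first byte after chunk i
            result[i] = starts[i + 1] - end;
        }
        return result;
    }

    public static void main(String[] args) {
        // start/len values copied from the log (chunk0.zim .. chunk4.zim)
        long[] starts  = {48L, 104860128L, 209720080L, 314580224L, 419440176L};
        long[] lengths = {104860033L, 104859904L, 104860096L, 104859904L, 35940012L};
        System.out.println(Arrays.toString(gaps(starts, lengths)));
        // prints [47, 48, 48, 48]
    }
}
```

Every gap is non-zero (47 or 48 bytes), so a single (fd, offset, size) range covering all chunks would include bytes that do not belong to the ZIM archive.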

I mean: how can we load the data in libkiwix? We get the data from the asset folder via the AssetManager class, which returns either an InputStream or an AssetFileDescriptor. In https://github.com/kiwix/kiwix-android/pull/3526 I have a list of AssetFileDescriptor, so how should we load it in libkiwix?

You cannot for now. We don't have the API. And doing it is not only a matter of API; we probably have to change a bit how we handle things internally.

Yes, I did that for testing (to check that the file is in the right order and not corrupted) because we do not have a method in Archive to load from an fd list.

And this is your only solution for now.

MohitMaliFtechiz commented 11 months ago

And this is your only solution for now.

@mgautierfr, @kelson42 Yes, for now this is our only solution. But if libzim supported opening ZIM files via an InputStream, we could load the ZIM file from multiple chunks, much as we construct the file in https://github.com/kiwix/kiwix-android/pull/3526 and then read it. It would also resolve our file picker issue in the Play Store variant: we could open files from storage on Android 11 and above with the file picker, without the MANAGE_EXTERNAL_STORAGE permission. See https://github.com/kiwix/kiwix-android/issues/2890#issuecomment-1183202716 for more details.

You cannot for now. We don't have the API. And doing it is not only a matter of API; we probably have to change a bit how we handle things internally.

Okay, if it is not possible right now. But we already have the API to open a ZIM file with a single fd, and here we have multiple fds. IMO we might be able to create a single fd from this fd list. I am not sure whether that is technically possible, but we can try.

kelson42 commented 11 months ago

@mgautierfr I don't understand what opening a ZIM file via an InputStream exactly means from a libzim perspective. Do you?

MohitMaliFtechiz commented 11 months ago

@kelson42 I mean that we currently have two ways of opening ZIM files: by file path and by FileDescriptor.

https://github.com/kiwix/java-libkiwix/blob/f9dc43e17700568143ef24edd7ca30fc8ea711be/lib/src/main/java/org/kiwix/libzim/Archive.java#L31

  public Archive(String filename) throws ZimFileFormatException
  {
    setNativeArchive(filename);
  }

  public Archive(FileDescriptor fd) throws ZimFileFormatException
  {
    setNativeArchiveByFD(fd);
  }

  public Archive(FileDescriptor fd, long offset, long size)
          throws ZimFileFormatException
  {
    setNativeArchiveEmbedded(fd, offset, size);
  }

We should have a new method in libzim to open ZIM files from an InputStream.

  public Archive(InputStream inputStream) throws ZimFileFormatException
  {
    setNativeArchive(inputStream);
  }

kelson42 commented 11 months ago

@mgautierfr @MohitMaliFtechiz I understand that part, what is very unclear to me is:

I can hardly believe that the answer to both questions is "yes"...

mgautierfr commented 11 months ago

Is inputstream a low-level standard file access solution?

It is a "low-level" Java standard stream access solution. It doesn't allow seeking and it is Java-only anyway. So the short answer is no.
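A small sketch of the difference, using only standard Java classes: a `FileChannel` can read at any absolute position in any order, while a plain `InputStream` only offers `read` and `skip`, so it moves forward (going back is only possible where the optional `mark`/`reset` is supported). `SeekDemo` and its methods are hypothetical names for illustration; the five-byte temp file stands in for real data.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

final class SeekDemo {
    /** Random access: reads one byte at an absolute position, in any order. */
    static int byteAt(FileChannel channel, long position) throws IOException {
        ByteBuffer one = ByteBuffer.allocate(1);
        if (channel.read(one, position) != 1) {
            throw new IOException("no byte at position " + position);
        }
        return one.get(0) & 0xFF;
    }

    /** Sequential access: moves forward from the current position, then reads. */
    static int byteAfterSkipping(InputStream in, long toSkip) throws IOException {
        long remaining = toSkip;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) { // skip may make no progress; fall back to read
                if (in.read() < 0) throw new IOException("stream ended early");
                skipped = 1;
            }
            remaining -= skipped;
        }
        return in.read();
    }

    /** Writes bytes 10,20,30,40,50 to a temp file and reads them both ways. */
    static int[] demo() {
        try {
            Path tmp = Files.createTempFile("seek-demo", ".bin");
            try {
                Files.write(tmp, new byte[] {10, 20, 30, 40, 50});
                try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ);
                     InputStream in = Files.newInputStream(tmp)) {
                    int forward = byteAt(ch, 4);   // jump to the last byte
                    int backward = byteAt(ch, 1);  // then back to the second byte
                    int streamed = byteAfterSkipping(in, 3); // forward only
                    return new int[] {forward, backward, streamed};
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));
    }
}
```

A ZIM reader needs the first kind of access, since it jumps between the header, the indexes and the clusters rather than consuming the file front to back.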

Then I wonder how the asset library would know about the location of the chunks...

Android bundles are probably zip files, and AssetFileDescriptors (which are not InputStreams) are in some way handles to the content stored in the bundle (as libzim::Entry is a handle to content in a zim archive).


The only solution I see is:

kelson42 commented 10 months ago

Kamino closed and cloned this issue to openzim/libzim