oll3 / bita

Differential file synchronization over http
https://crates.io/crates/bita
MIT License

Use as library #13

Open MCOfficer opened 4 years ago

MCOfficer commented 4 years ago

Hello,

Bita almost perfectly fits my needs, except it seems to be exclusively a CLI tool. Is there any chance this can be used as library from another rust project?

otavio commented 4 years ago

We will also need it as a library.

oll3 commented 4 years ago

Hi,

Nice to hear that someone except me is actually trying to make use of bita! :)

About the library split: it's something I've been working on, but it's still very much a work in progress. The library part is called bitar, but its interface is not very well defined and is undocumented for now. Mainly, there are parts in the CLI tool that should live in the library to make it easier to reuse from other tools. I'm working on creating a better interface for the library, mainly for the clone command, but it's done in my spare time so progress is slow.

So for now I can only refer you to bitar and suggest copying/reusing the parts you need from the CLI tool. Help and PRs are of course welcome and I will try to support whenever I can.

oll3 commented 4 years ago

The 0.6.x releases of bitar have addressed this issue somewhat, mainly by trying to make it easier to use the clone functionality. Feedback is appreciated!

MCOfficer commented 4 years ago

Thank you, that looks promising indeed. I'll give it a shot when I can. I've already written a dumb substitute in the meantime; here's hoping I won't need it!

MCOfficer commented 4 years ago

Thanks to the example, I mostly made it work - it's not as easy as I'd like, but perfectly functional. However, when trying to use it from an async context (specifically, iced-futures), I'm getting errors like these:

error: future cannot be sent between threads safely
   --> src/instance.rs:106:17
    |
106 |                 iced::Command::perform(perform_update(self.clone()), Message::Updated)
    |                 ^^^^^^^^^^^^^^^^^^^^^^ future returned by `perform_update` is not `Send`
    | 
   ::: /home/flo/.cargo/registry/src/github.com-1ecc6299db9ec823/iced_futures-0.1.2/src/command.rs:32:22
    |
32  |         future: impl Future<Output = T> + 'static + Send,
    |                      ------------------ required by this bound in `iced_futures::command::Command::<T>::perform`
    |
    = help: the trait `std::marker::Send` is not implemented for `dyn bitar::reader::Reader`
note: future is not `Send` as this value is used across an await
   --> src/update.rs:77:34
    |
58  |       let archive = Archive::try_init(&mut reader).await?;
    |                                       -----------        - `&mut reader` is later dropped here
    |                                       |
    |                                       has type `&mut dyn bitar::reader::Reader`
...
77  |       let total_read_from_remote = clone_from_archive(
    |  __________________________________^
78  | |         &CloneOptions::default(),
79  | |         &mut reader,
80  | |         &archive,
...   |
83  | |     )
84  | |     .await?;
    | |__________^ await occurs here, with `&mut reader` maybe used later

If you don't need any kind of feedback, you can use this terrible workaround: https://github.com/EndlessSkyCommunity/ESLauncher2/blob/d45b4223f3c387874a8d34d06bdc6d80030cb61b/src/instance.rs#L213-L222 But now I kinda need this function to return things, so I'm back to square one.

Is this something that can be fixed in bitar, e.g. by implementing Send for Reader and CloneOutput? (and possibly more? those are the two i get errors for)
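
For readers hitting the same error, here is a minimal std-only sketch of why a plain trait object is rejected where `Send` is required, and how adding a `Send` bound to the trait object resolves it. The `Reader` trait, `MemReader` and `read_on_worker` below are hypothetical stand-ins for illustration; they are not bitar's actual definitions, and this is not necessarily how bitar addressed the issue.

```rust
use std::thread;

// Hypothetical stand-in for a reader trait; NOT bitar's real `Reader`.
trait Reader {
    fn read_at(&mut self, offset: usize, len: usize) -> Vec<u8>;
}

// Simple in-memory reader used only for this illustration.
struct MemReader(Vec<u8>);

impl Reader for MemReader {
    fn read_at(&mut self, offset: usize, len: usize) -> Vec<u8> {
        // Clamp the requested range to the available data.
        let end = (offset + len).min(self.0.len());
        let start = offset.min(end);
        self.0[start..end].to_vec()
    }
}

// `thread::spawn` requires its closure to be `Send`, much like
// `iced::Command::perform` requires a `Send` future. Taking
// `Box<dyn Reader>` here (without `+ Send`) would fail to compile,
// because the compiler cannot assume an arbitrary implementor is `Send`.
fn read_on_worker(mut reader: Box<dyn Reader + Send>) -> Vec<u8> {
    thread::spawn(move || reader.read_at(0, 4))
        .join()
        .expect("worker thread panicked")
}

fn main() {
    let reader = Box::new(MemReader(vec![1, 2, 3, 4, 5]));
    let bytes = read_on_worker(reader);
    println!("read {} bytes on a worker thread", bytes.len());
}
```

The same reasoning applies to a future that holds `&mut dyn Reader` across an `.await`: the future is only `Send` if the referenced trait object is known to be `Send`.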

oll3 commented 4 years ago

Latest master (5d34ca6) may solve your troubles @MCOfficer .

MCOfficer commented 4 years ago

> Latest master (5d34ca6) may solve your troubles @MCOfficer .

That seems to have worked, thank you. It is still a bit unwieldy (and now ReaderRemote forces a direct dependency on reqwest), but I can live with that.

oll3 commented 4 years ago

Not sure I follow... Why do you have to create another async runtime, aren't you already running tokio, or are you mixing runtimes? About the reqwest dependency I also realized this and will add another wrapper call to avoid that later.

MCOfficer commented 4 years ago

> Not sure I follow... Why do you have to create another async runtime, aren't you already running tokio, or are you mixing runtimes?

Probably - without that workaround, i get an error telling me I'm not on the tokio runtime, so i guess iced uses the futures runtime internally. I should also look into this, apparently one can make iced use tokio.

Either way, I assume this is nothing that needs fixing on bitar's end.

oll3 commented 4 years ago

Will not support multiple runtimes at this time, no :)

MCOfficer commented 3 years ago

In case future viewers also run into problems with tokio compatibility, i'd like to point them towards stjepang's latest stunt: tokio-fix.

Edit: it's been yanked for unknown reasons. Check out async-compat instead.

Songtronix commented 3 years ago

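I'm interested in using this to serve updates of a game but have a few questions:

is it stable?
how is performance?
how much bandwidth can I expect to save roughly?
is it good for large archives? (200MB-4GB)
how does it perform with archives with a ton of files (~4000 files)?

(Sorry for hijacking this; really interested in this approach)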

oll3 commented 3 years ago

> I'm interested in using this to serve updates of a game but have a few questions: is it stable?

I've used bita at small scale but in production for some time with good results. I'm not planning on changing the archive format, and my intent is definitely to keep it backwards compatible, in the sense that any old archive should be readable with a more recent version of the tool. I would consider the library API to be usable but not stable. It will probably keep changing a bit since I'm not satisfied with parts of it.

So yes and no, I guess :)

> how is performance?

Depends on what you mean by performance. Compared to some other tool or in numbers like memory or cpu usage, bytes/second scanned, network transfer speed? Think it's probably best to measure for your own specific case.

> how much bandwidth can I expect to save roughly?

Impossible to say, since it's highly dependent on the data being operated on. Best is probably to test with your specific dataset. The bita diff command might be a simple way to get a hint of how much data is shared (and how much would have to be transferred) between two files according to the algorithm used.

Generally, the more data two versions of a file share, and the more of that data is kept in the same order between versions, the better the result. E.g. if you have something like a tar file and create a new version of it by appending a single new file, I would expect the transfer on the wire to be roughly the size of that appended file.
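
To make the append case concrete, here is a toy sketch of content-defined chunking using only the Rust standard library. It is an illustration of the general idea, not bita's actual chunker, hash, or archive format: the rolling sum, the `WINDOW` and `MASK` constants, and the helper names (`chunk`, `bytes_to_transfer`, `pseudo_random`) are all assumptions made up for this example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Split `data` into chunks whose boundaries depend on content: a boundary is
/// placed where a rolling sum over the last WINDOW bytes matches a bit pattern.
fn chunk(data: &[u8]) -> Vec<&[u8]> {
    const WINDOW: usize = 16; // rolling window size, also the minimum chunk size
    const MASK: u32 = 0x3f; // boundary on average roughly every 64 bytes
    let mut chunks = Vec::new();
    let mut start = 0;
    let mut sum: u32 = 0;
    for (i, &byte) in data.iter().enumerate() {
        sum += byte as u32;
        if i >= WINDOW {
            sum -= data[i - WINDOW] as u32; // drop the byte leaving the window
        }
        if i + 1 - start >= WINDOW && sum & MASK == MASK {
            chunks.push(&data[start..=i]);
            start = i + 1;
        }
    }
    if start < data.len() {
        chunks.push(&data[start..]); // trailing remainder
    }
    chunks
}

/// Hash a chunk so chunks can be compared without comparing their bytes.
fn chunk_hash(chunk: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    chunk.hash(&mut hasher);
    hasher.finish()
}

/// Total size of the chunks in `new` that do not already exist in `old`.
fn bytes_to_transfer(old: &[u8], new: &[u8]) -> usize {
    let known: HashSet<u64> = chunk(old).into_iter().map(chunk_hash).collect();
    chunk(new)
        .into_iter()
        .filter(|c| !known.contains(&chunk_hash(c)))
        .map(|c| c.len())
        .sum()
}

/// Deterministic filler data from a simple linear congruential generator.
fn pseudo_random(len: usize, seed: u64) -> Vec<u8> {
    let mut state = seed;
    (0..len)
        .map(|_| {
            state = state
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            (state >> 33) as u8
        })
        .collect()
}

fn main() {
    let old = pseudo_random(4096, 1);
    let appended = pseudo_random(512, 2);
    let mut new = old.clone();
    new.extend_from_slice(&appended);
    println!(
        "new file: {} bytes, appended: {} bytes, to transfer: {} bytes",
        new.len(),
        appended.len(),
        bytes_to_transfer(&old, &new)
    );
}
```

Because each boundary decision depends only on the bytes seen so far, appending data leaves every earlier chunk unchanged, so at most the appended bytes plus the old file's final chunk need to be sent.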

> is it good for large archives? (200MB-4GB)

Definitely can be but same answer as above, you will need to test on your specific data to know. There should be no hard limit on file size that is close to those numbers.

> how does it perform with archives with a ton of files (~4000 files)?

bita doesn't really work with or care about multiple files. It operates on and compresses a single file. If that file is something like a tar archive with lots of files in it, that makes no difference to bita. It will probably work best if the archive is uncompressed before being processed with bita, though. It also helps if the files in the archive are kept in the same order between versions of that archive.

> (Sorry for hijacking this; Really interested in this approach).

No worries, hope you got some answers out of this :)

MCOfficer commented 3 years ago

Can't speak for performance since I have no reference points, but as for stability, ESLauncher2 has been using bita in "production" for its most trivial task (updating an instance) for roughly a year. There is a fallback, but it's for when the upstream server gives out; afaik it was never used due to bita throwing up.