qbittorrent / qBittorrent

qBittorrent BitTorrent client
https://www.qbittorrent.org

Beta releases with libtorrent2 and new info-hashes #15040

Closed sledgehammer999 closed 3 years ago

sledgehammer999 commented 3 years ago

My intention is to release the 4.4.x series based on libtorrent2. Because of that I intend to do a few beta releases. They will still be based on Qt 5.15. I suppose there isn't an objection to switching to libtorrent2, right?

From a discussion with @glassez and @Chocobo1 it emerged that there is an open question about how to handle the new info-hashes. Background: libtorrent2 now supports v2 torrents. v2 torrents have a SHA-256 infohash, whereas v1 torrents have a SHA-1 infohash. The difference is in the length: SHA-256 is 64 chars long while SHA-1 is 40 chars long (when represented in hex). The BitTorrent protocol chooses to ID v2 torrents by truncating their SHA-256 hash to 40 hex chars (so that it can be represented as a fake SHA-1 hash). There can also be hybrid torrents, which have both a valid SHA-1 hash (not a truncated SHA-256) and a valid SHA-256 hash.
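
The length arithmetic above can be checked with a short Python sketch. The `info_bytes` value is a made-up stand-in for the bencoded info section of a torrent file, not real torrent data; only the digest lengths and the truncation step matter here.

```python
import hashlib

# Hypothetical stand-in for the bencoded "info" section of a torrent file.
info_bytes = b"d4:name7:example12:piece lengthi16384ee"

sha1_hex = hashlib.sha1(info_bytes).hexdigest()      # v1-style infohash (20 bytes)
sha256_hex = hashlib.sha256(info_bytes).hexdigest()  # v2-style infohash (32 bytes)

# A v2 torrent is identified by the first 20 bytes (40 hex chars) of its SHA-256 hash.
truncated_hex = sha256_hex[:40]

print(len(sha1_hex), len(sha256_hex), len(truncated_hex))  # prints: 40 64 40
```

The truncated value has the same length as a SHA-1 hash, which is why the two cannot be told apart by looking at the string alone.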

From my understanding there are 2 pending issues to be tackled:

  1. How to present the hashes to the user?
  2. How to output the hashes to the outside world, e.g. to scripts?

For number 1 I think it is easy. Just add another field in the GUI/WEBUI. For number 2 @glassez and @Chocobo1 had some proposals. I'll let them present them in the comments.

Mostly devs or script users are expected to weigh in here. Also I notify @qbittorrent/frequent-contributors @qbittorrent/bug-handlers

glassez commented 3 years ago

My main belief is not to confuse the concepts (even if they are confused in some places of the bittorrent protocol for some compatibility, etc.). If we provide some kind of torrent infohash, then it should be exactly the infohash (i.e. the result of applying the corresponding hash function to the data of the info section of the torrent file). If we provide some value that serves to identify the torrent on the tracker or in the application, but it is not the torrent's infohash generally, then we should not call it an infohash.

glassez commented 3 years ago

For number 1 I think it is easy. Just add another field in the GUI/WEBUI.

Shouldn't we also provide so-called "torrent ID" as a separate field (both in UI and for scripts)? It can be useful to the user, for example, to identify the relevant "resume data" file, etc. Although in the current implementation, it can be easily calculated based on the values of the torrent's infohashes...

sledgehammer999 commented 3 years ago

If we provide some value that serves to identify the torrent on the tracker or in the application, but it is not the torrent's infohash generally, then we should not call it an infohash.

I agree. We should have two distinct concepts:

  1. TorrentID: This will be used by the API to identify the torrents in the session. I don't know how useful it will be outside of qbt though. TorrentID for pure v1 torrents is identical to their sha1 hash. TorrentID for hybrid and pure v2 torrents is the truncated sha-256 hash. And I think that's what libtorrent does too and that's how torrents are identified in the tracker.
  2. Info-hashes: The real v1 and v2 infohashes where available.
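
A minimal sketch of the TorrentID rule described in this comment, assuming infohashes are passed around as hex strings and that a missing hash is represented by `None`. The function name `torrent_id` is hypothetical and is not taken from qBittorrent or libtorrent.

```python
from typing import Optional


def torrent_id(v1_hex: Optional[str], v2_hex: Optional[str]) -> str:
    """Derive a session-level torrent ID from the available infohashes."""
    if v2_hex:
        # Pure v2 or hybrid torrent: SHA-256 truncated to 40 hex chars.
        return v2_hex[:40]
    if v1_hex:
        # Pure v1 torrent: the ID is identical to the SHA-1 infohash.
        return v1_hex
    raise ValueError("a torrent needs at least one infohash")


# Usage with dummy hex strings (not real hashes):
print(torrent_id("a" * 40, None))      # pure v1
print(torrent_id(None, "b" * 64))      # pure v2
print(torrent_id("a" * 40, "b" * 64))  # hybrid
```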

API users and script writers should be expected to know this bittorrent v2 terminology.

FranciscoPombal commented 3 years ago

There should be a clear indication of whether the torrent is V1, V2, or hybrid, and for each possible case, there should be SHA-1, SHA-256 and "truncated SHA256" fields that show all the relevant hashes. If they don't apply (e.g. SHA256 field in a V1-only torrent), it can show N/A.

I suppose there isn't an objection to switching to libtorrent2, right?

Nope. The only remaining technical limitation IIRC now is getting the port into vcpkg for ez-pz Windows CI and builds (which I am working on).

FranciscoPombal commented 3 years ago

If we provide some value that serves to identify the torrent on the tracker or in the application, but it is not the torrent's infohash generally, then we should not call it an infohash.

I agree. We should have two distinct concepts: (...)

Maybe an API field indicating the type of torrent would also be useful? As in V1/V2/hybrid.

sledgehammer999 commented 3 years ago

Maybe an API field indicating the type of torrent would also be useful? As in V1/V2/hybrid.

For ease of use yes. But the type property can be inferred also by the values of torrentID and the info-hashes.

glassez commented 3 years ago

But the type property can be inferred also by the values of torrentID and the info-hashes.

Infohashes are enough.
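
To illustrate the point that the infohashes alone determine the type, here is a small sketch. It assumes a missing infohash is represented as `None` or an empty string; the helper name `torrent_type` and the returned labels are made up for this example.

```python
from typing import Optional


def torrent_type(v1_hex: Optional[str], v2_hex: Optional[str]) -> str:
    """Infer the torrent type purely from which infohashes are present."""
    if v1_hex and v2_hex:
        return "hybrid"
    if v2_hex:
        return "v2"
    if v1_hex:
        return "v1"
    raise ValueError("a torrent needs at least one infohash")
```

A dedicated type field would save callers from writing this check, but it would carry no information beyond what the two infohash fields already provide.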

Maybe an API field indicating the type of torrent would also be useful?

What API do you mean?

FranciscoPombal commented 3 years ago

Maybe an API field indicating the type of torrent would also be useful?

What API do you mean?

WebAPI

Chocobo1 commented 3 years ago

If we provide some kind of torrent infohash, then it should be exactly the infohash

This concept is clear for v1 and v2 torrents, but what about hybrid torrents, where both hashes are available? Perhaps hybrid torrent should be regarded as v2 torrent since the word "hybrid" is only mentioned in the "Upgrade Path" of the bittorrent v2 spec and is viewed as a kind of backwards support ("For interoperability with BEP 3").

For running external program, as opposed to my earlier suggestion I think we can add only one new parameter (torrent ID) and fix %I (info hash) to behave correctly.

glassez commented 3 years ago

hybrid torrent should be regarded as v2 torrent

:+1: This thought has visited me... But is the infohash-v1 of a hybrid torrent so useless for the users (scripts) that we can safely ignore it? To be honest, I have no idea what exactly infohashes can be used for by scripts, so I can't firmly answer this. Anyway, treating the "infohashes" field as the "main/primary infohash", we can later add an "additional infohash" field if it is really needed.

For running external program, as opposed to my earlier suggestion I think we can add only one new parameter (torrent ID) and fix %I (info hash) to behave correctly.

I think we may even provide no "torrent id" parameter since it is easily calculated from infohash (especially if there is single infohash).

FranciscoPombal commented 3 years ago

Treating hybrid torrents as V2 doesn't seem correct. I can already anticipate user reports that they'd like the ability to get all different infohashes from a hybrid torrent, as opposed to only getting its V2 infohash.

But is the infohash-v1 of a hybrid torrent so useless for the users (scripts) that we can safely ignore it? To be honest, I have no idea what exactly infohashes can be used for by scripts, so I can't firmly answer this.

I would put my money on "No". It should be possible to query the V1 infohash as well as the full 64 char SHA256 V2 hash of the torrent.

glassez commented 3 years ago

@FranciscoPombal There doesn't seem to be enough clarity here... There are two main aspects that depend on infohashes that we are talking about. The first is providing information to the user through some kind of UI (WebAPI also applies here). So, there is no problem here to provide all the available torrent info hashes. The second is providing parameters for running an external program (i.e. "Run program on torrent complete" feature). And here the question arises (at least for me), whether all available infohashes are required for use by external scripts, and if so, in what form they should be provided - in a single parameter (for example, infohash1, infohash2), or in separate parameters. The second option (with separate parameters) seems to me a bad idea. It assumes that only one of them can be used on the command line, but what's the point?

FranciscoPombal commented 3 years ago

@glassez

@FranciscoPombal There doesn't seem to be enough clarity here... There are two main aspects that depend on infohashes that we are talking about. The first is providing information to the user through some kind of UI (WebAPI also applies here). So, there is no problem here to provide all the available torrent info hashes.

:+1:

The second is providing parameters for running an external program (i.e. "Run program on torrent complete" feature). And here the question arises (at least for me), whether all available infohashes are required for use by external scripts, and if so, in what form they should be provided - in a single parameter (for example, infohash1, infohash2), or in separate parameters. The second option (with separate parameters) seems to me a bad idea. It assumes that only one of them can be used on the command line, but what's the point?

Thanks for clarifying. I am not totally sure what you meant by "It assumes that only one of them can be used on the command line, but what's the point?" though, can you please elaborate?

I imagine that a user may have the need to get all infohashes, and providing them as a single parameter puts the additional burden of parsing them out of the string on them, which is not as nice as providing them as separate arguments.

If we provide them as all separate parameters, e.g. %X, %Y, %Z (just placeholder letters here), I think it will be easier for end users. They can choose to use one, two or all of them, as needed.
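
For a sense of what the single-parameter option asks of a script author, here is a rough sketch. It assumes the combined value is a comma-separated string of hex hashes and tells the two kinds apart by length (40 hex chars for SHA-1, 64 for SHA-256); the format and the name `parse_combined` are assumptions for illustration, not something the thread has settled on.

```python
from typing import Dict, Optional


def parse_combined(combined: str) -> Dict[str, Optional[str]]:
    """Split a combined "<hash>,<hash>" value and classify each hash by length."""
    result: Dict[str, Optional[str]] = {"v1": None, "v2": None}
    for part in combined.split(","):
        part = part.strip()
        if len(part) == 40:    # SHA-1 in hex
            result["v1"] = part
        elif len(part) == 64:  # SHA-256 in hex
            result["v2"] = part
    return result


# Usage with dummy hex strings (not real hashes):
print(parse_combined("a" * 40 + "," + "b" * 64))
```

With separate parameters the splitting and classification step goes away, since each argument already says which hash it is.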

glassez commented 3 years ago

providing them as a single parameter puts the additional burden of parsing them out of the string

A child's task...

I am not totally sure what you meant by "It assumes that only one of them can be used on the command line, but what's the point?" though, can you please elaborate?

If we provide infohashes in separate parameters, the user is allowed to construct a command line that contains only one of them. What's the point of doing it? Can there be any valid use cases where this is needed? I can't imagine... This means that such a script will be able to handle only torrents of one of the versions, while it will still run for all the torrents.

FranciscoPombal commented 3 years ago

@glassez

A child's task...

Of course it is easy. But if we don't do it, then everyone will have to repeat the same code over and over in their scripts. It's about making the lives of our users easier. If I'm writing a simple bash script, for instance, I'd rather not deal with it. It's not about if users can or cannot deal with it, it's about whether they have to deal with it.

If we provide infohashes in separate parameters, the user is allowed to construct a command line that contains only one of them. What's the point of doing it? Can there be any valid use cases where this is needed? I can't imagine...

Better leave this to our users' imagination. Of course none of us can possibly imagine all use cases.

In general, in such cases where we provide the users the means to execute arbitrary scripts, I think the principle of "I can't imagine why this would be useful so let's not implement it" is a bad one, unless of course the implementation would be too costly, which is not the case here.

glassez commented 3 years ago

How about the following?

This means that such a script will be able to handle only torrents of one of the versions, while it will still run for all the torrents.

"I can't imagine why this would be useful so let's not implement it"

I didn't say that. Even if there is some exotic usecase here, it can still be handled with just one parameter for all infohashes.

But if we don't do it, then everyone will have to repeat the same code over and over in their scripts.

This is regardless of the number of parameters. Any script will need to contain code that determines the number of available infohashes.
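
The point about availability checks can be sketched as follows. Even when each infohash arrives as its own argument, the script still has to work out which ones are actually present. The sketch assumes an unavailable hash is passed as an empty string or as `-`; that marker is an assumption made for this example, not a behaviour stated in the thread.

```python
from typing import List

# Assumed markers for "this infohash does not exist for this torrent".
UNAVAILABLE = {"", "-"}


def available_hashes(v1_arg: str, v2_arg: str) -> List[str]:
    """Return only the infohash arguments that carry a real value."""
    return [h for h in (v1_arg, v2_arg) if h.strip() not in UNAVAILABLE]


# Usage with dummy hex strings (not real hashes):
hashes = available_hashes("a" * 40, "-")
print(len(hashes))  # prints: 1
```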

In any case, I don't intend to argue about it. Since this applies to the user interface, then let the users (the one who expresses the point of view of a hypothetical user of this feature) determine how it should be. If they are satisfied with an inconsistent interface for the sake of getting rid of more than trivial parsing of a pair of hashes from a single parameter, then let them use it.

FranciscoPombal commented 3 years ago

@glassez

OK, I have no interest in arguing about this either. But I'd appreciate it if you could elaborate on this:

If they are satisfied with an inconsistent interface for the sake of getting rid of more than trivial parsing of a pair of hashes from a single parameter, then let them use it.

Why does providing one parameter per infohash (for a total of 3) result in an "inconsistent interface", as opposed to exposing a single parameter that outputs a comma-separated list? Not saying the latter makes it "inconsistent" either, but I'd really like to know why you think the former does.

glassez commented 3 years ago

for a total of 3

There are only 2 possible infohashes in current implementation.

FranciscoPombal commented 3 years ago

@glassez

for a total of 3

There are only 2 possible infohashes in current implementation.

I meant 3 different "torrent identification values" that the user might care about: V1 hash, the "V2 torrent ID" (which is the SHA256 hash truncated to 40 hex characters, i.e. 20 bytes), and the full V2 SHA256 hash itself.

ghost commented 3 years ago

In libtorrent 2.0 you can't control how much RAM you want to dedicate to disk caching. It's managed by the kernel and from my experience the kernel allocates almost all the RAM to qBt disk cache and later frees up if some other apps need it.

This is problematic because if qBt uses some insane amount of the RAM, system may become less responsive and the user will have no way to control this behaviour. Also since RC_1_2 is still getting regular patches, there's no logical explanation as to why you would stop making releases with RC_1_2 which is kinda stable now. I feel like 4_3_x branch was cut off too early. Also BitTorrent V2 will take years to become mainstream. There's also a possibility of it not being adopted by majority of the trackers due to lack of technical skills or enthusiasm.

FranciscoPombal commented 3 years ago

@an0n666

In libtorrent 2.0 you can't control how much RAM you want to dedicate to disk caching. It's managed by the kernel and from my experience the kernel allocates almost all the RAM to qBt disk cache and later frees up if some other apps need it.

Why is this not desired? This is exactly what you want from a data transfer program. On the networking side, μTP was developed with the same principle in mind: it uses all available network capacity, but backs off if other applications need it. It makes sense that consumption of disk I/O resources follows a similar principle.

Most performance issues users face right now can be traced back to either RC_1_2's custom disk cache, or doing multiple fast downloads to an HDD while not knowing that HDDs are bad at random workloads.

This is problematic because if qBt uses some insane amount of the RAM, system may become less responsive and the user will have no way to control this behaviour.

I trust modern kernels to manage the cached memory efficiently, much more than some userspace program... don't you?

Also since RC_1_2 is still getting regular patches, there's no logical explanation as to why you would stop making releases with RC_1_2 which is kinda stable now. I feel like 4_3_x branch was cut off too early.

It's absolutely logical: RC_2_0 has support for highly anticipated features and is very stable as well. Plus I would think RC_1_2 is receiving some of its last "regular" patches. I believe arvidn has mentioned in the past he wants to drop it, he's just waiting for RC_2_0 to gain critical mass.

Also BitTorrent V2 will take years to become mainstream. There's also a possibility of it not being adopted by majority of the trackers due to lack of technical skills or enthusiasm.

It will take longer if qBittorrent, one of the most used clients, doesn't adopt it. We can drive adoption by switching to it. Can't wait to drink the tears of μT 2.2.1 boomers when trackers switch to V2.


If you do want to stick with RC_1_2, you can keep using whatever qBittorrent version last supported it, it won't magically stop working once the newer V2 versions are released...

sledgehammer999 commented 3 years ago

I have provided a 4.4.0beta1 release based on libtorrent RC_2_0. I see that #15097 solved the issue of the UI representation. Now, only the script output/handling remains right?

glassez commented 3 years ago

Now, only the script output/handling remains right?

#15097 also provides v1/v2 infohashes and torrent ID as parameters for external scripts in the "Run external program" feature.

sledgehammer999 commented 3 years ago

@glassez then this can be closed? And after a few betas we can release 4.4.0 stable? I mean there aren't any v2-torrent issues pending, right?

glassez commented 3 years ago

Now, only the script output/handling remains right?

But there are still a few problems with the support of v2 torrents, both in qBittorrent and libtorrent.

sledgehammer999 commented 3 years ago

Are those listed somewhere or can you list them here? I want to assess the current status and see how close we are to a stable release.

xavier2k6 commented 3 years ago

@sledgehammer999 may be no harm to peruse #15109

FranciscoPombal commented 3 years ago

@sledgehammer999

Are those listed somewhere or can you list them here? I want to assess the current status and see how close we are to a stable release.

As per https://github.com/qbittorrent/qBittorrent/issues/15109, I'd say we are not that close. There are known hangs and crashes still, and builds with 2.x have not received extensive testing of features by any remotely significant number of people. It would be more accurate to classify it as an "alpha", IMO.

glassez commented 3 years ago

@sledgehammer999 I am now working on issues with saving/loading metadata of v2 torrents. You can provide the 2nd beta when I am done.