yosi-dediashvili commented 10 years ago

Including the factory, the MainProvider and everything else.

[x] MainProvider
[x] Providers factory
Providers
[x] OpenSubtitles
[x] Addic7ed
[x] Torec
[x] Subscenter
[x] Subtitles (Ktuvit)
[ ] Subscene
Languages that will be supported
[x] Hebrew
[x] English
[x] Spanish
[x] Arabic
[x] Bulgarian
[x] Slovak
[x] Turkish
[x] Czech
[x] Russian
[x] Norwegian
[x] Swedish
[x] French
[x] Greek
[x] Portuguese
[x] Croatian
[ ] Romanian
[ ] Catalan

yosi-dediashvili commented 10 years ago

OpenSubtitles.org

Unlike the other providers, this site is accessed through XML-RPC rather than plain HTTP requests. As a result, we don't go through the PerformRequest() method here.

The problem is that we currently have no protection against communication failures (for example, we don't retry on timeouts). This makes the provider fragile compared to the other providers.

The correct approach is probably to wrap the XML-RPC calls with our own function that retries when a call fails (much like the PerformRequest() method).
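A minimal sketch of such a wrapper, assuming Python's standard xmlrpc.client. The name call_with_retry, the retry count, the delay and the set of exceptions treated as transient are assumptions for illustration, not taken from PerformRequest().

```python
import socket
import time
import xmlrpc.client

# Exceptions treated as transient (an assumption): timeouts, dropped
# connections and HTTP-level errors raised by the XML-RPC transport.
# xmlrpc.client.Fault (an error reply from the server) is deliberately not retried.
TRANSIENT_ERRORS = (socket.timeout, ConnectionError, xmlrpc.client.ProtocolError)


def call_with_retry(func, *args, retries=3, delay=1.0):
    """Call func(*args), retrying on transient communication failures.

    Hypothetical helper: func is typically a method of an
    xmlrpc.client.ServerProxy instance.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            return func(*args)
        except TRANSIENT_ERRORS as error:
            last_error = error
            # Wait before the next attempt, but not after the last one.
            if attempt < retries - 1:
                time.sleep(delay)
    raise last_error
```

With proxy = xmlrpc.client.ServerProxy(url), a call such as proxy.MethodName(a, b) would become call_with_retry(proxy.MethodName, a, b).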

Another idea is to implement some sort of caching: when our server implementation receives a request that is to be forwarded to OpenSubtitles, it checks whether the same request has already been made and, if so, simply returns the stored result.
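A minimal in-memory sketch of that cache. The QueryCache class is hypothetical; it has no expiry or size limit, and keying on a JSON serialization of the arguments is an assumption made so that dict and list parameters (common in XML-RPC calls) can be part of the key.

```python
import json


class QueryCache:
    """Remember the result of each (method, arguments) pair sent to OpenSubtitles."""

    def __init__(self):
        self._results = {}

    def call(self, method_name, func, *args):
        # sort_keys makes the key independent of dict ordering;
        # default=repr covers values JSON cannot encode natively.
        key = (method_name, json.dumps(args, sort_keys=True, default=repr))
        if key not in self._results:
            self._results[key] = func(*args)
        return self._results[key]
```

Only the first identical request reaches the remote service; repeats are answered from memory.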

Subscene.com

They support querying by title; in that case, we can search for a movie name, a series name, or a series name plus season.

Another option is to query for a release name using the following URL: http://subscene.com/subtitles/release?q=<My+Query>. The results of such a query are formatted like a movie or season page, i.e., a table of all the versions that matched the query (which may belong to different movies).
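A small sketch of building that release-query URL with the standard library. The helper name is hypothetical; urlencode encodes spaces as "+", which matches the <My+Query> form above.

```python
from urllib.parse import urlencode

SUBSCENE_RELEASE_URL = "http://subscene.com/subtitles/release"


def build_release_query_url(release_name):
    """Return the Subscene release-search URL for a release name."""
    # urlencode quotes values with quote_plus, so spaces become '+'.
    return SUBSCENE_RELEASE_URL + "?" + urlencode({"q": release_name})
```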

However, the site may redirect us between the title and the release pages, so we need to keep track of which page we ended up on, because each one returns a different result.
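One way to keep track of the redirect is to look at the final URL after redirects (for example response.url on the object returned by urllib.request.urlopen) and classify it. This is a sketch under an assumption not confirmed here: release-search pages live under /subtitles/release, and any other page is treated as a title page.

```python
from urllib.parse import urlparse


def classify_subscene_page(final_url):
    """Return 'release' or 'title' for the page we ended up on.

    Assumption: release-search pages have a path starting with
    /subtitles/release; everything else is treated as a title page.
    """
    path = urlparse(final_url).path
    if path.startswith("/subtitles/release"):
        return "release"
    return "title"
```

The parser for the page can then be chosen based on the returned value rather than on the URL we originally requested.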

Subscenter.org

Currently, we're performing the same trick with series as with the Subtitle provider. We need to stop that and return the exact episode that was queried.
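A minimal sketch of returning only the queried episode, assuming each parsed result carries its season and episode numbers. The Version class and its fields are hypothetical; the provider's real data structures may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Version:
    """Hypothetical shape of a parsed Subscenter search result."""
    name: str
    season: Optional[int] = None   # None for movies
    episode: Optional[int] = None  # None for movies or whole-season packs


def exact_episode_versions(versions: List[Version], season: int, episode: int) -> List[Version]:
    """Keep only the versions matching the queried season and episode."""
    return [v for v in versions if v.season == season and v.episode == episode]
```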

Subtitle.co.il (ktuvit.com)

They have a hidden feature where an IMDb ID can be used as a query.
They have an AJAX service for retrieving subtitle info, which we should check: http://www.ktuvit.com/getajax.php?moviedetailssubtitles=<movie_id> and http://www.ktuvit.com/getajax.php?episodedetails=<episode_id>
They have a new domain name: www.ktuvit.com
To log in, we need to pass the authentication cookies:
1. Add the user id under: slcoo_user_id
2. Add the encoded password under: slcoo_user_pass
3. The resulting cookie header will look like: Cookie: slcoo_user_id=<user_id>; slcoo_user_pass=<encoded_password>
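A sketch of the login cookie header and the two AJAX URLs described above. The helper names are hypothetical, and the encoded password is assumed to be produced separately (see "Encoding the password" below).

```python
from urllib.parse import urlencode

KTUVIT_AJAX_URL = "http://www.ktuvit.com/getajax.php"


def build_auth_headers(user_id, encoded_password):
    """Return request headers carrying the slcoo_* authentication cookies."""
    cookie = "slcoo_user_id={}; slcoo_user_pass={}".format(user_id, encoded_password)
    return {"Cookie": cookie}


def movie_details_url(movie_id):
    """AJAX URL for a movie's subtitle details."""
    return KTUVIT_AJAX_URL + "?" + urlencode({"moviedetailssubtitles": movie_id})


def episode_details_url(episode_id):
    """AJAX URL for an episode's subtitle details."""
    return KTUVIT_AJAX_URL + "?" + urlencode({"episodedetails": episode_id})
```

The headers dict can be passed directly as the headers argument of urllib.request.Request.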

Query

First of all, we always have to go through the query stage, because we access the versions using the title's unique id.

For both movies and episodes we can use the IMDb ID in the query. For episodes, it should be the series' ID and not the episode's.
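The two-stage flow (query first, then fetch versions by the title's unique ID) can be sketched as follows. search and get_versions are hypothetical callables standing in for the actual HTTP requests; only the choice of IMDb ID follows the rule above.

```python
def query_term(imdb_id, series_imdb_id=None):
    """Episodes are queried by the series' IMDb ID, never the episode's own ID."""
    return series_imdb_id if series_imdb_id is not None else imdb_id


def fetch_versions(search, get_versions, imdb_id, series_imdb_id=None):
    """Query stage first (to get the title's unique ID), then fetch its versions."""
    title_id = search(query_term(imdb_id, series_imdb_id))
    return get_versions(title_id)
```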

Encoding the password

Traditionally, when a user logs in to the site, the password is sent (in plaintext) to the server, and the server returns the slcoo_user_pass cookie.

After some investigation, I managed to reconstruct the algorithm on the client side, and it turns out that we don't need to get the cookie from the server at all: the server does not store the issued slcoo_user_pass value for verification; instead, it recomputes it for the user each time.

First, we hash the password as-is with MD5, which yields a 16-byte digest. We encode those bytes as hex characters, resulting in a 32-character string.

Then, we swap the characters at the following (0-based) index pairs:

0  <-> 26
2  <-> 16
6  <-> 24
7  <-> 9
19 <-> 31
21 <-> 30

Some examples (note that some of the swapped positions seem unchanged; that is simply because the characters at both indexes are the same):

MD5: CC4F660DFB6915C150AC1391FDCE87E6
SUB: CC5F66FBFD6915C140A61E910DCE873C

MD5: FC10F7C2C2A8A0179E06C35D3CF96935
SUB: FC90F732C2A8A0171E05C35DCCF96936

MD5: C7C8169F844857875C4E82ED16638391
SUB: 675816148F485787CC4189ED96C3832E
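The swap step can be sketched as below. The pairs are disjoint, so the order of the swaps doesn't matter, and applying the function twice restores the original string. The function name is hypothetical; it reproduces the three examples above.

```python
# Index pairs (0-based) swapped in the 32-character MD5 hex string.
SWAP_PAIRS = [(0, 26), (2, 16), (6, 24), (7, 9), (19, 31), (21, 30)]


def swap_characters(md5_hex):
    """Apply the slcoo_user_pass character swaps to an MD5 hex string."""
    if len(md5_hex) != 32:
        raise ValueError("expected a 32-character MD5 hex string")
    chars = list(md5_hex)
    for a, b in SWAP_PAIRS:
        chars[a], chars[b] = chars[b], chars[a]
    return "".join(chars)


print(swap_characters("CC4F660DFB6915C150AC1391FDCE87E6"))
# CC5F66FBFD6915C140A61E910DCE873C
```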

After the substitution, we insert two random characters (lower-case ASCII letters or digits) after every 6 characters, for a total of 10 random characters. The resulting 42-character string is the slcoo_user_pass value.

For example:

SUB:             675816148F485787CC4189ED96C3832E
RANDOM:          ejkzvtlzhi
slcoo_user_pass: 675816ej148F48kz5787CCvt4189EDlz96C383hi2E
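The insertion step as a sketch: two filler characters go after positions 6, 12, 18, 24 and 30 of the substituted string, and the last two characters are left as-is. The function names are hypothetical; random_filler uses Python's random module, which is enough here since the filler is not a secret.

```python
import random
import string

FILLER_ALPHABET = string.ascii_lowercase + string.digits


def random_filler():
    """Ten random lower-case letters or digits."""
    return "".join(random.choice(FILLER_ALPHABET) for _ in range(10))


def insert_filler(substituted, filler):
    """Insert two filler characters after every 6 characters (5 insertions)."""
    if len(substituted) != 32 or len(filler) != 10:
        raise ValueError("expected 32 substituted chars and 10 filler chars")
    parts = []
    for i in range(5):
        parts.append(substituted[i * 6:(i + 1) * 6])
        parts.append(filler[i * 2:(i + 1) * 2])
    parts.append(substituted[30:])  # the remaining 2 characters
    return "".join(parts)


print(insert_filler("675816148F485787CC4189ED96C3832E", "ejkzvtlzhi"))
# 675816ej148F48kz5787CCvt4189EDlz96C383hi2E
```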

Note that repeating the same character also seems to work, so there's no real need to use random characters when we generate the value.
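Putting the three steps together, a self-contained sketch of the whole encoding. Two assumptions are labeled in the code: the password is encoded as UTF-8 before hashing, and the hex digest is upper-cased to match the examples above (the case the site expects is not confirmed here).

```python
import hashlib
import random
import string

SWAP_PAIRS = [(0, 26), (2, 16), (6, 24), (7, 9), (19, 31), (21, 30)]
FILLER_ALPHABET = string.ascii_lowercase + string.digits


def encode_password(password, filler=None):
    """Build a slcoo_user_pass value from a plaintext password (sketch)."""
    # Step 1: MD5 of the password as-is, as 32 hex characters.
    # Assumptions: UTF-8 encoding, upper-case hex as in the examples.
    digest = hashlib.md5(password.encode("utf-8")).hexdigest().upper()

    # Step 2: swap the characters at the fixed index pairs.
    chars = list(digest)
    for a, b in SWAP_PAIRS:
        chars[a], chars[b] = chars[b], chars[a]
    swapped = "".join(chars)

    # Step 3: insert two filler characters after every 6 characters.
    # A constant filler (e.g. "a" * 10) also seems to be accepted.
    if filler is None:
        filler = "".join(random.choice(FILLER_ALPHABET) for _ in range(10))
    parts = []
    for i in range(5):
        parts.append(swapped[i * 6:(i + 1) * 6])
        parts.append(filler[i * 2:(i + 1) * 2])
    parts.append(swapped[30:])
    return "".join(parts)
```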

Addic7ed.com

When looking up an episode, we can access it directly using the following URL: http://www.addic7ed.com/serie/<series_name>/<season_number>/<episode_number>/<episode_name>
The episode name does not have to be correct; it just has to be some characters. So we don't need to know the episode name in order to access the episode page directly.
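A sketch of building that direct URL with a placeholder episode name. The helper name and the "x" placeholder are hypothetical, and percent-encoding the series name is an assumption about what the site accepts for names containing spaces or other special characters.

```python
from urllib.parse import quote


def addic7ed_episode_url(series_name, season, episode, episode_name="x"):
    """Direct episode URL; the episode name segment is only a placeholder."""
    return "http://www.addic7ed.com/serie/{}/{}/{}/{}".format(
        quote(series_name, safe=""), season, episode, quote(episode_name, safe=""))
```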

Torec.net

The site splits the results into pages. The current implementation retrieves only the first page and ignores the rest.
It seems that Torec also supports querying by IMDb ID (tt<id>). For movies, it works right away. For series, it only supports the series ID, not the episode's, so we'll skip it for episodes.
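A sketch of both points: choosing the query term (IMDb ID for movies only) and collecting every results page instead of just the first. fetch_page is a hypothetical callable standing in for the real page request; it is assumed to return an empty list once there are no more pages, and max_pages is an arbitrary safety limit.

```python
def torec_query_term(title_name, imdb_id=None, is_episode=False):
    """Use the IMDb ID (tt<id>) for movies; fall back to the name for episodes."""
    if imdb_id and not is_episode:
        return imdb_id
    return title_name


def collect_all_pages(fetch_page, max_pages=50):
    """Gather results from page 1 onward until an empty page is returned."""
    results = []
    for page in range(1, max_pages + 1):
        items = fetch_page(page)
        if not items:
            break
        results.extend(items)
    return results
```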

Torec's Hamster

In 2.x.x, the hamster was initialized only after a version had been selected. Instead, to shorten the wait, we can collect a ticket for each search result; then, if we choose to download a specific version, we'll have its ticket right away.

We should go with the more efficient solution: a single hamster running on a separate thread and associated with Torec's provider. The hamster will be able to handle more than a single sub_id and, while running, will make sure that every sub_id passed to it holds an up-to-date ticket from Torec.
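A minimal sketch of such a hamster using Python's threading module. request_ticket is a hypothetical callable (sub_id to ticket) standing in for the real Torec request, and the refresh interval is an assumed value; the real ticket lifetime is not specified here.

```python
import threading


class TicketHamster:
    """Single background worker that keeps a fresh Torec ticket per sub_id."""

    def __init__(self, request_ticket, refresh_interval=10.0):
        self._request_ticket = request_ticket  # hypothetical: sub_id -> ticket
        self._refresh_interval = refresh_interval
        self._tickets = {}  # sub_id -> latest ticket (None until fetched)
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def add(self, sub_id):
        """Register a search result's sub_id for ticket collection."""
        with self._lock:
            self._tickets.setdefault(sub_id, None)

    def get_ticket(self, sub_id):
        with self._lock:
            return self._tickets.get(sub_id)

    def refresh_once(self):
        """Request a new ticket for every registered sub_id."""
        with self._lock:
            sub_ids = list(self._tickets)
        for sub_id in sub_ids:
            ticket = self._request_ticket(sub_id)  # network call outside the lock
            with self._lock:
                self._tickets[sub_id] = ticket

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()

    def _run(self):
        while not self._stop.is_set():
            self.refresh_once()
            self._stop.wait(self._refresh_interval)
```

Each search result's sub_id is passed to add() as soon as it appears, so by the time a version is chosen, get_ticket() can usually return a ticket without waiting.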

yosi-dediashvili commented 10 years ago

Add support for the French, Swedish and Greek languages in the current providers:

OpenSubtitles
Addic7ed
Subscene

yosi-dediashvili commented 10 years ago

Finished with the OpenSubtitles provider: 7362d5d9cf5741de53f57fa031e4da7dbdca0cbc.

yosi-dediashvili commented 9 years ago

Finished with the Addic7ed provider: b42e7b6.

yosi-dediashvili commented 9 years ago

Finished with Subscenter's provider: a043c76f009ae15b9b3736bf5a39902d00489eb5

vavavr00m commented 9 years ago

Will http://thesubdb.com/api/ be supported?

yosi-dediashvili commented 9 years ago

Not currently. I added it to version 3.1 #29

yosi-dediashvili / SubiT

The providers module #4
