wtekiela / opensub4j

Java library for communicating with opensubtitles.org XML-RPC API

Downloading subtitles with custom charset / encoding #30

Closed VeiZhang closed 3 years ago

VeiZhang commented 3 years ago

There seems to be no parameter for setting the charset in the XML-RPC download, but according to the sample, the download link can specify a charset:

https://dl.opensubtitles.org/en/download/src-api/vrf-19be0c59/sid-Es8Yw0zrLBHcaLjKikJ-2rHWo99/filead/1953189057.gz
https://dl.opensubtitles.org/en/download/subencoding-utf8/src-api/vrf-19be0c59/sid-Es8Yw0zrLBHcaLjKikJ-2rHWo99/filead/1953189057.gz

I tried both download links, and they work over HTTP. But I don't know how to do the same with XML-RPC.

VeiZhang commented 3 years ago

I checked your method for setting the charset, but it didn't work.

https://github.com/wtekiela/opensub4j/blob/174e9329e3a0f9f1821676ef6fdada892d484826/src/main/java/com/github/wtekiela/opensub4j/response/SubtitleFile.java#L51

wtekiela commented 3 years ago

Hi @VeiZhang, right now the library doesn't provide a way to customize the download link to request a specific encoding. The method you mentioned does not set the encoding of the download; it decodes the content using the encoding you provide (it is used in the toString method after unzipping, see https://github.com/wtekiela/opensub4j/blob/174e9329e3a0f9f1821676ef6fdada892d484826/src/main/java/com/github/wtekiela/opensub4j/response/SubtitleFile.java#L134 ). So ideally you should pass the original encoding of the subtitle file.

So allowing subtitles to be downloaded with a specific encoding would be an enhancement.
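
A minimal, self-contained sketch of the distinction described above: a charset supplied at read time only controls how the already-downloaded, unzipped bytes are decoded into text; it does not change what the server sends. The class and method names (`GzipCharsetDemo`, `gunzipToString`, `gzip`) are hypothetical and not part of opensub4j; only JDK classes are used, and the sample text is made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipCharsetDemo {

    // Hypothetical helper: gunzip a payload, then decode the raw bytes with the named charset.
    static String gunzipToString(byte[] gzipped, String charsetName) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            ByteArrayOutputStream raw = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                raw.write(buffer, 0, read);
            }
            return new String(raw.toByteArray(), Charset.forName(charsetName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper used only to build sample input: encode text, then gzip it.
    static byte[] gzip(String text, Charset charset) {
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(compressed)) {
            out.write(text.getBytes(charset));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return compressed.toByteArray();
    }

    public static void main(String[] args) {
        // Stand-in for a subtitle file stored in ISO-8859-1 on the server.
        byte[] payload = gzip("caf\u00e9", StandardCharsets.ISO_8859_1);

        // Decoding with the file's original charset recovers the text.
        System.out.println(gunzipToString(payload, "ISO-8859-1"));

        // Decoding the same bytes as UTF-8 yields garbled output.
        System.out.println(gunzipToString(payload, "UTF-8"));
    }
}
```

The same bytes give different strings depending on the charset used to decode them, which is why the original encoding of the file is the one to pass.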

VeiZhang commented 3 years ago

@wtekiela Thanks for your reply.

It didn't work; the reason is the download link. So I tried adding the following code here: https://github.com/wtekiela/opensub4j/blob/174e9329e3a0f9f1821676ef6fdada892d484826/src/main/java/com/github/wtekiela/opensub4j/impl/SearchOperation.java#L57

videoProperties.put("subencoding", "utf8");

It still doesn't work. So if there are no other parameters to set, I will replace the link with one that includes the encoding, to fix the garbled text.

wtekiela commented 3 years ago

Can you provide some concrete steps to reproduce and expected outcomes? I'm not sure if I follow your train of thought with those changes.

VeiZhang commented 3 years ago

Yes, I ran this query:

ListResponse<SubtitleInfo> response = mOpenSubtitlesClient.searchSubtitles("all", null, null, "", "The avengers", "1", "1", null);

I chose the first subtitle, and the content from its download link contains garbled text; its encoding is UTF-8.

As you said, there is no encoding parameter to set in the query request. I have an idea: I can get the download link after a query, apply a rule that adds the encoding to create a new link, and then download from the new link myself:

https://dl.opensubtitles.org/en/download/src-api/vrf-19be0c59/sid-Es8Yw0zrLBHcaLjKikJ-2rHWo99/filead/1953189057.gz
// add the /subencoding-utf8/
https://dl.opensubtitles.org/en/download/subencoding-utf8/src-api/vrf-19be0c59/sid-Es8Yw0zrLBHcaLjKikJ-2rHWo99/filead/1953189057
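
The rewriting rule above can be sketched as a small string helper. This is only an illustration under stated assumptions: `SubEncodingLink` and `withSubEncoding` are hypothetical names, not part of opensub4j, and the sketch assumes the link contains a `/download/` segment as in the sample URLs. Fetching the resulting link over HTTP is left to the caller.

```java
final class SubEncodingLink {

    private static final String DOWNLOAD_SEGMENT = "/download/";
    private static final String ENCODING_PREFIX = "subencoding-";

    // Hypothetical helper: insert "subencoding-<encoding>/" right after "/download/".
    static String withSubEncoding(String downloadLink, String encoding) {
        int index = downloadLink.indexOf(DOWNLOAD_SEGMENT);
        if (index < 0) {
            throw new IllegalArgumentException("No " + DOWNLOAD_SEGMENT + " segment in: " + downloadLink);
        }
        int insertAt = index + DOWNLOAD_SEGMENT.length();

        // Leave the link untouched if it already requests an encoding.
        if (downloadLink.startsWith(ENCODING_PREFIX, insertAt)) {
            return downloadLink;
        }
        return downloadLink.substring(0, insertAt)
                + ENCODING_PREFIX + encoding + "/"
                + downloadLink.substring(insertAt);
    }

    public static void main(String[] args) {
        String original = "https://dl.opensubtitles.org/en/download/src-api/vrf-19be0c59/"
                + "sid-Es8Yw0zrLBHcaLjKikJ-2rHWo99/filead/1953189057.gz";
        System.out.println(withSubEncoding(original, "utf8"));
    }
}
```

Keeping the rule in one function makes it easy to adjust if the link format turns out to differ from the samples.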

I hope that explains it clearly.

VeiZhang commented 3 years ago

@wtekiela Sorry, I found another way to solve this: if I pass SubtitleInfo.getEncoding() when getting the content, the garbled text disappears. Thanks!