```py
from pathlib import Path

from subby import SMPTEConverter

converter = SMPTEConverter()
file = Path('test_subtitle.ttml2')

# Convert the TTML file; the result is a pysrt.SubRipFile
srt = converter.from_file(file)

output = Path('test_subtitle.srt')
srt.save(output)  # written to test_subtitle.srt
```
Input(test_subtitle.ttml2):
```xml
<?xml version="1.0" encoding="utf-8"?>
<tt xmlns="http://www.w3.org/ns/ttml" xmlns:tts="http://www.w3.org/ns/ttml#styling" ttp:version="2" xmlns:ttm="http://www.w3.org/ns/ttml#metadata" xmlns:ttp="http://www.w3.org/ns/ttml#parameter" xml:lang="en-US">
  <head>
    <styling>
      <style xml:id="s0" tts:fontFamily="sansSerif" tts:fontStyle="italic" tts:color="white" tts:fontWeight="normal" tts:fontSize="100%"></style>
      <style xml:id="s1" tts:fontFamily="sansSerif" tts:fontStyle="normal" tts:color="white" tts:fontWeight="normal" tts:fontSize="100%"></style>
    </styling>
    <layout>
      <region xml:id="r0" tts:extent="100% 15%" tts:origin="0% 85%" tts:displayAlign="after" tts:textAlign="center"></region>
      <region xml:id="r1" tts:extent="100% 15%" tts:origin="0% 0%" tts:displayAlign="before" tts:textAlign="center"></region>
    </layout>
  </head>
  <body style="s1">
    <div>
      <p begin="00:01:00.000" end="00:01:01.000" region="r0">The two most important days in your life are<br />the day you are born & the day you find out why.</p>
    </div>
  </body>
</tt>
```
Output(test_subtitle.srt):
```
1
00:01:00,000 --> 00:01:01,000
The two most important days in your life are
the day you are born the day you find out why.
```
Expected Output(test_subtitle.srt):
```
1
00:01:00,000 --> 00:01:01,000
The two most important days in your life are
the day you are born & the day you find out why.
```
This seems to be related to the use of `html.unescape` here. If `data` is used directly on that line instead of `unescaped`, the `&` character is kept, but I don't know whether that is the correct way to fix it.
Well, that's all I've found so far. I hope it helps in some way.
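To illustrate what I think is going on, here is a minimal sketch using only the standard library, not subby's actual code. It assumes the TTML file stores the ampersand as `&amp;` and that the converter unescapes the markup before parsing it. The `escaped` sample string is made up for the illustration. Unescaping first leaves a bare `&`, which a strict XML parser rejects and a lenient one may silently drop.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical one-line sample, assuming the file escapes the ampersand.
escaped = "<p>the day you are born &amp; the day you find out why.</p>"

# Parsing the markup as-is: the XML parser decodes &amp; to & by itself.
direct = ET.fromstring(escaped).text

# Unescaping before parsing leaves a bare '&' in the markup.
unescaped = html.unescape(escaped)

try:
    ET.fromstring(unescaped)
    strict_ok = True
except ET.ParseError:
    # A strict parser refuses the bare '&'; a lenient parser might
    # instead recover by discarding it, which would match the output above.
    strict_ok = False

print(direct)
print(unescaped)
print(strict_ok)
```

If this is the cause, parsing the original markup and letting the parser decode the entities would keep the `&` in the subtitle text.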
SDHStripper:
For some reason in this sample, the SDH part ends up being completely removed.
Details about the version and libraries of Python:
```
Version: Python 3.10.11
```
```
Libraries:
beautifulsoup4 4.12.2
chardet 5.2.0
click 8.1.6
colorama 0.4.6
construct 2.8.8
lxml 4.9.3
pymp4 1.4.0
pysrt 1.1.2
soupsieve 2.4.1
subby 0.1.15
tinycss 0.4
```
Code executed for this example:
```py
from pathlib import Path

from subby import WebVTTConverter, CommonIssuesFixer, SDHStripper

converter = WebVTTConverter()
fixer = CommonIssuesFixer()
stripper = SDHStripper()

file = Path('test_accessibility.vtt')
file_sdh = Path('test_accessibility_sdh.srt')
file_stripped = Path('test_accessibility_stripped.srt')

# Convert the WebVTT file and apply the common fixes
srt, _ = fixer.from_srt(converter.from_file(file))
srt.save(file_sdh)  # written to test_accessibility_sdh.srt

# Strip the SDH descriptions from the fixed subtitles
stripped, status = stripper.from_srt(srt)
if status is True:
    print('stripping successful')
    stripped.save(file_stripped)  # written to test_accessibility_stripped.srt
```
Input(test_accessibility.vtt):
Output(test_accessibility_stripped.srt) is empty:
Expected Output(test_accessibility_stripped.srt):
This seems to happen because of this line. This same line was also changed in a recent commit here.
SMPTEConverter:
For some reason the character `&` is removed from the text. The Python version and libraries are the same as listed for SDHStripper above, and the code, input, output, and analysis for this case are the `test_subtitle.ttml2` example at the start of this report.