saurabhshri / simple-yet-powerful-srt-subtitle-parser-cpp


= srtparser.h : Simple, yet powerful C++ SRT Subtitle Parser Library

A single header, simple, powerful, full-blown srt subtitle parser written in C++.


https://github.com/saurabhshri/simple-yet-powerful-srt-subtitle-parser-cpp[srtparser.h] is a single header, simple and powerful C++ srt subtitle parsing library that allows you to easily handle, process and manipulate srt subtitle files in your project. It is an extension of Oleksii Maryshchenko's simple https://github.com/young-developer/subtitle-parser[subtitle-parser]. It has the following features:

  1. It is a single header C++ (CPP) file, and can be easily used in your project.
  2. It focuses on portability, efficiency and simplicity, with no external dependencies.
  3. A wide variety of functions is at the programmer's disposal to parse srt files as needed.
  4. It is capable of:
    • extracting and stripping HTML and other styling tags from subtitle text.
    • extracting and stripping speaker names.
    • extracting and stripping non-dialogue text.
  5. It is easy to extend with new functionality.

== How to use srtparser.h

=== General usage ===

srtparser.h is a cross-platform, robust srt subtitle parser.

```cpp
#include "srtparser.h"

SubtitleParserFactory *subParserFactory = new SubtitleParserFactory("inputFile.srt");
SubtitleParser *parser = subParserFactory->getParser();

// to get subtitles
std::vector<SubtitleItem*> sub = parser->getSubtitles();
```

See the demo usage in the examples directory.

=== Parser Functions ===

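The following table is a complete list of the available parser functions. In the SubtitleItem examples, sub stands for a single SubtitleItem pointer taken from the vector returned by getSubtitles().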

Syntax:

[cols="2,1,2,5"]
|===
| Class | Return Type | Function | Description

| SubtitleParserFactory | SubtitleParserFactory | SubtitleParserFactory("inputFile.srt") | Creates a SubtitleParserFactory object. Here inputFile.srt is the path of the subtitle file to be parsed. This object is used to create the parser.

E.g.: SubtitleParserFactory *subParserFactory = new SubtitleParserFactory("inputFile.srt");

| SubtitleParserFactory | SubtitleParser* | getParser() | Returns a pointer to the SubtitleParser object, which is used to parse the subtitle file.

E.g.: SubtitleParser *parser = subParserFactory->getParser();

| SubtitleParser | std::vector<SubtitleItem*> | getSubtitles() | Returns the subtitles as a vector of pointers to SubtitleItem objects.

E.g.: std::vector<SubtitleItem*> sub = parser->getSubtitles();

| SubtitleParser | std::string | getFileData() | Returns the complete file data, read as is from inputFile.srt.

E.g.: std::string fileData = parser->getFileData();

| SubtitleItem | long int | getStartTime() | Returns the starting time of the subtitle in milliseconds.

E.g.: long int startTime = sub->getStartTime();

| SubtitleItem | long int | getEndTime() | Returns the ending time of the subtitle in milliseconds.

E.g.: long int endTime = sub->getEndTime();

| SubtitleItem | std::string | getStartTimeString() | Returns the starting time of the subtitle in srt format.

E.g.: std::string startTime = sub->getStartTimeString();

| SubtitleItem | std::string | getEndTimeString() | Returns the ending time of the subtitle in srt format.

E.g.: std::string endTime = sub->getEndTimeString();

| SubtitleItem | std::string | getText() | Returns the subtitle text as present in the .srt file.

E.g.: std::string text = sub->getText();

| SubtitleItem | std::string | getDialogue(bool keepHTML, bool doNotIgnoreNonDialogues, bool doNotRemoveSpeakerNames) | Returns the subtitle text after processing it according to the parameters.

keepHTML = 1 to stop the parser from stripping style tags.

doNotIgnoreNonDialogues = 1 to stop the parser from extracting and stripping non-dialogue text such as (laughter).

doNotRemoveSpeakerNames = 1 to stop the parser from extracting and stripping speaker names.

By default, (0, 0, 0) is passed.

E.g.: std::string text = sub->getDialogue();

| SubtitleItem | int | getWordCount() | Returns the number of words present in the subtitle dialogue.

E.g.: int wordCount = sub->getWordCount();

| SubtitleItem | std::vector<std::string> | getIndividualWords() | Returns a string vector of the individual words present in the subtitle.

E.g.: std::vector<std::string> words = sub->getIndividualWords();

| SubtitleItem | bool | getIgnoreStatus() | Returns the ignore status. Returns true if the _justDialogue field, i.e. the subtitle text after processing, is empty.

E.g.: bool ignore = sub->getIgnoreStatus();

| SubtitleItem | int | getSpeakerCount() | Returns the number of speakers present in the subtitle.

E.g.: int speakerCount = sub->getSpeakerCount();

| SubtitleItem | std::vector<std::string> | getSpeakerNames() | Returns a string vector of speaker names.

E.g.: std::vector<std::string> speakerNames = sub->getSpeakerNames();

| SubtitleItem | int | getNonDialogueCount() | Returns the number of non-dialogue words present in the subtitle.

E.g.: int nonDialogueCount = sub->getNonDialogueCount();

| SubtitleItem | std::vector<std::string> | getNonDialogueWords() | Returns a string vector of non-dialogue words.

E.g.: std::vector<std::string> nonDialogueWords = sub->getNonDialogueWords();

| SubtitleItem | int | getStyleTagCount() | Returns the number of style tags present in the subtitle.

E.g.: int styleTagCount = sub->getStyleTagCount();

| SubtitleItem | std::vector<std::string> | getStyleTags() | Returns a string vector of style tags.

E.g.: std::vector<std::string> styleTags = sub->getStyleTags();

| SubtitleWord | std::string | getText() | Returns the subtitle text as present in the .srt file.

E.g.: std::string text = sub->getText();

|===

== Examples

The table above includes a short example for each function. A single C++ program that combines all of them can be found in the examples directory.

== Contributing

Suggestions, feature requests, PRs, bug reports and bug fixes are welcome. I'll be thankful.

== Credits

Built upon an MIT-licensed simple subtitle parser called LibSub-Parser by Oleksii Maryshchenko.

The original parser had 3 major functions: getStartTime(), getEndTime() and getText().

The rest of the work was done by Saurabh Shrivastava, originally for use in his https://saurabhshri.github.io/2017/05/gsoc/creating-a-full-blown-srt-subtitle-parser[GSoC project].