podverse / podverse-web

Podverse web app written with React and Next.js
https://podverse.fm/about
GNU Affero General Public License v3.0

Is vtt subtitle format supported? #1168

Closed: Marzal closed this issue 1 year ago

Marzal commented 1 year ago

Describe the bug: Transcripts won't show in a feed with vtt subtitles. Is vtt supported? It's mentioned in https://github.com/podverse/podverse-web/blob/2df1615d467980659a211835fdaa16d1b2b83202/src/lib/utility/transcript.ts#L8

To Reproduce: Steps to reproduce the behavior:

  1. Go to https://podverse.fm/episode/yuY4qZ4Cc
  2. Click on Transcript
  3. Console Error: getParsedTranscript error: TypeError: Data is not valid SRT format

Expected behavior: The subtitles are shown, or a warning says that the format is not supported.

Screenshots: Example episode and error.

Additional context: The Android app doesn't show the subs either.

Marzal commented 1 year ago

And now, after adding a tag with the application/x-subrip MIME type, the subs are shown.
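
The feed in this thread advertises transcripts under several MIME types (application/srt, application/x-subrip, text/vtt), so a client has to map each type attribute to a parser. A minimal sketch of that mapping, assuming a hypothetical formatFromMimeType helper (this is not Podverse's actual code); the JSON and HTML entries assume the MIME types the Podcasting 2.0 namespace lists for those formats:

```typescript
// Hypothetical helper: maps a <podcast:transcript> `type` attribute to a
// transcript format. Not Podverse's implementation; names are illustrative.
type TranscriptFormat = "srt" | "vtt" | "json" | "html" | "unknown";

function formatFromMimeType(mimeType: string): TranscriptFormat {
  // Ignore case and any parameters such as "; charset=utf-8".
  const base = mimeType.split(";")[0].trim().toLowerCase();
  switch (base) {
    // Both SRT MIME types seen in this thread map to the same format.
    case "application/srt":
    case "application/x-subrip":
      return "srt";
    case "text/vtt":
      return "vtt";
    case "application/json":
      return "json";
    case "text/html":
      return "html";
    default:
      return "unknown";
  }
}

console.log(formatFromMimeType("text/vtt")); // → vtt
console.log(formatFromMimeType("application/x-subrip")); // → srt
```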

mitchdowney commented 1 year ago

@Marzal I'll try to take a look at this issue this week. It might just be due to a simple bug / oversight...

Marzal commented 1 year ago

Thanks. Now that I have the transcripts working in the web client using application/x-subrip, I've noticed that transcripts still aren't working in the Android app, neither in KDE Express Podcast nor in Podcasting 2.0, which uses only application/srt.

Should I open a separate issue for that in its repo?

mitchdowney commented 1 year ago

@Marzal could you create another issue in podverse-rn repo? And include a link to this issue in the description? If it's working in one but not the other that is odd...but also might point to a simple fix.

mitchdowney commented 1 year ago

@Marzal unrelated to this ticket, but I see 2 instances of KDE Express in Podcast Index (and Podverse, because we sync with PI). Is that intended? If not, please email info@podcastindex.org and let them know which one should be removed.

https://podcastindex.org/search?q=kde%20express&type=all

mitchdowney commented 1 year ago

@Marzal is this issue still reproducible in web or mobile after the mobile v4.13.6 update? If not, please close this ticket. Thanks

Marzal commented 1 year ago

Dave is aware of the duplicate, and we are working on fixing it. Right now it's a consequence of the generator framework that KDE Express uses. For context: https://github.com/skymethod/op3/discussions/16#discussioncomment-6712555

About VTT: I will try to find another podcast that only uses VTT. Or does Podverse refresh the cached feed every X hours/days?

Right now all episodes have both VTT and SRT, so it's not a valid feed for testing. And I'm not sure that deleting the SRT from one episode would work, as PI doesn't re-parse old episodes (right now PI doesn't even show transcripts for the oldest ones).

I asked Dave to refresh the feeds a few days ago and it worked, and I will have to ask again when I'm done with the changes I'm making to the feed generator.

I'm also studying how Podping works, in case it would force a full refresh of the feed in PI and make Dave's manual intervention unnecessary.

And about transcripts: as I understand it, https://podverse.fm/es/membership shows that this feature is only for the mobile apps (that's what I think the * means). But that is not true anymore, right?

mitchdowney commented 1 year ago

@Marzal we don't have any special caching for transcripts as far as I know...if the url path to a transcript updates in an RSS feed, we just need to make sure the feed gets re-parsed so our client apps use the new transcript path.

OK, if you find a VTT feed we can test with, please let me know.

And good catch on the membership asterisk! That is indeed out-of-date. I just deployed a fix to the website.

Marzal commented 1 year ago

@mitchdowney since there is no cache, I've just changed the feed so that episode 16 (the one I originally linked) has only VTT subs, and I changed its description too, so we can check whether it gets re-parsed.

What is the policy for re-parsing feeds? When a new episode or a new lastBuildDate is detected, is the whole feed reloaded?

mitchdowney commented 1 year ago

@Marzal this is ugly, ugly code (our parser is probably the ugliest of all 🤦‍♂️), but here is the logic that determines whether a "change" is detected when parsing a feed. If a change is detected, the parser continues to parse the rest of the feed and updates our database. Our problem with updates is either 1) Podcast Index is not notifying us when a feed changes BUT does not have a new episode, or 2) the logic I am linking to has a flaw that makes it work only when a new episode is detected.

https://github.com/podverse/podverse-api/blob/develop/src/services/parser.ts#L320-L331

As for transcripts...I just ran our api locally so I could view error logs when I try to request a transcript directly.

Our API actually has a "priority" system that will use whichever transcript is available that is the "most reliable" or common according to what we've seen. In this case, even though episode 19 of KDE Express has a vtt, the selected transcript file by priority was the following:

priorityTranscript {
  url: 'https://op3.dev/e,pg=a9a56b87-575a-5f6f-9636-cdf7b73e6230/archive.org/download/19-kde-express-parati/19-KDE_Express-Parati.asr.srt',
  type: 'application/srt',
  language: 'es',
  rel: 'captions'
}

When our API made a request to that URL, it received the following error response:

502 - "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n<html xmlns=\"http://www.w3.org/1999/xhtml\">\n\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\"/>\narchive.org - HTTP Error 502: Internal server error\n\n\n<body style=\"margin:0;padding:0;border:0;\">\n\n <div style=\"background-color:black;color:white;text-align:center;\">\n <img src=\"/about/logo.png\" alt=\"Internet Archive logo\" style=\"width:100px;height:100px;float:right;\"/>\n <br clear=\"right\"/>\n\n <h1 style=\"font-size:36px;margin-top:0;\">Sorry, we're kinda busy\n <a href=\"https://archive.org/details/MakingBo1947&start=348\">\n <img src=\"/about/busy.gif\" style=\"width:100%;height:auto;max-width:640px;\"/>\n \n\n\n <div style=\"font-family:Helvetica,Arial,sans-serif;font-size:1em;color:#333333;max-width:800px;margin:20px auto 0 auto;text-align: center;\">\nServer error 502 -- probably because our servers are overloaded right now.\n\nPlease retry either now or later (by hitting refresh/reload).\n\n\n \n\n\n"

And when I tried to open that URL in a browser, it displayed:

Is archive.org somehow involved in your transcript hosting? If yes, I would not expect archive.org to be a reliable host for this kind of content, as they intend for their files to be accessed relatively infrequently, and my guess is they might auto-block IPs as "spam" after just a few tries.

The "transcript proxy" logic from our API (we need to use this proxy endpoint to avoid CORS errors in web browsers) can be found here:

https://github.com/podverse/podverse-api/blob/develop/src/routes/episode.ts#L127

It's very difficult for me to say at this time if this issue has anything to do with vtt vs srt file types, if there is not a reliable server hosting the transcript files involved.

Marzal commented 1 year ago

I tried to follow where mostRecentUpdateDateFromFeed comes from, but I'm not a developer, so I got lost. I see that meta.lastBuildDate || meta.pubDate is checked as a fallback, and lastBuildDate is, I think, what the RSS spec uses to advertise that the content of the feed has changed even if no new episode is added (in my feed generator, that is what changes). Is there any way from the Android app to force a refresh/re-parse of the feed (as you would with a normal RSS feed)?
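
Marzal's reading of the parser, where lastBuildDate falls back to pubDate, can be sketched as a small check. This is a hypothetical illustration, not the code in parser.ts: the names FeedMeta, mostRecentUpdateDate and feedHasChanged are made up for the example, and it assumes the date from the previous parse is stored somewhere.

```typescript
// Hypothetical sketch of "has this feed changed?" logic. Not Podverse's
// parser.ts; it only illustrates the lastBuildDate || pubDate fallback.
interface FeedMeta {
  lastBuildDate?: string; // RSS channel <lastBuildDate>
  pubDate?: string; // RSS channel <pubDate>
}

// Prefer lastBuildDate, falling back to pubDate; null if absent or unparseable.
function mostRecentUpdateDate(meta: FeedMeta): Date | null {
  const raw = meta.lastBuildDate || meta.pubDate;
  if (!raw) return null;
  const date = new Date(raw);
  return Number.isNaN(date.getTime()) ? null : date;
}

// Treat the feed as changed when it reports a newer date than the one stored
// from the previous parse, or when either date is missing (re-parse to be safe).
function feedHasChanged(meta: FeedMeta, lastParsed: Date | null): boolean {
  const current = mostRecentUpdateDate(meta);
  if (!current || !lastParsed) return true;
  return current.getTime() > lastParsed.getTime();
}

const previousParse = new Date("2023-08-01T00:00:00Z");
console.log(feedHasChanged({ lastBuildDate: "Mon, 07 Aug 2023 10:00:00 GMT" }, previousParse)); // → true
```

Under this sketch, a feed whose generator bumps lastBuildDate on every content change would be re-parsed even without a new episode, which is the behavior Marzal is asking about.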

Both audio and subs are hosted on archive.org (the feed and website are on GitLab Pages). We have done lots of tests/downloads with a bigger podcast (Podcast Linux, for instance) and never had any problems or 502 errors (it's slow, for sure, but fast enough for streaming audio). Maybe it's because of the CDN that archive.org uses based on geolocation. As the subs are very small, I can try moving them to GitLab Pages.

I did see the priorities in https://github.com/podverse/podverse-api/blob/e3e6e4074f384e28ab60ef1fda7a61653ad7e10c/src/routes/episode.ts#L98; as far as I understand it, JSON is the first choice. If there is no feature Podverse can use from that format (or any other), I would use: srt -> vtt -> json -> html. SRT is the most widely used and the smallest (just a few bytes smaller than VTT, but considerably smaller than JSON and HTML). Of course, if better formats offer features that Podverse can use, I would prioritize the formats with more usable features.
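
Marzal's proposed ordering (srt -> vtt -> json -> html) can be expressed as a simple first-match lookup. This is a sketch under assumptions: pickPriorityTranscript and PROPOSED_PRIORITY are hypothetical names, the example.com URLs are placeholders, and the real priority list lives in src/routes/episode.ts and may differ.

```typescript
// Hypothetical sketch: pick one transcript by format priority.
// Not Podverse's implementation; the order is Marzal's proposal.
interface Transcript {
  url: string;
  type: string; // MIME type from the feed's <podcast:transcript> tag
  language?: string;
  rel?: string;
}

// Each entry groups MIME types that count as the same format.
const PROPOSED_PRIORITY: string[][] = [
  ["application/srt", "application/x-subrip"], // SRT
  ["text/vtt"], // WebVTT
  ["application/json"], // JSON
  ["text/html"], // HTML
];

// Return the first transcript whose type matches the highest-ranked format.
function pickPriorityTranscript(
  transcripts: Transcript[],
  priority: string[][] = PROPOSED_PRIORITY
): Transcript | undefined {
  for (const mimeTypes of priority) {
    const match = transcripts.find((t) => mimeTypes.includes(t.type.toLowerCase()));
    if (match) return match;
  }
  return undefined; // no supported format advertised
}

const chosen = pickPriorityTranscript([
  { url: "https://example.com/ep16.vtt", type: "text/vtt" },
  { url: "https://example.com/ep16.json", type: "application/json" },
]);
console.log(chosen?.type); // → text/vtt
```

Keeping the order in a data table rather than in branching code makes it easy to swap in a different ranking if a format with more usable features becomes worth preferring, as Marzal suggests.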

And lastly, for VTT support: I changed the feed to offer only VTT in episode 16. If you can re-parse the feed manually, we can see whether that episode uses VTT or the transcript option disappears from the web or the F-Droid app.

Thanks for all the explanations about the app

Marzal commented 1 year ago

Another thing I've found is that for episodes 09 and 10, which advertise JSON, VTT and SRT transcripts, neither the web nor the app shows the transcripts. I'm not sure whether the JSON is not what Podverse expects.

https://podverse.fm/episode/hazOEWk-m2 https://podverse.fm/episode/zSSaShKN4G

https://archive.org/download/kde-express-akademy-es https://archive.org/download/10-kde-express-novedades-fin-de-2021

EDIT: off-topic, I will open another issue in the future.

mitchdowney commented 1 year ago

Ok...well I'm sorry, I'm pretty stumped on how to proceed at this point. I'm also not exactly sure what we're supposed to solve for this ticket? Is the issue possibly broader than vtt? Do vtt files that other people generate and host also have this issue?

I may have to defer to someone else to work on this, as I honestly don't understand the regex and parsing logic for the transcript files. In case someone can explore this more soon, here are what I think are the most relevant files:

https://github.com/podverse/podverse-shared/blob/master/src/transcript.ts https://github.com/podverse/podverse-api/blob/develop/src/routes/episode.ts#L153 https://github.com/podverse/podverse-web/blob/develop/src/services/transcript.tsx

Marzal commented 1 year ago

Sorry if I went a bit off-topic. To summarize the original question, "Is the vtt subtitle format supported?": yes.

Now, in order to test if podverse-web actually works well with VTT I see 3 options:

Marzal commented 1 year ago

OK, I see that now that the feed has been re-parsed, Podverse (both web and the F-Droid beta) detects that there is a transcript but doesn't show any text:

https://podverse.fm/episode/yuY4qZ4Cc

<podcast:transcript language="es" rel="captions" url="https://op3.dev/e,pg=a9a56b87-575a-5f6f-9636-cdf7b73e6230/archive.org/download/16-kde-express-kde-en-telegram/16-KDE_Express-KDE_en_Telegram.vtt" type="text/vtt"/>

I tested the other episodes, and they work if they have the SRT format.

mitchdowney commented 1 year ago

@Marzal ok cool, this test case helps a lot.

When I test this transcript using podverse-api and podverse-web locally, I see that our api successfully returns the vtt transcript version, but our web app throws this error:

TypeError: Data is not valid SRT format

It appears this error may be surfacing from the transcriptator library, which @stevencrader maintains. Steven, I just tested this with the latest 1.1.2 version of transcriptator. Any chance you could verify sometime if this is a bug with transcriptator? It seems like it may not be successfully parsing this vtt transcript?

Here is the full response our api sends to our web app:

{
    "url": "https://op3.dev/e,pg=a9a56b87-575a-5f6f-9636-cdf7b73e6230/archive.org/download/16-kde-express-kde-en-telegram/16-KDE_Express-KDE_en_Telegram.vtt",
    "type": "unknown",
    "rel": "captions",
    "data": "WEBVTT\n\n00:00:00.000 --> 00:00:11.840\n Buenas, bienvenidas de vuelta a KDE Express. Esta vez para no perder el ritmo volvemos a la\n\n00:00:11.840 --> 00:00:16.800\n versión movilidad que no tenemos a los compañeros disponibles y hoy quería haceros un especial\n\n00:00:16.800 --> 00:00:29.440\n Telegram. Me diréis, pero esto no va de KDE y Plasma? Sí, pero alrededor de Telegram hay un\n\n00:00:29.440 --> 00:00:35.360\n montón de recursos, que a lo mejor hay algunos que no conocéis, entonces voy a hacer un resumen de\n\n00:00:35.360 --> 00:00:42.640\n todos los canales y grupos que tenéis en Telegram para poder relacionaros y recibir información o\n\n00:00:42.640 --> 00:00:53.840\n hablar sobre KDE. El primero podríamos decir que es el canal oficial de este podcast, donde colgamos\n\n00:00:53.840 --> 00:00:58.960\n cada vez que hay un episodio una reseña con información, las notas del programa, el enlace\n\n00:00:58.960 --> 00:01:03.960\n oficial a la web y los audios por si os gusta escuchar los podcasts directamente dentro de\n\n00:01:03.960 --> 00:01:10.480\n Telegram. A nosotros nos hace especial ilusión si usáis el feed público, el feed libre que generamos\n\n00:01:10.480 --> 00:01:17.760\n nosotros con software libre en la página, pero lo importante es difundir la palabra KDE así que por\n\n00:01:17.760 --> 00:01:22.480\n donde mejor os venga. Si usáis mucho Telegram pues ahí tenéis los audios también disponibles en\n\n00:01:22.480 --> 00:01:30.040\n OGG Colgador. Ese canal es KDE Express, como estoy en el coche no os fíais mucho del nombre exacto\n\n00:01:30.040 --> 00:01:34.320\n del canal, cómo se escribe, pero luego las notas del programa estará en el enlace directo para\n\n00:01:34.320 --> 00:01:39.560\n poder entrar. 
De hecho ahora mismo hay una encuesta en ese canal, para los que estén dentro saber si\n\n00:01:39.560 --> 00:01:45.280\n están para informarse cuando hay un nuevo episodio, si están simplemente para ayudar a que tenga más\n\n00:01:45.280 --> 00:01:49.880\n números y salga más alto en las búsquedas en Telegram o si realmente están en el canal porque\n\n00:01:49.880 --> 00:01:53.960\n escuchan ahí los audios. Por ahora como en casi todas las encuestas que yo he visto hay poca\n\n00:01:53.960 --> 00:01:59.800\n participación, creo que son 77 participantes en el canal y a lo mejor hay 20 respuestas y hay un\n\n00:01:59.800 --> 00:02:03.520\n buen número que sí que lo escuchan dentro de Telegram. Pues ese sería el primero y el más\n\n00:02:03.520 --> 00:02:08.800\n sencillo porque ahí estarán los enlaces a todos los demás, aparte de la nota del programa, pero\n\n00:02:08.800 --> 00:02:15.400\n es más sencillo pinchar ahí y que os lleva al resto de canales. El siguiente sería el KDE Planet,\n\n00:02:15.400 --> 00:02:21.280\n el internacional, que es donde aparecen todas las noticias del agregador de noticias que es una web\n\n00:02:21.280 --> 00:02:28.240\n que se llama Planet KDE, donde la mayoría de desarrolladores de KDE tienen un blog y algunas\n\n00:02:28.240 --> 00:02:33.320\n páginas oficiales también de KDE, ahí ponen su RSS y conforme van saliendo noticias aparecen en esa\n\n00:02:33.320 --> 00:02:38.400\n web y este canal de Telegram lo que hace es pues coger esas noticias y os la ponen en el canal para\n\n00:02:38.400 --> 00:02:42.760\n que no tengáis que estar atentos a un lector de RSS o entrar en la web. Eso es en inglés,\n\n00:02:42.760 --> 00:02:50.120\n que es KDE Planet, creo que ese es el nombre exacto pero bueno estará en el enlace. 
Y nosotros de KDE\n\n00:02:50.120 --> 00:02:56.520\n España hicimos una réplica en español, ese agregador de noticias tiene diferentes idiomas,\n\n00:02:56.520 --> 00:03:00.560\n entonces para que también tengáis todas las noticias en español que aparecen en esa web\n\n00:03:00.560 --> 00:03:06.880\n está KDE Planet Español, que creo que es KDE Planet y un bajo es. Ahí están sobre todo,\n\n00:03:06.880 --> 00:03:13.720\n sobre todo todas las noticias de KDE Blog y luego también tenéis pues si visto el hck o si yo yo\n\n00:03:13.720 --> 00:03:19.440\n etiquetan en sus blogs que la noticia es de KDE pues también sale. Hay unos cuantos, la lista es\n\n00:03:19.440 --> 00:03:27.220\n pública de ver a quién agrega ese RSS y ese canal pero el 90% de las noticias son esas. Luego aparte\n\n00:03:27.220 --> 00:03:33.840\n de esos dos canales tenemos el grupo por excelencia que es KDE Cañas y Bravas, que es un grupo de\n\n00:03:33.840 --> 00:03:40.240\n debate y de información, el cual hace poco lo hemos convertido en un grupo con la funcionalidad\n\n00:03:40.240 --> 00:03:47.360\n esta que tiene Telegram de hacer temas o topics, porque antes cuando alguien hacía una pregunta\n\n00:03:47.360 --> 00:03:52.520\n de una duda en concreto de que algo le fallaba o cuando había un debate era un poco locura seguir\n\n00:03:52.520 --> 00:03:57.440\n la conversación si había dos personas hablando de cosas diferentes o simplemente no te interesaba\n\n00:03:57.440 --> 00:04:02.320\n saber qué problema tenía esa persona. 
Ahora con los temas pues tenemos temas de noticias oficiales\n\n00:04:02.320 --> 00:04:08.400\n de KDE España, tenemos temas para el Akademy, una sección de ayuda por si a alguien le falla algo,\n\n00:04:08.400 --> 00:04:14.360\n tenemos un off topic para poder hablar de lo que sea, tenemos para hablar de software libre que\n\n00:04:14.360 --> 00:04:19.320\n no esté directamente relacionado con KDE, diferentes temas para tener la conversación\n\n00:04:19.320 --> 00:04:24.800\n ordenada. Lo llevamos probando un tiempo yo diría que va bastante bien, hicimos una encuesta para\n\n00:04:24.800 --> 00:04:30.200\n activarlo, salió por mayoría que sí, había alguna gente que no le gustaba y dieron algunos motivos\n\n00:04:30.200 --> 00:04:34.480\n que si es verdad que tiene alguna cosa peor pero tiene también bastantes ventajas que lo que veo\n\n00:04:34.480 --> 00:04:39.840\n la mayoría y luego tiene una función que es que tú puedes desactivarlo, si a ti no te gusta tú puedes\n\n00:04:39.840 --> 00:04:46.160\n darle a los tres iconitos que hay arriba a la derecha, ver como mensaje y desaparece la división\n\n00:04:46.160 --> 00:04:52.640\n de los temas y lo ves todo como un solo chat, verás que todos los mensajes se encadenan como\n\n00:04:52.640 --> 00:04:57.960\n respuesta a otro que es lo que identifica a qué tema estás respondiendo y tenemos otra encuesta\n\n00:04:57.960 --> 00:05:02.720\n abierta porque Telegram hace poco ha puesto la posibilidad de ocultar los miembros en los grupos\n\n00:05:02.720 --> 00:05:08.640\n de a partir de 100 participantes para evitar el problema habitual de que entra un bot y aunque\n\n00:05:08.640 --> 00:05:14.840\n no pase el captcha como puede entrar inicialmente y lo que hace el anti bot, el bot que los expulsa,\n\n00:05:14.840 --> 00:05:19.920\n es expulsarlos y no resuelve la ecuación o la pregunta que le haga, el problema es que en todo\n\n00:05:19.920 --> 00:05:25.080\n ese tiempo el bot ya ha entrado y ha podido chuparse todos los 
ids de todos los usuarios que hay dentro\n\n00:05:25.080 --> 00:05:29.360\n y entonces se los guarda, los va metiendo en base de datos y de vez en cuando pues te llegará un\n\n00:05:29.360 --> 00:05:35.000\n mensaje en inglés o en español no demasiado bien escrito ofreciéndote chorradas, criptomonedas,\n\n00:05:35.000 --> 00:05:40.440\n pornografía o mierda, entonces hemos propuesto ocultar los miembros, eso tiene una serie de\n\n00:05:40.440 --> 00:05:44.720\n inconvenientes que algunos participantes del grupo han puesto, como que ya no tienes tan fácil\n\n00:05:44.720 --> 00:05:49.440\n mencionar a alguien que dentro del grupo a no ser que evidentemente pues haya escrito hace poco y\n\n00:05:49.440 --> 00:05:54.160\n entonces o le contestas o ves cuál es su alias y le pueden mencionar, pero ya no pueden mencionar\n\n00:05:54.160 --> 00:05:59.920\n a cualquiera porque no puedes ver a las 200 o 500 personas que hay, ahora mismo la encuesta creo\n\n00:05:59.920 --> 00:06:04.920\n que va ganando el sí en ocultarlo, yo estoy en algunos grupos que ya lo han hecho pero está ya\n\n00:06:04.920 --> 00:06:12.240\n tiempo de votar si os interesa, yo la verdad es que me encantan las estadísticas pero veo una\n\n00:06:12.240 --> 00:06:17.680\n participación muy pequeña en todas las encuestas de Telegram que veo, yo creo que es porque la gente\n\n00:06:17.680 --> 00:06:22.200\n se apunta a muchos canales y a muchos grupos que luego no mira en siglas, entonces aunque haya 500\n\n00:06:22.200 --> 00:06:27.840\n personas realmente que lo abran y que se metan a lo mejor no hay ni un 20%, hay 100 y de esa\n\n00:06:27.840 --> 00:06:32.480\n acción pues algunas contestan la encuesta, en cualquier caso pues bueno la encuesta representa\n\n00:06:32.480 --> 00:06:38.880\n a la gente que está activamente ahí metiéndose y que está interesada en el tema, hemos dicho el\n\n00:06:38.880 --> 00:06:48.040\n KDE Express, Planet y Planet en español y el grupo de Cañas y Bravas, luego también hay 
varios grupos\n\n00:06:48.040 --> 00:06:54.720\n de, estos ya son todos internacional, pues hay un grupo de Kdenlive, hay un grupo de Krita,\n\n00:06:54.720 --> 00:07:01.760\n hay grupos específicos de administradores de sistemas de KDE, de promoción, eso también os\n\n00:07:01.760 --> 00:07:09.440\n dejaré un enlace a la página oficial de KDE internacional que tiene ahí un listado de los\n\n00:07:09.440 --> 00:07:15.240\n grupos, todos esos los internacionales, la mayoría tienen un puente de Matrix, Telegram es libre\n\n00:07:15.240 --> 00:07:19.960\n cliente pero ya sabemos que el servidor no, entonces siempre se intenta preferir usar el\n\n00:07:19.960 --> 00:07:25.240\n protocolo Matrix que sí que es libre aunque sea un poco centralizado pero bueno en el fondo es\n\n00:07:25.240 --> 00:07:31.520\n mucho más libre y técnicamente es libre, lo único es que la implementación, el Matrix.org es el que\n\n00:07:31.520 --> 00:07:36.400\n se lleva la mayoría y hay gente que se queja de que XMPP es más descentralizado pero bueno\n\n00:07:36.400 --> 00:07:42.760\n técnicamente es 100% libre y entonces en KDE internacional todas las salas oficiales suelen\n\n00:07:42.760 --> 00:07:46.840\n ser en Matrix pero por facilidad y porque es verdad que ahora mismo Telegram tiene mucha\n\n00:07:46.840 --> 00:07:51.840\n más implantación pues tienen puente y estés donde estés puede ver lo que se escriba cualquiera.\n\n00:07:51.840 --> 00:07:56.920\n En estos que os he dicho anteriormente los canales no tienen sentido y en los grupos\n\n00:07:56.920 --> 00:08:02.760\n todavía no hemos tenido a ningún voluntario que quiera ocuparse de hacerlo, nosotros internamente\n\n00:08:02.760 --> 00:08:08.200\n con la junta de KDE España sí funcionamos con grupos de Matrix. 
Luego también hay otro grupo\n\n00:08:08.200 --> 00:08:13.560\n que sí que es en español que es el del Akademy que es para los voluntarios de gente que quiera\n\n00:08:13.560 --> 00:08:20.600\n ayudar a hacer la Akademy y luego solemos hacer otros con otros componentes. Akademy 2023 se acaba\n\n00:08:20.600 --> 00:08:29.000\n de abrir el call for papers, el envío de charlas con lo cual podéis enviar charlas o proponeros\n\n00:08:29.000 --> 00:08:35.760\n cómo ayudar a participar ya sea online o directamente allí en Málaga. Y con esto creo que sí que ya he\n\n00:08:35.760 --> 00:08:41.280\n hecho el repaso completo, espero dentro de algunos meses tener suficientes recursos en Matrix para\n\n00:08:41.280 --> 00:08:46.000\n poder hacer otro episodio especial pero mientras tanto hay que intentar difundir la palabra de\n\n00:08:46.000 --> 00:08:50.840\n software libre de KDE por todos los sitios en donde se pueda. A mí no me veréis difundiendo\n\n00:08:50.840 --> 00:08:55.600\n en facebook pero soy de los que piensa que si hay alguien con el estómago de hacerlo debería hacerlo\n\n00:08:55.600 --> 00:09:01.680\n y seguro que suma. Y con esto pues nada os dejo un episodio corto para acortar los tiempos hasta\n\n00:09:01.680 --> 00:09:06.880\n que tengamos tiempo de volver a grabar los compañeros. Un saludo y nos vemos por las redes. Hasta luego.\n\n00:09:06.880 --> 00:09:10.220\n [MÚSICA]\n\n00:09:10.220 --> 00:09:13.220\n no\n\n"
}
stevencrader commented 1 year ago

Thanks for the ping. I'll take a look.

stevencrader commented 1 year ago

I published a new version of transcriptator (1.1.3) that parses this VTT file.

The reason it reports the data as not valid SRT is that the SRT parser is also used for VTT files: the formats are very similar, so I reuse the parsing code.

This VTT file didn't have an index line, so it wasn't being parsed correctly. The only VTT example I had before this one had an index line, so I assumed they all do.
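
Steven's explanation (one parser for both formats, which assumed an index line before every timing line) suggests the shape of the fix. A minimal sketch, not transcriptator's actual implementation, of a parser that finds the --> timing line whether or not an index line precedes it and accepts both the "," (SRT) and "." (VTT) millisecond separators; the Cue type, parseCues and toSeconds are hypothetical names, and VTT styling, NOTE blocks and cue settings are ignored.

```typescript
// Hypothetical SRT/WebVTT cue parser where the index line is optional.
interface Cue {
  start: number; // seconds
  end: number; // seconds
  text: string;
}

// Accepts "HH:MM:SS,mmm" (SRT) or "HH:MM:SS.mmm" / "MM:SS.mmm" (VTT).
function toSeconds(timestamp: string): number {
  const [hms, ms = "0"] = timestamp.trim().split(/[.,]/);
  const parts = hms.split(":").map(Number);
  while (parts.length < 3) parts.unshift(0); // pad missing hours
  const [h, m, s] = parts;
  return h * 3600 + m * 60 + s + Number(ms) / 1000;
}

const TIMING = /^\s*(\S+)\s+-->\s+(\S+)/;

function parseCues(data: string): Cue[] {
  const cues: Cue[] = [];
  // Cues are separated by blank lines; normalize CRLF line endings first.
  const blocks = data.replace(/\r\n/g, "\n").split(/\n\s*\n/);
  for (const block of blocks) {
    const lines = block.split("\n").filter((l) => l.trim() !== "");
    // Timing line is first (no index) or second (index/identifier present).
    const timingIndex = lines.findIndex((l) => TIMING.test(l));
    if (timingIndex === -1 || timingIndex > 1) continue; // e.g. the WEBVTT header
    const match = TIMING.exec(lines[timingIndex])!;
    cues.push({
      start: toSeconds(match[1]),
      end: toSeconds(match[2]),
      text: lines.slice(timingIndex + 1).map((l) => l.trim()).join(" "),
    });
  }
  return cues;
}

const sample =
  "WEBVTT\n\n00:00:00.000 --> 00:00:11.840\n Buenas, bienvenidas de vuelta a KDE Express.\n\n" +
  "00:00:11.840 --> 00:00:16.800\n versión movilidad\n";
console.log(parseCues(sample).length); // → 2
```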

mitchdowney commented 1 year ago

I just upgraded transcriptator and .vtt parsing worked 🥳 the change is deployed now in prod. Thank you both for your help with this!

Marzal commented 1 year ago

Great, thank you both. I assume that Podverse Android/F-Droid would need a new beta to test it, right?

I made changes to a few episodes so I can test different variants, in case you need test cases:

  Episode 16: OP3 + archive.org + VTT only
  Episode 15: OP3 + GitLab + VTT only
  Episode 09: OP3 + GitLab + VTT; OP3 + archive.org + SRT
  Episode 08 (without OP3): GitLab + VTT; archive.org + SRT
  Episode 00: OP3 + GitLab + VTT/SRT
  All others: OP3 + archive.org + VTT/SRT
mitchdowney commented 1 year ago

@Marzal I actually just submitted a new version 4.13.8 of the mobile app to the stores...and I'm not 100% sure whether the new versions pulled in the newer version of transcriptator or whether I built them right before it was published. So there's a chance this is fixed in 4.13.8, but if not, it will definitely be fixed in 4.13.9.

Also, since this is a feature for Podverse web, and web seems to be fixed, I'm going to go ahead and close this issue. I think the different test case scenarios you're referring to might make more sense posted to the transcriptator repo https://github.com/stevencrader/transcriptator