themoeway / jmdict-yomitan

JMdict, JMnedict, KANJIDIC for Yomitan/Yomichan.
MIT License

Plain text version? #5

Closed: jacobalbano closed this issue 5 months ago

jacobalbano commented 5 months ago

Even the legacy distribution appears to contain extra HTML styling that I don't want. The version of JMdict on the FooSoft page is long out of date at this point, and it bums me out that there doesn't seem to be any option to get it in the old format.

Just to head any objections off at the pass: I know it's possible to override the CSS in Anki templates. The problem is that I do pre-processing on my fields (with JavaScript in the card layout) and I need the content to be in plain text before that can happen. Accounting for all the inline styling is brittle and feels like it should be unnecessary.

MarvNC commented 5 months ago

This would be an issue for the Yomitan Import project, which I doubt anyone is working on anymore. I think it'd be a good time to consider changing your Anki workflow; you'll find that many other dictionaries have some level of styling applied that enhances the experience.

stephenmk commented 5 months ago

Check out commit 9222417 from yomitan-import (the last commit before I started mucking around with it) and use it to build jmdict for yomichan.

jacobalbano commented 5 months ago

@MarvNC: This would be an issue for the Yomitan Import project

My expectation was that the styling was part of the project configuration here and it could be generated along with the other releases. It seems that's not the case?

@MarvNC: I think it'd be a good time to consider changing your Anki workflow; you'll find that many other dictionaries have some level of styling applied that enhances the experience.

With all due respect, this is exactly the kind of comment I wanted to avoid. You may prefer a "batteries included" approach, but I do not. I want my Anki fields to contain plain text that can be shaped with the Handlebars templates into a form that my postprocessing can handle without all kinds of sanitization or regex matching, both of which are bad for stability.

@stephenmk: Check out commit 9222417 from yomitan-import

Thanks for the lead Stephen. Am I to understand that the only way for the importer to make true plain-text dictionaries now is to use an old revision? That seems like a crucial loss of functionality to me.

MarvNC commented 5 months ago

We only have the information that is missing from the old versions (notes, source languages, etc.) due to the hard work of developers like Stephen who have sunk many hours into improving the conversion and display of information. Yomitan Import does not have an option for plain text and, as Stephen pointed out, you'd need to downgrade to a very old version of the importer for a truly text-only version.

As I mentioned earlier, the project is no longer actively developed: we have largely moved on and have more recent and better conversions of the dictionaries that were formerly supported by the importer. The reason I suggest you change your workflow is that the majority of dictionaries out there now contain content beyond plain text, and I think you'd find it challenging to keep maintaining that setup. Text processing using Handlebars, as I'm sure you've found, is not fun to work with, so accessing the textContent and processing within Anki might be your best bet.

StefanVukovic99 commented 5 months ago

There's a script to make "plain" dicts, though I'm not sure how well it works or if that's what you want.

stephenmk commented 5 months ago

@jacobalbano: Thanks for the lead Stephen. Am I to understand that the only way for the importer to make true plain-text dictionaries now is to use an old revision? That seems like a crucial loss of functionality to me.

The only entries with HTML styling in the "legacy" version of the dictionary are the search-only terms. I wouldn't expect that anyone would want to make flashcards of these terms anyway, because they only contain hyperlinks to different entries. Search-only terms are also generally reserved for rare and unusual spellings of words.

Marv alluded to this, but the reason why nobody wants to support the "legacy" format is that it's missing a lot of information. This information was never considered "optional" by the JMdict editors; Yomichan just failed to implement it.

The reason why the "legacy" version of the dictionary was able to get away with using plain-text data was that the Yomichan web extension itself took responsibility for the presentation layer. In other words, many design decisions were made in Yomichan by assuming that most users would just be using this plain-text version of JMdict. In retrospect this was a bad idea that continues to haunt us today. There are now many different dictionaries available for Yomi* extensions, and their design needs are almost never the same as the plain-text JMdict dictionary. It's better for dictionaries to be able to describe their own presentation (via HTML styling).

jacobalbano commented 5 months ago

@MarvNC: We only have the information that is missing from the old versions (notes, source languages, etc.) due to the hard work of developers like Stephen who have sunk many hours into improving the conversion and display of information.

I understand. It's not my intention to downplay the work that's gone into this. Personally, this additional information isn't desirable to me. 十人十色.

@MarvNC: Text processing using Handlebars, as I'm sure you've found, is not fun to work with

No arguments there. When I first installed Yomichan it took me a while to get my templates just right. The one benefit of that approach was that I had full control over the data that came out; for example, I could turn multi-gloss entries into a delimited list that was trivial to parse and process after the fact. With the current dictionaries, glosses come pre-packaged in <ul> elements that I can't do anything about, and suddenly my whole setup needs to be reconfigured (while maintaining compatibility with the cards I've already got... 🥲 )
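For reference, the textContent suggestion above can recover a delimited list from <ul> markup without any regex matching on the HTML itself. The following is only a minimal sketch under stated assumptions: `glossesToDelimited`, `normalize`, and the two interfaces are hypothetical names invented for illustration, and it assumes each gloss sits in its own <li> element, which may not hold for every dictionary. The interfaces are declared structurally so the snippet type-checks without a DOM library; in a card's script, a real `Element` should satisfy them.

```typescript
// Minimal structural types (hypothetical) so the sketch compiles without DOM typings.
interface HasText {
  textContent: string | null;
}

interface GlossContainer extends HasText {
  querySelectorAll(selector: string): ArrayLike<HasText>;
}

// Collapse runs of whitespace and trim, so nested markup leaves no stray gaps.
function normalize(text: string | null): string {
  return (text === null ? "" : text).replace(/\s+/g, " ").trim();
}

// Hypothetical helper: flatten <li> glosses into one delimited string.
// Falls back to the container's full text when it holds no list items.
function glossesToDelimited(root: GlossContainer, delimiter: string = "; "): string {
  const nodes = root.querySelectorAll("li");
  const items: string[] = [];
  for (let i = 0; i < nodes.length; i++) {
    const text = normalize(nodes[i].textContent);
    if (text.length > 0) {
      items.push(text); // skip list items that contain only whitespace
    }
  }
  return items.length > 0 ? items.join(delimiter) : normalize(root.textContent);
}
```

In a card template this would be called on whichever element wraps the glossary field. Note that nested lists are matched too, so an outer item's text would include the text of its inner items; a dictionary that nests <ul> elements would need a narrower selector.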

@stephenmk: It's better for dictionaries to be able to describe their own presentation (via HTML styling).

That makes sense. I can definitely see why dictionary authors would want them to show up in a particular way in yomi* projects, and why those projects would want to offload the presentation to the dictionaries. Just sucks for me in particular I guess.

@StefanVukovic99: There's a script to make "plain" dicts, though I'm not sure how well it works or if that's what you want.

@StefanVukovic99 thanks, I'll check it out.

Since it's clear at this point that my request isn't something that can be implemented as another step in the workflow, I'll go ahead and close the issue. Thanks everyone for your responses.

stephenmk commented 5 months ago

@jacobalbano: Just sucks for me in particular I guess.

There are others who have expressed similar frustrations with HTML dictionaries because the markup has interfered with their custom scripts and templates. The hope is that the HTML dictionaries will provide a better default experience for most users without the need for them to spend so much time crafting their own flashcards. The less time people have to spend playing around with flashcard setups, the more time they can spend learning Japanese.

The comparatively small number of users who want to write advanced flashcard customizations are free to acquire their own materials. Like I said, all you need to do is check out an old branch of yomichan-import, build it, and use it to process the JMdict file. This will produce the old-style dictionary with the latest JMdict data and without any HTML. (Again, with the exception of the search-only terms, this "old style" dictionary will be identical to the "legacy" dictionary published daily by this repo.)